EMI #2

Miscellaneous notes related to coding, machine learning, data science

A month has already passed since the last EMI post. Here are some findings from my daily work.

EMI-5: Python. Jupyter notebook, execute Terminal command

I knew about the ! prefix, which lets you execute a Linux command from a Jupyter notebook cell, but I had never used it before.

echo command on Jupyter Notebook
my_dir
├── img_1.png
├── img_2.png
├── img_3.png
└── img_4.jpg
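In a notebook cell, the ! prefix hands the rest of the line to the shell, e.g. `!echo hello`. As a rough sketch of the same idea in plain Python (outside a notebook), the standard-library subprocess module can run the command and capture its output. This assumes a POSIX system where `echo` is available as a program; the variable names are only for illustration.

```python
import subprocess

# In a Jupyter cell this would simply be:  !echo hello
# Plain-Python sketch of the same call (assumes `echo` exists on the system)
result = subprocess.run(
    ["echo", "hello"],      # command and its arguments as a list
    capture_output=True,    # collect stdout/stderr instead of printing them
    text=True,              # decode bytes to str
    check=True,             # raise CalledProcessError on a non-zero exit code
)
output = result.stdout.strip()
print(output)
```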

EMI-6: Python. Find the longest sub-list from the main-list.

When I was studying NLP, I wanted to find the longest sub-list in a main list (i.e. find the largest number of words in a batch). I made a mistake in how I used the max function.

# main list
m_l = [[32, 37, 4, 999999999, 43],
       [30, 156, 78, 3614, 25, 11, 169, 3096, 21],
       [1, 2], [0], [], []]
# plain "max" compares the sub-lists element by element (lexicographically),
# so the result is decided by the first elements here: 32 > 30
max(m_l)  # [32, 37, 4, 999999999, 43]
# find the longest sub-list in the main list
max(m_l, key=len)
# [30, 156, 78, 3614, 25, 11, 169, 3096, 21]

If the main list also contains values of a different type (e.g. int), we need to add isinstance() to check the type of each element, because len() fails on an int.

max(m_l, key=lambda s_l: len(s_l) if isinstance(s_l, list) else 0)
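To make the mixed-type case concrete, here is a small sketch. The names `batch` and `safe_len` are hypothetical and only for illustration; the sample data is made up. It shows both the longest sub-list and the largest length itself, which is what "the largest number of words in the batch" needs.

```python
# Hypothetical batch that mixes lists with a stray int
batch = [[32, 37, 4], [30, 156, 78, 3614], 7, [1, 2], []]

def safe_len(item):
    """Length of a list, or 0 for anything that is not a list."""
    return len(item) if isinstance(item, list) else 0

# The longest sub-list itself
longest = max(batch, key=safe_len)

# Only the length of the longest sub-list
max_len = max(safe_len(item) for item in batch)

print(longest, max_len)
```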

The page explains a different example using a dict.

square = {2: 4, -3: 9, -1: 1, -2: 4}
# the largest key
max(square)  # 2
# the key whose value is the largest
max(square, key=lambda k: square[k])  # -3

Or alternatively, use items() (iteritems() in Python 2) and itemgetter()

import operator
stats = {'a': 1000, 'b': 3000, 'c': 100}
# compare the (key, value) pairs by value, then take the key of the winner
max(stats.items(), key=operator.itemgetter(1))[0]  # "b"

EMI-7: YAML. Update environment variables and substitute

For Kubernetes clusters, we can configure environment variables. I came across a situation where I wanted to apply some .yaml files to running replicas, while keeping the .yaml files reusable and flexible across different deployment versions, project names, etc. I ended up with this idea:

  1. Export environment variables (source command)
  2. With the environment variables, update .yaml file (envsubst command)

my_env_val.txt file is defined as:

export MY_APP_VERSION=Version1.2.3.4
export MY_SCHEME_ONE="To Be Defined"

initial.yaml file is below

info:
  version: $MY_APP_VERSION
schemes:
  - $MY_SCHEME_ONE

Run the command line

$ source my_env_val.txt
$ envsubst < initial.yaml > after.yaml

The after.yaml file is now updated as

info:
  version: Version1.2.3.4
schemes:
  - To Be Defined
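For comparison, the substitution step can be roughly imitated in Python with string.Template from the standard library. This is only a sketch, not what envsubst does internally: the variables are passed as a plain dict (in practice os.environ could be used instead), the YAML indentation is an assumption, and safe_substitute leaves unknown variables untouched, whereas envsubst, as far as I know, replaces unset variables with an empty string.

```python
import string

# Template text mirroring initial.yaml (indentation is an assumption)
template_text = (
    "info:\n"
    "  version: $MY_APP_VERSION\n"
    "schemes:\n"
    "  - $MY_SCHEME_ONE\n"
)

# Stand-in for the exported environment variables
env = {
    "MY_APP_VERSION": "Version1.2.3.4",
    "MY_SCHEME_ONE": "To Be Defined",
}

# Replace $NAME placeholders with the values from env
rendered = string.Template(template_text).safe_substitute(env)
print(rendered)
```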

EMI-8: Python. Iter function for a string.

For NLP applications, we feed words one at a time into some function to process the target sentence. I guess it’s a rookie mistake, but I did not know how iter behaves differently for a string and a list.

# iter over a string yields one character at a time
str_iter = iter("a blue sky")
for i in range(3):
    print(i, next(str_iter))
# 0 a
# 1   (the space character)
# 2 b
# iter over a list yields one element at a time
list_iter = iter(["a blue sky"])
print(next(list_iter))
# a blue sky

What I should have done is tokenise or split the target string into a list and then use the iter function on that list.
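A minimal sketch of that fix, assuming a plain whitespace split is good enough as a tokeniser (a real NLP pipeline would likely use a proper tokeniser); the variable names are only for illustration:

```python
sentence = "a blue sky"

# Split on whitespace so each element is a word, not a character
tokens = sentence.split()

# Now iter yields one word per call to next()
token_iter = iter(tokens)
first = next(token_iter)
second = next(token_iter)
print(first, second)
```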

EMI-9: Python. literal_eval for JSON string

I think I have used ast.literal_eval more than a dozen times to convert a JSON-formatted string into a Python object. However, I always forget it.

import ast

s = '["a", "b", "c"]'

l = ast.literal_eval(s)
print(l)
# ['a', 'b', 'c']
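One caveat worth keeping in mind: ast.literal_eval parses Python literals, not JSON, so it only works when the string happens to be valid in both. The sketch below compares it with json.loads from the standard library; the sample strings are made up for illustration.

```python
import ast
import json

# A string that is valid both as JSON and as a Python literal
s = '["a", "b", "c"]'
same = ast.literal_eval(s) == json.loads(s)

# JSON-only tokens (true, false, null) are not Python literals
t = '{"ok": true, "value": null}'
parsed = json.loads(t)

try:
    ast.literal_eval(t)
    literal_eval_failed = False
except (ValueError, SyntaxError):
    # literal_eval rejects the bare names true and null
    literal_eval_failed = True

print(same, parsed, literal_eval_failed)
```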

I found an example in an NLP course I am taking, which uses it to load the examples stored in a file.

def convert_json_examples_to_text(filepath):
    # Parse every line of the file as a Python literal
    with open(filepath) as f:
        example_jsons = list(map(ast.literal_eval, f))
    # Then, read in the json from the example file

Data scientist and machine learning engineer. PhD in Signal Processing for Neuroscience. https://www.linkedin.com/in/takashi-nakamura-004875a6/