# What is EMI?

**Expose my ignorance**. After my PhD, no one corrects or challenges my writing, coding, ML theory, the value of life, etc. I realised I need to take notes on what I learn every week.

# EMI-10: Python. Generator

As written in EMI#1, I am still learning how to use generators and iterators. When I open a `.txt` or `.csv` file, I normally use `pandas` for no particular reason; however, if the file is too large to load at once, we need a different approach. Assume we have the text file (`list_of_int.txt`) below:

`list_of_int.txt`:

```
23
21
9
12
3
```

## Example 10.1: Create a simple function

```python
def int_gen_func(filename):
    """A simple generator function that reads a file line by line"""
    for line in open(filename):
        yield line


filename = "list_of_int.txt"  # text filename
int_gen = int_gen_func(filename)  # generator

i = 0
while True:
    try:
        next_int = int(next(int_gen))
    except StopIteration:
        print("STOPPED", i)
        break
    print(i, next_int)
    i += 1

# 0 23
# 1 21
# 2 9
# 3 12
# 4 3
# STOPPED 5
```

## Example 10.2: Use a **generator comprehension** (generator expression)

```python
filename = "list_of_int.txt"  # text filename

# generator comprehension
int_gen_comprehension = (line for line in open(filename))

i = 0
while True:
    try:
        next_int = int(next(int_gen_comprehension))
    except StopIteration:
        print("STOPPED", i)
        break
    print(i, next_int)
    i += 1

# 0 23
# 1 21
# 2 9
# 3 12
# 4 3
# STOPPED 5
```
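A generator comprehension (the Python documentation calls it a generator expression) looks like a list comprehension with round brackets, but it produces its items lazily. The sketch below is a small illustration with made-up names (`squares_list`, `squares_gen`), not part of the file example above: it compares the size of the two objects and shows that a generator can only be consumed once.

```python
import sys

squares_list = [n * n for n in range(100_000)]  # every element is built now
squares_gen = (n * n for n in range(100_000))   # nothing is computed yet

# The generator object stays small however many items it will produce.
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))
# True

first_three = [next(squares_gen) for _ in range(3)]
print(first_three)
# [0, 1, 4]

# A generator is single-pass: it resumes where it stopped and then stays empty.
remaining = sum(1 for _ in squares_gen)
print(remaining)
# 99997
print(list(squares_gen))
# []
```

This laziness is why the approach suits files that are too large to load in one go: only the current line needs to be held at any moment.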

I asked my coding sensei, Jacob Unna, about this, and he taught me a few more things. When we call `open`, the operating system creates a file handle for that file and doesn't close it until we explicitly tell it to. (So it's best to use `with open() as f:` to make sure it gets closed; otherwise, the app keeps holding the handle and the resources that go with it.) Since `f` is already an iterator over the lines of the file, we can loop over it directly:

## Example 10.3: Use with open() as f:

```python
filename = "list_of_int.txt"  # text filename

# f is already an iterator over the lines of the file
with open(filename) as f:
    for i, line in enumerate(f):
        print(i, int(line))

# 0 23
# 1 21
# 2 9
# 3 12
# 4 3
```

I knew the above solution (because it’s available everywhere if we google “how to open a text file in python”), but I had never thought about it carefully before.
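The two ideas can be combined: a generator function can open the file with `with open() as f:` itself, so the file is closed once the generator is exhausted (or closed). This is a minimal sketch; the name `read_ints` is made up for illustration, and the sample file is recreated in a temporary directory so that the snippet runs on its own.

```python
import os
import tempfile


def read_ints(path):
    """Yield one int per line; the `with` block closes the file at the end."""
    with open(path) as f:
        for line in f:
            yield int(line)


# Recreate the sample file from above in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "list_of_int.txt")
    with open(sample_path, "w") as f:
        f.write("23\n21\n9\n12\n3\n")

    total = sum(read_ints(sample_path))    # reads lazily, one line at a time
    largest = max(read_ints(sample_path))  # a fresh generator for a second pass

print(total, largest)
# 68 23
```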

I wanted to use `StopIteration` in my code because I had just learnt about it. Fundamentally, the two snippets below (Ex10.4 and Ex10.5) do the same thing:

## Example 10.4:

```python
x = [1, 2, 3]
_iter_x = iter(x)

while True:
    try:
        v = next(_iter_x)
    except StopIteration:
        break
    print(v)
```

## Example 10.5:

```python
x = [1, 2, 3]

for v in x:
    print(v)
```
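To see where `StopIteration` comes from, it can help to write an iterator by hand. The class below is a hypothetical example (`CountDown` is not from any library): it implements `__iter__` and `__next__`, and raises `StopIteration` when there is nothing left, which is the signal the `for` loop uses to stop.

```python
class CountDown:
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # tells the for loop to stop
        value = self.current
        self.current -= 1
        return value


for v in CountDown(3):
    print(v)
# 3
# 2
# 1
```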

# EMI-11: Python. Defaultdict

`defaultdict` lets us handle missing keys. A very nice article by Real Python summarises the use of `defaultdict`. I have used it in simple implementations such as the one below:

```python
from collections import defaultdict

my_list_dict = defaultdict(list)

for i in range(20):
    mod_3 = i % 3
    my_list_dict[mod_3].append(i)

print(my_list_dict)
# Output (wrapped here for readability):
# defaultdict(<class 'list'>,
#             {0: [0, 3, 6, 9, 12, 15, 18],
#              1: [1, 4, 7, 10, 13, 16, 19],
#              2: [2, 5, 8, 11, 14, 17]})
```
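For comparison, here is a small sketch of what "handling missing keys" means in practice. The names `plain` and `counts` are only for illustration. A plain `dict` raises `KeyError` on a missing key, while a `defaultdict` calls its factory (here `int`, which returns `0`) and stores the result.

```python
from collections import defaultdict

plain = {}
try:
    plain["a"].append(1)
except KeyError as err:
    print("KeyError:", err)
# KeyError: 'a'

counts = defaultdict(int)  # int() returns 0, so missing keys start at 0
for ch in "banana":
    counts[ch] += 1
print(dict(counts))
# {'b': 1, 'a': 3, 'n': 2}

# Only item access creates an entry; `in` and .get() do not.
print("z" in counts, counts.get("z"), len(counts))
# False None 3
print(counts["z"], len(counts))
# 0 4
```

The last two lines are worth remembering: merely reading `counts["z"]` adds the key, which can be surprising when the dictionary is later iterated.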

Then one of my colleagues used it in a different way, with a `lambda`. The code below keeps track of the maximum value in the list `all_list`:

```python
from collections import defaultdict

my_list_dict_max = defaultdict(lambda: {"all_list": [],
                                        "max_list": 0})

for i in range(20):
    mod_3 = i % 3
    my_list_dict_max[mod_3]["all_list"].append(i)
    my_list_dict_max[mod_3]["max_list"] = max(my_list_dict_max[mod_3]["all_list"])
```
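The `lambda` is called with no arguments every time a missing key is accessed, so each key gets its own fresh dictionary. The variant below is a sketch of the same idea with two small changes of mine (the names `stats` and `entry` are made up): it looks the entry up once per iteration and updates the maximum against the stored value instead of rescanning the whole list.

```python
from collections import defaultdict

stats = defaultdict(lambda: {"all_list": [], "max_list": 0})

for i in range(20):
    entry = stats[i % 3]  # created by the lambda on first access
    entry["all_list"].append(i)
    # Compare against the stored maximum instead of rescanning the list.
    entry["max_list"] = max(entry["max_list"], i)

for key in sorted(stats):
    print(key, stats[key]["max_list"])
# 0 18
# 1 19
# 2 17

# Each key owns a separate dict, because the lambda builds a new one each time.
print(stats[0] is stats[1])
# False
```

Note that the initial value `0` only works as a starting maximum because the values here are non-negative; with negative numbers a different starting value would be needed.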

# EMI-12: Python. Create a simple graph

I sometimes do a quick show-and-tell of how I code during remote work. I wanted to plot a simple math function like `y = 6x log_2(x) + 6x`. The series of mistakes I made is below:

```python
import math
import numpy as np

# Step1 (Error)
x = [i for i in range(0, 150, 0.01)]
# TypeError: 'float' object cannot be interpreted as an integer

# Step2 (Error)
x = [i for i in np.arange(0, 150, 0.01)]
y = 6 * x * math.log(x) + 6 * x
# TypeError: must be real number, not list

# Step3 (Fine)
x = np.array([i for i in np.arange(0, 150, 0.01)])
y = 6 * x * np.log2(x) + 6 * x
```
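Step2 fails because `math.log` expects a single number, not a list (and, called with one argument, it computes the natural logarithm rather than log base 2). Without NumPy, the same formula can be applied element by element with a list comprehension. The sketch below uses a small hypothetical grid that starts above zero, since log2(0) is undefined.

```python
import math

# Hypothetical coarse grid; x starts above 0 because log2(0) is undefined.
x = [i * 0.5 for i in range(1, 7)]  # 0.5, 1.0, ..., 3.0

# math.log2 accepts one number at a time, so apply it element by element.
y = [6 * v * math.log2(v) + 6 * v for v in x]

print(y[1])  # x = 1.0 -> 6 * 1 * 0 + 6 * 1
# 6.0
print(y[3])  # x = 2.0 -> 6 * 2 * 1 + 6 * 2
# 24.0
```

NumPy's `np.log2` does the same thing for a whole array at once, which is why Step3 works.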

I then googled it and found a simpler solution.

```python
# Step4 (Googled)
x = np.linspace(0, 150, 15000)
y = 6 * x * np.log2(x) + 6 * x
```

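Since `np.linspace` returns an `np.ndarray`, the three snippets below give similar results when the number of elements in the array is large (NOTE: `np.linspace` gives slightly different values, because it includes the endpoint by default while `np.arange` excludes the stop value):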

```python
arange_list_array = np.array([i for i in np.arange(0, 150, 0.01)])
arange_array = np.array(np.arange(0, 150, 0.01))
linspace = np.linspace(0, 150, 15000)
```
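The slight difference comes from the spacing. The arithmetic below is a plain-Python sketch (it does not call NumPy, and the variable names are mine): with its default `endpoint=True`, `np.linspace(start, stop, num)` places `num` points including both ends, so the gap between neighbours is `(stop - start) / (num - 1)` rather than `0.01`.

```python
start, stop, num = 0, 150, 15000

# np.arange(0, 150, 0.01): fixed step, and `stop` itself is excluded.
arange_step = 0.01

# np.linspace(0, 150, 15000): `num` points including both ends by default,
# so the 15000 points are separated by 14999 equal gaps.
linspace_step = (stop - start) / (num - 1)

print(linspace_step > arange_step)
# True
print(round(linspace_step, 6))
# 0.010001
```

Passing `endpoint=False` to `np.linspace` excludes the stop value, which makes the spacing `150 / 15000 = 0.01`, the same step as the `np.arange` call.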

For plotting the graph, simply use `matplotlib.pyplot`:

```python
import matplotlib.pyplot as plt

plt.plot(x, y, label="6xlog2x+6x")
```

# EMI-13: Python. int(x, base=10)

I guess everyone has used the built-in `int()` function, but we can also specify the `base` of the input. When I was studying algorithms, the input values were given as binary strings.

```python
# A line of the input file:
# 0 1 0 0 1 1 1 0 1 0 1 1 0 0 1 1 1 1 1 0 0 1 0 0

with open(filename) as f:
    lines = f.readlines()

for line in lines:
    binary_str_line = ''.join(line.split())
    base10_num = int(binary_str_line, 2)
    print(binary_str_line, base10_num)

# 010011101011001111100100 5157860
```
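The second argument of `int()` is not limited to binary. Here is a short sketch of a few other bases, plus the reverse conversion with `format()`; the values are arbitrary examples of mine.

```python
print(int("ff", 16))     # hexadecimal
# 255
print(int("777", 8))     # octal
# 511
print(int("z", 36))      # 36 is the largest supported base
# 35
print(int("0b1010", 0))  # base 0: infer the base from the prefix
# 10

n = int("010011101011001111100100", 2)
print(format(n, "024b"))  # back to a zero-padded 24-bit string
# 010011101011001111100100

try:
    int("102", 2)         # '2' is not a binary digit
except ValueError:
    print("ValueError")
# ValueError
```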

# EMI-14: Python. zip()

I have used `zip()` countless times to iterate over the elements of two lists in parallel, such as:

```python
list_1 = [1, 2, 3]
list_2 = [4, 5, 6]

for x, y in zip(list_1, list_2):
    print(x, y)

# 1 4
# 2 5
# 3 6
```

However, I had never thought about what the function actually does. According to the Python 3.8 documentation, `zip()` *makes an iterator that aggregates elements from each of the iterables.* This time I explored a few different uses of the `zip()` function.
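One behaviour worth knowing before the examples: `zip()` stops as soon as the shortest iterable runs out, and the object it returns is an iterator, so it can be consumed only once. The sketch below uses two made-up lists and `itertools.zip_longest` from the standard library for the case where the leftover items should be kept.

```python
from itertools import zip_longest

shorter = [1, 2]
longer = ["a", "b", "c"]

print(list(zip(shorter, longer)))
# [(1, 'a'), (2, 'b')]

print(list(zip_longest(shorter, longer, fillvalue=None)))
# [(1, 'a'), (2, 'b'), (None, 'c')]

# A zip object is an iterator, so the second list() finds nothing left.
pairs = zip(shorter, longer)
print(list(pairs), list(pairs))
# [(1, 'a'), (2, 'b')] []
```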

## Input

```python
my_dic = {
    1: [],
    2: "two",
    "3": (),
}
```

## Example 14.1: zip(dict)

```python
zip_dic = zip(my_dic)
print(type(zip_dic))
for _ in range(3):
    print(next(zip_dic))

# <class 'zip'>
# (1,)
# (2,)
# ('3',)
```

## Example 14.2: compare to iter(dict)

```python
iter_dic = iter(my_dic)
print(type(iter_dic))
for _ in range(3):
    print(next(iter_dic))

# <class 'dict_keyiterator'>
# 1
# 2
# 3
```

## Example 14.3: tuple(zip()), list(zip()), dict(zip())

```python
tuple(zip(my_dic))
# ((1,), (2,), ('3',))

tuple(zip(my_dic, my_dic))
# ((1, 1), (2, 2), ('3', '3'))

tuple(zip(my_dic, my_dic, my_dic))
# ((1, 1, 1), (2, 2, 2), ('3', '3', '3'))

list(zip(my_dic, my_dic, my_dic, my_dic))
# [(1, 1, 1, 1), (2, 2, 2, 2), ('3', '3', '3', '3')]

dict(zip(my_dic, my_dic))
# {1: 1, 2: 2, '3': '3'}
```
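In all the examples above, only the keys appear, because iterating over a dict yields its keys. As a more practical sketch (the lists `keys` and `values` are made up for illustration), `dict(zip(...))` is commonly used to build a mapping from two parallel lists, and `zip(*...)` reverses the operation.

```python
keys = ["a", "b", "c"]
values = [1, 2, 3]

mapping = dict(zip(keys, values))
print(mapping)
# {'a': 1, 'b': 2, 'c': 3}

# zip(*pairs) transposes the pairs back into separate tuples.
unzipped_keys, unzipped_values = zip(*mapping.items())
print(unzipped_keys, unzipped_values)
# ('a', 'b', 'c') (1, 2, 3)

# Zipping a dict with its values pairs each key with its value.
my_dic = {1: [], 2: "two", "3": ()}
print(list(zip(my_dic, my_dic.values())))
# [(1, []), (2, 'two'), ('3', ())]
```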