Understanding the Dataset & DataLoader in PyTorch

Update on 9-Apr-2020

  1. I have created a very simple example on GitHub. Please take a look at the link.
  2. I had the opportunity to give a presentation on Faster R-CNN. The slides can be found here. Note that I adapted figures from multiple sources (including textbooks and blog posts); the original material can be found via the links on the slides.


PyTorch has multiple well-known computer vision models built in, which can readily be used for transfer learning as well as for training your own models. There are many examples and official tutorials, e.g.

After some research, I thought

“Yes! This tutorial explains it very well, and the…
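For orientation, here is a minimal sketch of the Dataset and DataLoader pattern named in the title. It is not the article's own example: the toy class SquaresDataset is hypothetical, and it only shows the two methods a map-style torch.utils.data.Dataset needs (__len__ and __getitem__) and how DataLoader groups items into batches.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: item i is the pair (x=[i], y=[i**2]) as float tensors."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # DataLoader uses this to know how many items (and hence batches) exist
        return self.n

    def __getitem__(self, idx):
        # Return a single sample; batching is the DataLoader's job
        x = torch.tensor([float(idx)])
        y = torch.tensor([float(idx) ** 2])
        return x, y


# With 5 items and batch_size=2 we get batches of 2, 2 and 1 samples.
# The default collate function stacks the per-sample tensors along a new first dimension.
loader = DataLoader(SquaresDataset(5), batch_size=2, shuffle=False)

batches = []
for x, y in loader:
    batches.append((x.shape, y.shape))
    print(x.shape, y.shape)
```

Keeping __getitem__ focused on one sample is the design choice that lets the same Dataset be reused with different batch sizes, shuffling, or worker counts simply by changing the DataLoader arguments.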

What is EMI?

Expose My Ignorance. Since finishing my PhD, no one corrects or challenges my writing, coding, ML theory, views on the value of life, etc. I realised I need to take notes on what I learn every week.

EMI-10: Python. Generator

As written in EMI#1, I am still learning how to use generators and iterators. When I open a .txt or .csv file, I normally use pandas for no particular reason; however, if the file is too large to load into memory, I need a different approach. Assume we have the text file (list_of_int.txt) below:

# list_of_int.txt

Example 10.1: Create a simple function

def int_gen_func(filename): """A simple function to read…
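The original function is cut off above, so the following is only a sketch of the general idea, not the author's int_gen_func. It assumes (since the file contents are not shown) that list_of_int.txt holds one integer per line; the helper name read_ints and the temporary demo file are hypothetical. Because the function uses yield, only one line is held in memory at a time.

```python
import os
import tempfile


def read_ints(filename):
    """Hypothetical helper: lazily yield one int per non-blank line of a text file."""
    with open(filename) as f:
        for line in f:  # iterating a file object reads it line by line
            line = line.strip()
            if line:  # skip blank lines
                yield int(line)


# Demo: a small temporary file stands in for list_of_int.txt (assumed format).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1\n2\n3\n\n4\n")
    path = tmp.name

total = sum(read_ints(path))  # the generator is consumed one value at a time
os.remove(path)
print(total)  # 10
```

Note that calling read_ints(path) does not open the file yet; the body only runs when something (here, sum) starts iterating.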

Photo by Stephen Dawson on Unsplash

In a business context, we are often interested in creating dashboards, which let us show images, graphs, tables, etc. There are many frameworks for creating dashboards (a.k.a. frontend applications). For Python users, Plotly/Dash is one option. As for platforms, Google Cloud Platform (GCP) provides App Engine, a fully managed serverless platform where we can readily deploy a frontend application.

In a previous article, we discussed how to deploy a Flask app on Cloud Run with authentication. Building on that example, we are going to demonstrate two things:

  1. how to deploy a Dash application on App Engine;
  2. how to…

Miscellaneous notes related to coding, machine learning, data science

A month has already passed since the last EMI post. Here are some findings from my daily work.

EMI-5: Python. Jupyter notebook, execute Terminal command

I knew that the ! prefix lets you execute Linux (shell) commands in a Jupyter notebook, but I had never used it before.

echo command on Jupyter Notebook
├── img_1.png
├── img_2.png
├── img_3.png
└── img_4.jpg
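The ! prefix is a notebook (IPython) feature, not valid Python syntax. As a rough plain-Python counterpart, assuming a Unix-like system where the echo command is available, the standard subprocess module can run the same command and capture its output:

```python
import subprocess

# Roughly what "!echo hello" does in a notebook cell, but from a plain Python script.
# Passing a list avoids going through a shell; check=True raises if the command fails.
result = subprocess.run(["echo", "hello"], capture_output=True, text=True, check=True)

print(result.stdout, end="")  # hello
```

capture_output and text require Python 3.7 or later; text=True returns stdout as a str rather than bytes.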

EMI-6: Python. Find the longest sub-list in the main list.

When I was studying NLP, I wanted to find the longest sub-list in the main list (i.e. the sentence with the largest number of words in the batch). I made a mistake in how I used the max function.

# main list m_l = [[32, 37, 4, 999999999, 43], [30, 156, 78, 3614…
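The original list is cut off above, so the data below is hypothetical. This sketch shows one common pitfall with max on a list of lists (not necessarily the exact mistake in the post): without a key, Python compares lists element by element, so max returns the list with the largest leading values, not the longest one.

```python
# Hypothetical batch of tokenised sentences (word IDs), not the author's data
m_l = [[32, 37, 4], [30, 156, 78, 3614, 5], [1, 2]]

# Pitfall: lists are compared lexicographically, so 32 > 30 > 1 decides the result
print(max(m_l))  # [32, 37, 4]

# Fix: compare the sub-lists by their length
longest = max(m_l, key=len)
print(longest)  # [30, 156, 78, 3614, 5]

# If only the length is needed (e.g. for padding a batch), map len over the sub-lists
print(max(map(len, m_l)))  # 5
```

If several sub-lists share the maximum length, max(..., key=len) returns the first of them.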

Authorised users only

Recently I had the opportunity to learn how to host a Flask application on Google Cloud Platform (GCP) using Cloud Run and Cloud Endpoints. Although official documentation is available, it took me some time to understand and implement the various components correctly. In this article, I am going to show you how to deploy a Flask app on Cloud Run with authentication.

Miscellaneous notes related to coding, machine learning, data science


When I was a teenager, I had to do a lot of homework. In mathematics and physics, I had to repeat similar exercises over and over; my cram school teacher told us, “It is rensei” (rensei means training, drilling, or formalising in Japanese).

Recently, I have learnt many things from my work. However, I realised that I rarely revisit what I have learnt, so I sometimes forget how to solve a problem I have come across before. This series of posts (I hope it continues) will serve as memos for my work in data science, coding, and machine learning.

Rensei: drilling, training

EMI-1: Python. Comprehension and generator
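The body of this note is not included above, so here is only a minimal sketch of the distinction the heading names, with illustrative values: a list comprehension builds every element immediately, whereas a generator expression (same syntax, parentheses instead of brackets) produces values on demand and can be consumed only once.

```python
import sys

# List comprehension: all 1,000 squares are computed and stored right away
squares_list = [n * n for n in range(1000)]

# Generator expression: nothing is computed until something iterates over it
squares_gen = (n * n for n in range(1000))

print(sum(squares_list))  # 332833500

first_total = sum(squares_gen)   # consumes the generator
second_total = sum(squares_gen)  # already exhausted, so nothing is left to sum
print(first_total, second_total)  # 332833500 0

# The generator object stays small no matter how many values it would yield
print(sys.getsizeof(squares_gen) < sys.getsizeof(squares_list))  # True
```

The one-pass behaviour is the usual trade-off: generators save memory for large or streamed data, but if the values are needed more than once, a list (or re-creating the generator) is required.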


Photo by Markus Winkler on Unsplash

As a former electrical engineering student, my ‘go-to’ language has long been MATLAB. MATLAB is great for numerical analysis (and, with recent updates, even for implementing deep learning models); however, it is not free.

During my undergraduate studies, I learnt Python. Python is one of the most popular programming languages, especially in the field of data science; it has many built-in functions and modules to facilitate data analysis.

In fact, my ‘go-to’ language has recently been shifting to Python. I am falling in love with Python. …

Takashi Nakamura, PhD

Data scientist and machine learning engineer. PhD in Signal Processing for Neuroscience. https://www.linkedin.com/in/takashi-nakamura-004875a6/
