Understanding the Dataset & DataLoader in PyTorch

Update on 9-Apr-2020

  1. I have created a very simple example on GitHub. Please take a look at the link.
  2. I had an opportunity to give a presentation on Faster R-CNN. The slides can be found here. Note that I adapted figures from multiple sources (including textbooks and blog posts); the original material can be found via the links on the slides.

Background

PyTorch has multiple well-known Computer Vision models built in, which can readily be used for transfer learning as well as for training your own models. There are many examples and official tutorials, e.g.

After looking through several of them, I thought:

“Yes! This tutorial explains it very well, and the implementation might be straightforward; I can run some models on my dataset!”


What is EMI?

EMI stands for “Expose My Ignorance”. Since finishing my PhD, no one has corrected or challenged my writing, coding, ML theory, the value of life, etc. I realised I need to take notes on what I learn every week.

EMI-10: Python. Generator

As written in EMI#1, I am still learning how to use generators and iterators. When I open a .txt or .csv file, I normally use pandas for no particular reason; however, if the file is too large to load at once, a different approach is needed. Assume we have the text file (list_of_int.txt) below:

# list_of_int.txt
23
21
9
12
3

Example 10.1: Create a simple function

def int_gen_func(filename):
    """A simple function to read line by line"""
    for line in open(filename):
        yield…
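The listing above is cut off at the `yield` expression, so what follows is only a minimal sketch of the same idea, not the original function. The name `read_ints`, the conversion to `int`, and the skipping of blank lines are all assumptions made for illustration; the sample file is recreated in a temporary directory so the snippet runs on its own.

```python
import os
import tempfile


def read_ints(filename):
    """Yield one integer per non-empty line, reading the file lazily.

    Hypothetical helper: the int() conversion is an assumption, since the
    original yield expression is truncated.
    """
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield int(line)


# Recreate the sample file from the text in a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "list_of_int.txt")
with open(path, "w") as f:
    f.write("23\n21\n9\n12\n3\n")

gen = read_ints(path)
first = next(gen)           # only the first line has been consumed so far
rest = list(gen)            # consumes the remaining lines
total = first + sum(rest)   # the whole file never sits in memory as one list
print(first, rest, total)
```

Because the generator hands back one value at a time, the same pattern should work for files far larger than available memory, which is the situation described above.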

Photo by Stephen Dawson on Unsplash

In a business context, we are often interested in creating dashboards, which enable us to show images, graphs, tables, etc. There are many frameworks to create dashboards (a.k.a. frontend applications). For Python users, Plotly/Dash would be one of the options. Regarding platforms, Google Cloud Platform (GCP) provides a fully managed serverless platform, App Engine, where we can readily deploy a frontend application.

In a previous article, we discussed how to deploy a Flask app on Cloud Run with authentication. With the previous example, we are going to demonstrate two things:

  1. how to deploy a Dash application on App Engine;
  2. how to configure the App Engine instance to interact with a Flask application on Cloud Run with authentication. …


Miscellaneous notes related to coding, machine learning, data science

A month has already passed since the last EMI post. Here are some findings from my daily work.

EMI-5: Python. Jupyter notebook, execute Terminal command

I knew about the ! command, which lets you execute Linux commands from a Jupyter notebook, but I had never used it before.

echo command on Jupyter Notebook
my_dir
├── img_1.png
├── img_2.png
├── img_3.png
└── img_4.jpg
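The `!` prefix is notebook syntax, so it does not work in a plain Python script. As a rough stand-in, the sketch below runs the same kind of `echo` command through the standard library's `subprocess` module. This is a substitute technique rather than what the notebook does internally, and it assumes a Unix-like system where `echo` is available as a program.

```python
import subprocess

# Roughly comparable to typing `!echo hello` in a notebook cell,
# but written as ordinary Python that also runs outside Jupyter.
result = subprocess.run(
    ["echo", "hello"],
    capture_output=True,  # collect stdout/stderr instead of printing them
    text=True,            # decode bytes to str
    check=True,           # raise CalledProcessError on a non-zero exit code
)
output = result.stdout.strip()
print(output)
```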

EMI-6: Python. Find the longest sub-list from the main-list.

When I was studying NLP, I wanted to find the longest sub-list in the main list (i.e. the largest number of words in a batch). I made a mistake in how I used the max function.

# main list
m_l = [[32, 37, 4, 999999999, 43],
       [30, 156, 78, 3614, 25, 11, 169, 3096, 21],
       [1, 2], [0], []…
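The excerpt is truncated, so the exact mistake is not shown. One plausible pitfall, assumed here purely for illustration, is calling `max` without a `key`: Python then compares the sub-lists element by element rather than by length. The sketch below uses only the rows visible above, under the hypothetical name `batch`.

```python
# Illustrative batch: only the rows visible in the truncated listing above.
batch = [
    [32, 37, 4, 999999999, 43],
    [30, 156, 78, 3614, 25, 11, 169, 3096, 21],
    [1, 2],
    [0],
    [],
]

# Without a key, lists are compared lexicographically (element by element),
# so the list with the largest first element wins, not the longest one.
largest_by_value = max(batch)

# With key=len, the comparison is made on length instead.
longest = max(batch, key=len)
max_len = len(longest)

print(largest_by_value)
print(longest, max_len)
```

If only the length is needed, `max(len(row) for row in batch)` gives the same number without keeping the sub-list itself.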

Authorised users only

Recently I had the opportunity to learn how to host a Flask application on Google Cloud Platform (GCP) using Cloud Run and Cloud Endpoints. Though official documentation is provided, it took me some time to understand and implement the various components correctly. In this article, I am going to show you how to deploy a Flask app on Cloud Run with authentication. Let’s deploy a web application on GCP with an authentication process.


Miscellaneous notes related to coding, machine learning, data science

Introduction

When I was a teenager, I had to do homework. For mathematics and physics, I had to repeat similar exercises over and over; my cram school teacher told us, “It is rensei” (rensei means training, drilling, formalising in Japanese).

Recently, I have learnt many things from my work; however, I realised that I rarely revisit what I have learnt, so I sometimes forget how to solve a problem I have come across before. This series of posts (I hope it continues) will serve as a memorandum of my work in the areas of data science, coding, and machine learning.

Rensei: drilling, training

EMI-1: Python. Comprehension and generator

# List
[i for i in range(10)]
# [0, 1, 2, 3, 4, 5, 6, 7, 8…
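Since the listing above is cut off, here is a small sketch contrasting a list comprehension with a generator expression. The variable names and the squaring example are illustrative assumptions, not taken from the original post.

```python
# List comprehension: built eagerly, every value is held in memory at once.
squares_list = [i * i for i in range(10)]

# Generator expression: lazy, values are produced only when requested.
squares_gen = (i * i for i in range(10))

first_three = [next(squares_gen) for _ in range(3)]  # pull three values
remaining = list(squares_gen)                        # consume the rest
exhausted = list(squares_gen)                        # nothing left: single use

print(squares_list)
print(first_three, remaining, exhausted)
```

The list can be indexed and iterated as often as needed, whereas the generator can be walked through only once; that trade-off is usually what decides between the two.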


Photo by Markus Winkler on Unsplash

As a former electrical engineering student, my ‘go-to’ language has always been Matlab. Matlab is great for numerical analysis (including implementing deep learning models with recent updates); however, Matlab is not free.

During my undergraduate studies, I learnt Python. Python is one of the most popular programming languages, especially in the field of data science; it has many built-in functions and modules to facilitate data analysis.

In fact, my ‘go-to’ language has recently been shifting to Python. I am falling in love with Python. …

About

Takashi Nakamura, PhD

Data scientist and machine learning engineer. PhD in Signal Processing for Neuroscience. https://www.linkedin.com/in/takashi-nakamura-004875a6/
