These days I’ve been mostly coding in Ruby at work and, in my free time, at agreelist.org, but I’m also learning deep learning through Andrew Ng’s Coursera specialization and I love it.
Basically, a neural network (NN) is a box (function) which predicts a result:
For example, X is an image and Yhat is the prediction (cat or no cat).
In this case, X is a vector with all the pixels of the image:
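For example, a hypothetical 64×64 RGB image has 64 × 64 × 3 = 12288 pixel values, which we can unroll into a single column vector. A sketch with NumPy (the image here is just random data, to show the shapes):

```python
import numpy as np

# Hypothetical 64x64 RGB image (random pixel values, just for illustration)
image = np.random.rand(64, 64, 3)

# Unroll it into a single column vector X of shape (64*64*3, 1)
X = image.reshape(-1, 1)

print(X.shape)  # (12288, 1)
```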
Before we can use it, we need to train the neural network with a set of images previously labeled as cat or not cat (the training data). The training process finds a function that maps X to Y by learning the parameters W and b:
The training process is iterative:
We initialize W and b, use the NN to get the prediction, then calculate the loss (error) and try to minimize it. To do that, we calculate the partial derivatives, update the parameters and repeat the process. On each iteration we get new parameters W and b which hopefully converge.
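The loop above can be sketched in a few lines. This toy example uses a plain linear model with a squared-error loss instead of a full NN (the data and learning rate are made up for illustration), but the structure is exactly the one described: initialize, predict, compute the loss, compute derivatives, update, repeat:

```python
import numpy as np

# Toy data: one input feature, targets generated by y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1

# Initialize the parameters
w, b = 0.0, 0.0
alpha = 0.1  # learning rate

for i in range(1000):
    yhat = w * x + b                  # prediction
    loss = np.mean((yhat - y) ** 2)   # mean squared error
    dw = np.mean(2 * (yhat - y) * x)  # partial derivative wrt w
    db = np.mean(2 * (yhat - y))      # partial derivative wrt b
    w -= alpha * dw                   # update the parameters
    b -= alpha * db

print(round(w, 2), round(b, 2))  # converges near 2 and 1
```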
If we now zoom into the neural network, we see it is composed of L layers, each containing a number of nodes or units; each unit computes a mathematical function and passes the result to all the units of the next layer:
As we need to train the NN with many images of cats, dogs and squirrels (m = number of training examples), we are going to use a two-dimensional array for X to save processing time:
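Concretely, each flattened image becomes one column, so X ends up with shape (n_x, m). A sketch with NumPy, using three tiny made-up "images" of n_x = 4 values each:

```python
import numpy as np

# Three hypothetical flattened images (n_x = 4 values each, for illustration)
img1 = np.array([0.1, 0.2, 0.3, 0.4])
img2 = np.array([0.5, 0.6, 0.7, 0.8])
img3 = np.array([0.9, 1.0, 1.1, 1.2])

# Stack them as columns: X has shape (n_x, m) = (4, 3)
X = np.column_stack([img1, img2, img3])
print(X.shape)  # (4, 3)
```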
The operations required to get the prediction Yhat are the following:
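Those operations are the forward propagation Z[l] = W[l] A[l-1] + b[l] and A[l] = g(Z[l]), repeated layer by layer. A sketch with NumPy (the layer sizes, random weights and choice of ReLU for the hidden layer are my assumptions, just to make it runnable):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

rng = np.random.default_rng(0)
n = [4, 3, 1]              # layer sizes: 4 inputs, 3 hidden units, 1 output
m = 5                      # number of training examples
A = rng.random((n[0], m))  # A[0] = X, the input array of shape (n_x, m)

# Forward propagation: Z[l] = W[l] A[l-1] + b[l], then A[l] = g(Z[l])
for l in range(1, len(n)):
    W = rng.standard_normal((n[l], n[l - 1])) * 0.01
    b = np.zeros((n[l], 1))
    Z = W @ A + b
    A = sigmoid(Z) if l == len(n) - 1 else relu(Z)

Yhat = A                   # final activation is the prediction
print(Yhat.shape)  # (1, 5)
```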
By the way, you can ask Google to plot an interactive graph of any function, such as the sigmoid:
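The sigmoid itself is tiny enough to define by hand; it squashes any real number into (0, 1), which is what lets us read the output as a probability:

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)), squashes any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))    # 0.5
print(sigmoid(10))   # very close to 1
print(sigmoid(-10))  # very close to 0
```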
Going back to our small summary, the dimensions of the arrays are:
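We can check those dimensions in code: W[l] has shape (n[l], n[l-1]), b[l] has shape (n[l], 1) (broadcast over the m examples), and Z[l] and A[l] end up with shape (n[l], m). The layer sizes below are made up for illustration:

```python
import numpy as np

n = [12288, 5, 3, 1]  # example layer sizes (n[0] = n_x input values)
m = 100               # number of training examples

for l in range(1, len(n)):
    W = np.zeros((n[l], n[l - 1]))  # W[l] has shape (n[l], n[l-1])
    b = np.zeros((n[l], 1))         # b[l] has shape (n[l], 1)
    print(l, W.shape, b.shape)
# Z[l] = W[l] A[l-1] + b[l] and A[l] then have shape (n[l], m)
```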
Then we need a loss or error function that we’ll try to minimize. We could use the following one but it has multiple local optima:
Therefore we are going to use the following one:
And therefore the cost function is:
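That cost is the cross-entropy loss averaged over the m training examples. A sketch with NumPy (the labels and predictions are made-up numbers, just to show the computation):

```python
import numpy as np

# Hypothetical labels and predictions for m = 4 examples
Y = np.array([[1, 0, 1, 1]])
Yhat = np.array([[0.9, 0.2, 0.7, 0.99]])
m = Y.shape[1]

# Loss per example: -(y * log(yhat) + (1 - y) * log(1 - yhat))
losses = -(Y * np.log(Yhat) + (1 - Y) * np.log(1 - Yhat))

# Cost J: average loss over the m training examples
J = np.sum(losses) / m
print(round(J, 4))  # 0.1738
```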
Now we need to calculate the partial derivatives to update the parameters W and b.
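For the simplest case, a single sigmoid unit (logistic regression), the derivatives work out to dZ = A − Y, dW = (1/m) dZ Xᵀ and db = (1/m) Σ dZ. A sketch with NumPy (the data and initial parameters are made up for illustration):

```python
import numpy as np

# Toy forward pass for a single sigmoid unit (logistic regression)
X = np.array([[1.0, 2.0, 0.5],
              [0.0, 1.0, 1.5]])  # shape (n_x, m) = (2, 3)
Y = np.array([[1, 0, 1]])        # labels, shape (1, m)
W = np.array([[0.1, -0.2]])      # shape (1, n_x)
b = 0.0
m = X.shape[1]

Z = W @ X + b
A = 1 / (1 + np.exp(-Z))         # predictions Yhat

# Backward pass: partial derivatives of the cost wrt W and b
dZ = A - Y                       # shape (1, m)
dW = (dZ @ X.T) / m              # shape (1, n_x), same as W
db = np.sum(dZ) / m              # scalar, same as b

print(dW.shape)  # (1, 2)
```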
We update the parameters (alpha is the learning rate):
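The update itself is just a step against the gradient, scaled by alpha (the gradient values below are made up, for illustration):

```python
import numpy as np

W = np.array([[0.1, -0.2]])
b = 0.0
dW = np.array([[0.05, 0.01]])  # made-up gradients, for illustration
db = -0.02
alpha = 0.1  # learning rate

# Gradient descent update: W := W - alpha * dW, b := b - alpha * db
W = W - alpha * dW
b = b - alpha * db

print(W, b)  # W = [[0.095, -0.201]], b = 0.002
```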
And the process starts again: