These days I’ve been mostly coding in Ruby at work and, in my free time, at agreelist.org, but I’m also learning deep learning through Andrew Ng’s Coursera specialization and I love it.
Basically, a neural network (NN) is a box (function) which predicts a result:
For example, X is an image and Yhat is the prediction (cat or no cat).
In this case, X is a vector with all the pixels of the image:
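For example, a hypothetical 64×64 RGB image has 64 × 64 × 3 = 12288 pixel values, which we can unroll into a single column vector. A sketch with NumPy (the image here is just random data, to show the shapes):

```python
import numpy as np

# Hypothetical 64x64 RGB image (random pixel values, just for illustration)
image = np.random.rand(64, 64, 3)

# Unroll it into a single column vector X of shape (64*64*3, 1)
X = image.reshape(-1, 1)

print(X.shape)  # (12288, 1)
```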
Before we can use it, we need to train the neural network with a set of images previously labeled as cat or not cat (the training data). The training process finds a function that maps X to Y by learning the parameters W and b:
The training process is iterative:
We initialize W and b, use the NN to get the prediction, then calculate the loss (error) and try to minimize it. To do that, we calculate the partial derivatives, update the parameters and repeat the process. On each iteration we get new parameters W and b which hopefully converge.
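The loop above can be sketched in a few lines. This toy example uses a plain linear model with a squared-error loss instead of a full NN (the data and learning rate are made up for illustration), but the structure is exactly the one described: initialize, predict, compute the loss, compute derivatives, update, repeat:

```python
import numpy as np

# Toy data: one input feature, targets generated by y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1

# Initialize the parameters
w, b = 0.0, 0.0
alpha = 0.1  # learning rate

for i in range(1000):
    yhat = w * x + b                  # prediction
    loss = np.mean((yhat - y) ** 2)   # mean squared error
    dw = np.mean(2 * (yhat - y) * x)  # partial derivative wrt w
    db = np.mean(2 * (yhat - y))      # partial derivative wrt b
    w -= alpha * dw                   # update the parameters
    b -= alpha * db

print(round(w, 2), round(b, 2))  # converges near 2 and 1
```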
If we now zoom into the neural network, we see it is composed of L layers, each containing a number of nodes or units; each unit computes a mathematical function and passes the result to all the units of the next layer:
As we need to train the NN with many images of cats, dogs and squirrels (m = number of training examples), we are going to use a two-dimensional array for X to save processing time:
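Concretely, each flattened image becomes one column, so X ends up with shape (n_x, m). A sketch with NumPy, using three tiny made-up "images" of n_x = 4 values each:

```python
import numpy as np

# Three hypothetical flattened images (n_x = 4 values each, for illustration)
img1 = np.array([0.1, 0.2, 0.3, 0.4])
img2 = np.array([0.5, 0.6, 0.7, 0.8])
img3 = np.array([0.9, 1.0, 1.1, 1.2])

# Stack them as columns: X has shape (n_x, m) = (4, 3)
X = np.column_stack([img1, img2, img3])
print(X.shape)  # (4, 3)
```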
The operations required to get the prediction Yhat are the following:
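Those operations are the forward propagation Z[l] = W[l] A[l-1] + b[l] and A[l] = g(Z[l]), repeated layer by layer. A sketch with NumPy (the layer sizes, random weights and choice of ReLU for the hidden layer are my assumptions, just to make it runnable):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

rng = np.random.default_rng(0)
n = [4, 3, 1]              # layer sizes: 4 inputs, 3 hidden units, 1 output
m = 5                      # number of training examples
A = rng.random((n[0], m))  # A[0] = X, the input array of shape (n_x, m)

# Forward propagation: Z[l] = W[l] A[l-1] + b[l], then A[l] = g(Z[l])
for l in range(1, len(n)):
    W = rng.standard_normal((n[l], n[l - 1])) * 0.01
    b = np.zeros((n[l], 1))
    Z = W @ A + b
    A = sigmoid(Z) if l == len(n) - 1 else relu(Z)

Yhat = A                   # final activation is the prediction
print(Yhat.shape)  # (1, 5)
```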
By the way, you can ask Google to plot an interactive graph of any function, such as the sigmoid:
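The sigmoid itself is tiny enough to define by hand; it squashes any real number into (0, 1), which is what lets us read the output as a probability:

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)), squashes any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))    # 0.5
print(sigmoid(10))   # very close to 1
print(sigmoid(-10))  # very close to 0
```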
Going back to our small summary, the dimensions of the arrays are:
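We can check those dimensions in code: W[l] has shape (n[l], n[l-1]), b[l] has shape (n[l], 1) (broadcast over the m examples), and Z[l] and A[l] end up with shape (n[l], m). The layer sizes below are made up for illustration:

```python
import numpy as np

n = [12288, 5, 3, 1]  # example layer sizes (n[0] = n_x input values)
m = 100               # number of training examples

for l in range(1, len(n)):
    W = np.zeros((n[l], n[l - 1]))  # W[l] has shape (n[l], n[l-1])
    b = np.zeros((n[l], 1))         # b[l] has shape (n[l], 1)
    print(l, W.shape, b.shape)
# Z[l] = W[l] A[l-1] + b[l] and A[l] then have shape (n[l], m)
```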
Then we need a loss or error function that we’ll try to minimize. We could use the following one but it has multiple local optima:
Therefore we are going to use the following one:
And therefore the cost function is:
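That cost is the cross-entropy loss averaged over the m training examples. A sketch with NumPy (the labels and predictions are made-up numbers, just to show the computation):

```python
import numpy as np

# Hypothetical labels and predictions for m = 4 examples
Y = np.array([[1, 0, 1, 1]])
Yhat = np.array([[0.9, 0.2, 0.7, 0.99]])
m = Y.shape[1]

# Loss per example: -(y * log(yhat) + (1 - y) * log(1 - yhat))
losses = -(Y * np.log(Yhat) + (1 - Y) * np.log(1 - Yhat))

# Cost J: average loss over the m training examples
J = np.sum(losses) / m
print(round(J, 4))  # 0.1738
```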
Now we need to calculate the partial derivatives to update the parameters W and b.
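For the simplest case, a single sigmoid unit (logistic regression), the derivatives work out to dZ = A − Y, dW = (1/m) dZ Xᵀ and db = (1/m) Σ dZ. A sketch with NumPy (the data and initial parameters are made up for illustration):

```python
import numpy as np

# Toy forward pass for a single sigmoid unit (logistic regression)
X = np.array([[1.0, 2.0, 0.5],
              [0.0, 1.0, 1.5]])  # shape (n_x, m) = (2, 3)
Y = np.array([[1, 0, 1]])        # labels, shape (1, m)
W = np.array([[0.1, -0.2]])      # shape (1, n_x)
b = 0.0
m = X.shape[1]

Z = W @ X + b
A = 1 / (1 + np.exp(-Z))         # predictions Yhat

# Backward pass: partial derivatives of the cost wrt W and b
dZ = A - Y                       # shape (1, m)
dW = (dZ @ X.T) / m              # shape (1, n_x), same as W
db = np.sum(dZ) / m              # scalar, same as b

print(dW.shape)  # (1, 2)
```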
We update the parameters (alpha is the learning rate):
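The update itself is just a step against the gradient, scaled by alpha (the gradient values below are made up, for illustration):

```python
import numpy as np

W = np.array([[0.1, -0.2]])
b = 0.0
dW = np.array([[0.05, 0.01]])  # made-up gradients, for illustration
db = -0.02
alpha = 0.1  # learning rate

# Gradient descent update: W := W - alpha * dW, b := b - alpha * db
W = W - alpha * dW
b = b - alpha * db

print(W, b)  # W = [[0.095, -0.201]], b = 0.002
```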
And the process starts again: