Nowadays, many companies take advantage of neural networks. They are used in recommendation systems, self-driving cars, agriculture, smartphones and many other devices and technological sectors. But the question is: how do they work and why are they such an important invention?

If you don't know what a neural network is, I will explain it to you: neural networks are computing systems inspired by biological brains but based on Calculus techniques. Like our brains, they have the ability to learn and decide in an intelligent way how a prototype or device must behave. Our intelligence is also enriched by different stimuli such as sight. Artificial neural networks likewise have some inputs that the user must set so that they can process them and calculate the output decision or action.

For example, self-driving cars have sensors that measure the distance to different obstacles. These data will be the input of the neural network. The output data will be the direction and acceleration of the car. The network will process the input data (this is what we will try to understand) and finally output how the car will move. The key will be how we train the network and make it more intelligent. This is where we will take advantage of Calculus: in this example, once trained, the network will know how to process the data to avoid obstacles and follow the road.

In this post you will learn, with mathematical notation, how they operate and how they learn internally.

We will start by looking at how a network is structured.

We can define a neural network as a graph $G(V,A)$ where $V$ are the vertices of the graph and $A$ the edges.

In neural networks the vertices are called **neurons** and the edges, connections or **weights**. Each neuron will also have a special weight called the **bias**. In the following image we can appreciate the neurons and weights:

The graph of image 3 is connected and is also a digraph because every connection has a direction. Each layer is fully connected to the next (that means that every neuron of a layer is connected to every neuron of the next layer).

Also, the graph induced by two adjacent layers of neurons will be complete bipartite. If the number of neurons of a layer is $m$ and the number of neurons of the next layer is $n$, then we will have the bipartite subgraph $K_{m,n}$ and the number of connections between the two layers will be $k=n \cdot m$.

Each network will be structured in a **layered** architecture where each layer will contain neurons.

The topology of the neural network can be subdivided into three types of layers: **INPUT LAYER, HIDDEN LAYER AND OUTPUT LAYER**. The network will have only one input layer and one output layer, each one with a specific number of neurons. However, the network can have more than one hidden layer. For instance, in picture 3 we have an input layer of 5 neurons, 2 hidden layers, one of 3 and another of 4 neurons, and finally the output layer of 2 neurons.

Every component in the network will contain values. The input layer neurons will store the data we want to analyze or use to calculate the prediction. Hidden and output neurons will calculate their values during the process of the update algorithm.

Each neuron will have two values separated by an **activation** function. The unactivated value will have the prefix $net$ and the value with the function applied will have the prefix $out$. To simplify notation we will sometimes use $x_i^l$ instead of $out \ x_i^l$, where $x_i^l$ is a specific neuron in layer $l$.

But how do we organize all the components in a mathematical structure?

In a multilayer neural network we can group the neurons of a layer as a vector and the weights between 2 adjacent layers as a matrix. When we say input vector we are referring to the input neurons of the network. The same applies to the hidden and output layers.

The vector of neurons will be denoted as $X_l$ where $l$ is the layer where the neurons are.

$$\begin{equation*}X_l=\begin{bmatrix}

x^l_1 \\\

x^l_2\\\ \vdots\\\ x^l_n

\end{bmatrix}\end{equation*}$$

We will denote a specific weight of the network as $w_{i,j}^l$, which is the connection from the neuron in position $i$ of layer $l$ to the neuron $j$ in layer $l+1$.

$$\begin{equation*}W_l=\begin{bmatrix}

w^l_{1,1} & w^l_{2,1} & \cdots& w^l_{n,1} \\\

w^l_{1,2} & w^l_{2,2} & \cdots & w^l_{n,2} \\\ \vdots & \vdots & \ddots &\vdots\\\ w^l_{1,m} & w^l_{2,m} & \cdots & w^l_{n,m}

\end{bmatrix}\end{equation*}$$

Also, we will have a vector of biases with the same size as the vector of neurons in the layer:

$$\begin{equation*}B_l=\begin{bmatrix}

b^l_1 \\\

b^l_2\\\ \vdots\\\ b^l_n

\end{bmatrix}\end{equation*}$$

A neural network can be understood as a composition of functions that processes some inputs and outputs the error. If we analyze the image below we can see that every fully connected layer in image 1 has a function $f$ or $g$. These applications have different parameters, which are neuron values, weights and biases. There is also the activation function between each layer and finally an error function to calculate how the network is performing.

In real life problems, neural networks have more neurons and layers. For example if we want to classify digits a widely used structure is 784 input neurons (image 28×28 pixels), 2 hidden layers with 20 neurons and finally the output layer with 10 neurons (possible solutions [0,1,2,3,4,5,6,7,8,9]).
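To make the sizes concrete, here is a minimal sketch that counts the connections and biases of a fully connected network from its layer sizes, using the $K_{m,n}$ argument from above. The function name `count_parameters` is only illustrative.

```python
def count_parameters(layer_sizes):
    """Return (weights, biases) for a fully connected feed-forward network."""
    # Two adjacent layers with m and n neurons form K_{m,n}: m * n connections.
    weights = sum(m * n for m, n in zip(layer_sizes, layer_sizes[1:]))
    # One bias per neuron outside the input layer.
    biases = sum(layer_sizes[1:])
    return weights, biases


print(count_parameters([5, 3, 4, 2]))        # (35, 9)
print(count_parameters([784, 20, 20, 10]))   # (16280, 50)
```

The first call uses the small architecture of the picture (5, 3, 4 and 2 neurons) and the second one the digit classifier.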

With all the technical parts of a network we can start to explain how the network will operate.

The process to update a neural network is known as **feed-forward**. With the values of the input layer vector previously set by the user we will calculate the values of the rest of the neurons and also the error. The input values could be pixel colors, sensor data, sound data…

In reality, each neuron will store two values that will be different. One will be calculated with the other.

As you can see in picture 4, each neuron will have two states. It will have an activation function $\phi$ that processes the value of the neuron. When the neuron is not activated it will have the notation $net$; when the neuron is activated with the function, it will have the notation $out$. The input vector won't be activated because its values are set by the user.

Now we will start with the update algorithm:

Between two adjacent layers we will define an application:

The graph induced by two adjacent layers will be a complete bipartite subgraph $K_{m,n}$ and it will have an associated function $f:\mathbb{R}^{(m+1)n+m}\rightarrow\mathbb{R}^{n}$. The layer function will have $m \cdot n + n + m$ variables: $m \cdot n$ for the weights of the complete bipartite subgraph, $n$ for the biases (if we use them) and finally $m$ for the neurons of the previous layer. For fixed weights and biases, $f$ is a linear (affine) application of the neuron values.

In the picture the applications of the layer are denoted as $f,g$.

Visually the function $f$ in our example will be structured as:

The vectorial function $f$ will be:

$$(net \ x_1^2 , net \ x_2^2)=\vec{f}(x_1^1,x_2^1,w_{11}^1,w_{12}^1,w_{21}^1,w_{22}^1,b_1^2,b_2^2)=(x_1^1w_{11}^1+x_2^1w_{21}^1 + b_1^2 , x_1^1w_{12}^1+x_2^1w_{22}^1+ b_2^2)$$

And $g$ will be:

$$(net \ x_1^3 , net \ x_2^3)=\vec{g}(x_1^2,x_2^2,w_{11}^2,w_{12}^2,w_{21}^2,w_{22}^2,b_1^3,b_2^3)=(x_1^2w_{11}^2+x_2^2w_{21}^2 + b_1^3 , x_1^2w_{12}^2+x_2^2w_{22}^2+ b_2^3)$$

We can appreciate that they are continuous in $\mathbb{R}^{(m+1)n+m}$ because each component is a polynomial in its variables. For the same reason the functions are differentiable, of class $C^{\infty}$.

We have layer $L_1$ with the neurons $\{x_1^l,x_2^l,\dots,x_m^l\}$ and $L_2$ with neurons $\{x_1^{l+1},x_2^{l+1},\dots,x_n^{l+1}\}$. A weight will have the notation $w_{i,j}^l$ when it connects $x_i^l$ and $x_j^{l+1}$. Biases will have the form $b_j^l$, associated to the neuron $x_j^l$. To simplify notation, $x_i^l = out \ x_i^l$. Then, the update process of the network will be:

$$net \ x_j^{l+1}=\sum_{i=1}^{m}{x_i^lw_{ij}^l}+b_j^{l+1}$$

$$out \ x_j^{l+1}=\phi(net \ x_j^{l+1})$$

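We can update a layer using the matrix form of a linear application. In this matrix form layer $l+1$ has $m$ neurons and layer $l$ has $n$ neurons (the letters are swapped with respect to the sum above, to match the shape of $W_l$):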

$$net \ X_{l+1}=W_l \cdot out \ X_l+B_{l+1}=\begin{bmatrix}

net \ x^{l+1}_1 \\\

net \ x^{l+1}_2\\\ \vdots\\\ net \ x^{l+1}_m

\end{bmatrix}=\begin{bmatrix}

w^l_{1,1} & w^l_{2,1} & \cdots& w^l_{n,1} \\\

w^l_{1,2} & w^l_{2,2} & \cdots & w^l_{n,2} \\\ \vdots & \vdots & \ddots &\vdots\\\ w^l_{1,m} & w^l_{2,m} & \cdots & w^l_{n,m}

\end{bmatrix}\begin{bmatrix}

out \ x^l_1 \\\

out \ x^l_2\\\ \vdots\\\ out \ x^l_n

\end{bmatrix}+\begin{bmatrix}

b^{l+1}_1 \\\

b^{l+1}_2\\\ \vdots\\\ b^{l+1}_m

\end{bmatrix}$$

$$out\ X_{l+1}=\phi (net \ X_{l+1})=\begin{bmatrix}
out \ x^{l+1}_1 \\
out \ x^{l+1}_2\\ \vdots\\ out \ x^{l+1}_m
\end{bmatrix}=\phi \left(\begin{bmatrix}
net \ x^{l+1}_1 \\
net \ x^{l+1}_2\\ \vdots\\ net \ x^{l+1}_m
\end{bmatrix}\right)$$

Each layer will be updated with the output of the previous layer. When the network executes the $f$ function it will obtain the $x_1^2$ and $x_2^2$ values, so we will be able to execute $g$ with those values.

These applications are differentiable because they are of class $C^{\infty}$ (they are linear in the neuron values and polynomial in all of their variables). Therefore, we can use the powerful tools of differential Calculus.

The logistic function is widely used for learning curves so in this example we will be using it. For more activation functions see: Activation functions

$$\phi(x)=\frac{1}{1+e^{-x}}$$

**Example of the update process**

First of all we have the weights set, the input layer values, which are 0.5 and 0.25, and the biases, which are 0 at the initial state.

We execute the update process of the first layer with the formula. The red arrows apply the activation function of the feed-forward process.

$$net \ x_1^2= 0.5 \cdot 0.25 + 0.25 \cdot 0.1 + 0 = 0.15 \quad out \ x_1^2=\frac{1}{1+e^{-0.15}}=0.53$$

$$net \ x_2^2= 0.5 \cdot 0.35 + 0.25 \cdot (-0.3) + 0 = 0.1 \quad out \ x_2^2=\frac{1}{1+e^{-0.1}}=0.52$$

Then with the values of $out \ x_1^2 $ and $out \ x_2^2$ we can do a step forward and calculate the values of $x_1^3$ and $x_2^3$

$$net \ x_1^3= 0.53 \cdot 0.9 + 0.52 \cdot 0.5 + 0 = 0.74 \quad out \ x_1^3=\frac{1}{1+e^{-0.74}}=0.68$$

$$net \ x_2^3= 0.53 \cdot (-0.7) + 0.52 \cdot 0.3 + 0 = -0.21 \quad out \ x_2^3=\frac{1}{1+e^{0.21}}=0.45$$
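The same calculation can be sketched in a few lines of Python. The weights are the ones read off the formulas above; the names `feed_forward`, `W1`, `W2` and `B` are only illustrative. Because the code does not round the intermediate values, its results differ slightly in the last digit from the hand calculation.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def feed_forward(inputs, weights, biases):
    """weights[i][j] connects neuron i of this layer to neuron j of the next one."""
    outputs = []
    for j in range(len(biases)):
        # net x_j = sum_i x_i * w_ij + b_j
        net = sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j]
        # out x_j = phi(net x_j)
        outputs.append(logistic(net))
    return outputs


# Weights of the worked example: W1[i][j] is w^1_{i+1,j+1}, W2[i][j] is w^2_{i+1,j+1}.
W1 = [[0.25, 0.35], [0.10, -0.30]]
W2 = [[0.90, -0.70], [0.50, 0.30]]
B = [0.0, 0.0]  # biases are 0 at the initial state

hidden = feed_forward([0.5, 0.25], W1, B)
output = feed_forward(hidden, W2, B)
print([round(v, 3) for v in hidden])
print([round(v, 3) for v in output])
```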

With the feed-forward process we have calculated the value of every neuron. Now we need to calculate the error the network has made so that it can learn.

If we have datasets to train with, we can calculate the **error** the network has made on a specific data point. Imagine that the dataset contains different data points, each one with the input values we want to give the network and also the output the network should return.

For example, the input data in our example network is $[x_1^1=0.5, x_2^1=0.25]$ and the desired output we want the network to return is $[d_1=0, d_2=1]$. Clearly, the output of the network, $[x_1^3=0.68, x_2^3=0.45]$, has not been accurate.

We want to measure the error the network has made. One technique is to implement a **cost function** that measures the difference between two vectors (desired and real output). In this case we will be using the Mean Squared Error (MSE):

With the desired values we will calculate the differences between the real value and the desired.

$$E(x_1^3,x_2^3)=\frac{(d_1-x_1^3)^2+(d_2-x_2^3)^2}{2}=\frac{(-0.68)^2+(0.55)^2}{2}=0.38$$

In the general form of the error, $x_j^L$ is the neuron in position $j$ of the output layer and $d_j$ the desired value in that position:

$$E(\vec{x})=\frac{\sum_{j=1}^{n}{(d_j-x_j^L)^2}}{n}$$

The size of the output layer is $n$.

If we want it in matrix form, we denote the output vector as $X_L$, the desired vector as $Y$ and the number of neurons in the output layer as $n_L$:

$$E(X_L)=\frac{1}{n_L}(Y-X_L)^t \cdot (Y-X_L)$$
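As a quick check of the formula, a minimal sketch of the MSE in Python (the function name `mse` is just illustrative) reproduces the value of the example:

```python
def mse(desired, actual):
    """Mean Squared Error between the desired and the real output vectors."""
    n = len(actual)
    return sum((d - x) ** 2 for d, x in zip(desired, actual)) / n


# Desired output [0, 1] against the network output [0.68, 0.45] of the example.
print(round(mse([0, 1], [0.68, 0.45]), 3))   # 0.382
```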

The weights of the previous network were chosen randomly. But what if we want to calculate the weights that fit a problem instead of choosing them randomly? In other words, what if we adjust every weight and bias in the network to decrease the error? This is the general idea of the training process of a neural network. We want to update every weight and bias to improve the performance of the network. This is not easy and requires Calculus.

When the network is created we must initialize the weights randomly to have a starting point. A good way to initialize the weights is with random values in [-1,1] or drawn from a normal distribution. Biases are usually initialized to zero but they can also be initialized randomly.

The problem comes when we want to optimize the weights and biases to reduce the error. These elements are a kind of regulator that the network must adjust to decrease the cost function.

We can think of an optimization problem where we want to minimize the error. However, it is important not to overfit the neural network. In other terms, we don't want to decrease the error completely in one step, because in each iteration we are learning from a single data point but the network must learn about the entire dataset. There are a lot of ways to prevent overfitting. One of them is to have a learning rate.

But, in a mathematical way, how do we adjust every weight to reduce the error? Thanks to differential Calculus it's possible. In neural networks the process to adjust every weight is known as **back-propagation** and it processes the network in the opposite direction to the feed-forward.

Gradient descent is a great tool to reduce the error in a multidimensional space and with compositions of functions. We are looking for:

$$\frac{\partial E(\vec{x})}{\partial w^l_{ij}}$$

Applying the chain rule we can calculate the derivative of the error with respect to every weight in the network. In the example network we have 8 weights to adjust.

The way we calculate the derivatives will be different from one layer to another because of the chain rule. The first layers in the network will need more operations to calculate their derivatives.

$$\frac{\partial E(\vec{x})}{\partial w^{L-1}_{ij}}=\frac{\partial E}{\partial out \ x^{L}_{j}}\frac{\partial out \ x^L_{j}}{\partial net \ x^L_{j}} \frac{\partial net \ x^L_{j}}{\partial w^{L-1}_{ij}}$$

If we calculate the derivatives knowing that we are using the logistic function and the MSE error:

$$\frac{\partial E(\vec{x})}{\partial w^{L-1}_{ij}}=\frac{-2(d_j-x_j^L)}{n}\phi' (net \ x_j^L) out\ x_i^{L-1}$$

The first term is the derivative of the MSE error, the second the derivative of the activation function and the last term the derivative of the feed-forward function. Now, if $\phi$ is the logistic function then:

$$\phi' (net \ x_i^l)=\phi(net \ x_i^l)(1-\phi(net \ x_i^l))=out\ x_i^l(1-out\ x_i^l)$$

We can define a term, the **error signal** $\delta _j^l$, that will be stored in each neuron and will accumulate the chain rule derivatives up to that specific layer. It will help to calculate the derivatives in the other layers.

$$\delta_j^l=\frac{\partial E}{\partial net\ x^{l}_{j}}=\frac{\partial E}{\partial out \ x^{l}_{j}}\frac{\partial out \ x^l_{j}}{\partial net \ x^l_{j}}$$

So the formula to calculate the weights will be:

$$\frac{\partial E(\vec{x})}{\partial w^{L-1}_{ij}}=\delta_j^L \frac{\partial net \ x^L_{j}}{\partial w^{L-1}_{ij}}$$

For example, the partial derivatives of $E$ with respect to each weight of $W_2$ will be:

$$\frac{\partial E(x_1^3,x_2^3)}{\partial w_{11}^2}=\frac{-2(0-0.68)}{2}(0.68)(1-0.68)0.53=0.078$$

$$\frac{\partial E(x_1^3,x_2^3)}{\partial w_{21}^2}=\frac{-2(0-0.68)}{2}(0.68)(1-0.68)0.52=0.077$$

$$\frac{\partial E(x_1^3,x_2^3)}{\partial w_{12}^2}=\frac{-2(1-0.45)}{2}(0.45)(1-0.45)0.53=-0.07$$

$$\frac{\partial E(x_1^3,x_2^3)}{\partial w_{22}^2}=\frac{-2(1-0.45)}{2}(0.45)(1-0.45)0.52=-0.07$$
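These four derivatives can be reproduced with a short sketch. It assumes the MSE error and the logistic activation, as in the formulas above, and uses the rounded values of the example; the function name `output_layer_gradients` is illustrative.

```python
def output_layer_gradients(desired, out, prev_out):
    """Error signals and weight gradients of the output layer (MSE + logistic)."""
    n = len(out)
    # delta_j = dE/d(out x_j) * phi'(net x_j), with phi'(net) = out * (1 - out)
    deltas = [(-2.0 * (d - x) / n) * x * (1.0 - x) for d, x in zip(desired, out)]
    # grads[i][j] = dE/dw_ij = delta_j * out x_i of the previous layer
    grads = [[delta * x_prev for delta in deltas] for x_prev in prev_out]
    return deltas, grads


deltas, grads = output_layer_gradients([0, 1], [0.68, 0.45], [0.53, 0.52])
for i, row in enumerate(grads, start=1):
    for j, g in enumerate(row, start=1):
        print(f"dE/dw_{i}{j} = {g:.3f}")
```

The list `deltas` holds the error signals of the output layer, which are reused for the biases and for the previous layers.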

We can also represent all the derivatives in the Jacobian matrix $J$:

$$\frac{\partial E}{\partial W_{L-1}}=\begin{bmatrix}
\frac{\partial E}{\partial w^{L-1}_{1,1}} & \frac{\partial E}{\partial w^{L-1}_{2,1}} & \cdots& \frac{\partial E}{\partial w^{L-1}_{n,1}}\\
\frac{\partial E}{\partial w^{L-1}_{1,2}} &\frac{\partial E}{\partial w^{L-1}_{2,2}} & \cdots & \frac{\partial E}{\partial w^{L-1}_{n,2}} \\ \vdots & \vdots & \ddots &\vdots\\ \frac{\partial E}{\partial w^{L-1}_{1,m}}& \frac{\partial E}{\partial w^{L-1}_{2,m}} & \cdots &\frac{\partial E}{\partial w^{L-1}_{n,m}}
\end{bmatrix}=\left(\begin{bmatrix}
\frac{\partial E}{\partial out \ x^{L}_{1}} \\
\frac{\partial E}{\partial out \ x^{L}_{2}}\\ \vdots\\ \frac{\partial E}{\partial out \ x^{L}_{m}}
\end{bmatrix}\odot\begin{bmatrix}
\frac{\partial out \ x^{L}_{1}}{\partial net \ x^{L}_{1}} \\
\frac{\partial out \ x^{L}_{2}}{\partial net \ x^{L}_{2}}\\ \vdots\\ \frac{\partial out \ x^{L}_{m}}{\partial net \ x^{L}_{m}}
\end{bmatrix}\right)\begin{pmatrix}
\frac{\partial net \ x^{L}_{j}}{\partial w^{L-1}_{1,j}}&
\frac{\partial net \ x^{L}_{j}}{\partial w^{L-1}_{2,j}}& \cdots& \frac{\partial net \ x^{L}_{j}}{\partial w^{L-1}_{n,j}}
\end{pmatrix}$$

Here layer $L$ has $m$ neurons and layer $L-1$ has $n$ neurons, and $\frac{\partial net \ x^{L}_{j}}{\partial w^{L-1}_{i,j}}=out \ x_i^{L-1}$ for every $j$.

If we want to express it with the error signal $\delta$ (layer $L$ has $m$ neurons, layer $L-1$ has $n$ neurons and $n_L=m$ is the size of the output layer):

$$\frac{\partial E}{\partial W_{L-1}}=\delta _{L} \cdot (X_{L-1})^T=\left(\begin{bmatrix}\frac{-2(d_1-x_1^L)}{n_L} \\ \frac{-2(d_2-x_2^L)}{n_L}\\ \vdots \\ \frac{-2(d_m-x_m^L)}{n_L}\end{bmatrix}\odot\begin{bmatrix}\phi'(net \ x_1^{L}) \\ \phi'(net \ x_2^{L})\\ \vdots \\ \phi'(net \ x_m^L)\end{bmatrix}\right)\begin{pmatrix}out \ x_1^{L-1}&out \ x_2^{L-1}&\cdots&out \ x_n^{L-1}\end{pmatrix}$$

Now we will learn how to calculate the derivatives with respect to each bias:

$$\frac{\partial E}{\partial b^L_{j}}=\frac{\partial E}{\partial out \ x^L_{j}}\frac{\partial out \ x^L_{j}}{\partial net \ x^L_{j}}\frac{\partial net \ x^L_{j}}{\partial b^L_{j}}$$

The last derivative is one because the bias neuron always has a value of 1.

$$\frac{\partial E}{\partial b^L_{j}}=\frac{\partial E}{\partial out \ x^L_{j}}\frac{\partial out \ x^L_{j}}{\partial net \ x^L_{j}}\cdot 1=\delta _j^L $$

Then the derivatives will be:

$$\frac{\partial E}{\partial b^L_{j}}=\frac{-2(d_j-x_j^L)}{n}\phi' (net \ x_j^L) $$

In a matrix form

$$\frac{\partial E}{\partial B_{L}}=\begin{bmatrix}

\frac{\partial E}{\partial b^L_1} \\\

\frac{\partial E}{\partial b^L_2}\\\ \vdots\\\ \frac{\partial E}{\partial b^L_n}

\end{bmatrix}=

\delta _L=\left(\begin{bmatrix}

\frac{\partial E}{\partial out \ x^{L}_{1}} \\\

\frac{\partial E}{\partial out \ x^{L}_{2}}\\\ \vdots\\\ \frac{\partial E}{\partial out \ x^{L}_{n}}

\end{bmatrix}\odot\begin{bmatrix}

\frac{\partial out \ x^{L}_{1}}{\partial net \ x^{L}_{1}} \\\

\frac{\partial out \ x^{L}_{2}}{\partial net \ x^{L}_{2}}\\\ \vdots\\\ \frac{\partial out \ x^{L}_{n}}{\partial net \ x^{L}_{n}}

\end{bmatrix}\right)$$

Let’s continue with the example

$$\frac{\partial E(x_1^3,x_2^3)}{\partial b_1^3}=\frac{-2(0-0.68)}{2}(0.68)(1-0.68)=0.15$$

$$\frac{\partial E(x_1^3,x_2^3)}{\partial b_2^3}=\frac{-2(1-0.45)}{2}(0.45)(1-0.45)\approx -0.14$$

In the rest of the layers we will continue applying the chain rule to calculate the derivatives of the weights and biases. However, to optimize the computation we will reuse the derivatives previously calculated.

$$\frac{\partial E}{\partial out \ x^l_j}=\sum_{i=1}^n{\frac{\partial net \ x_i^{l+1}}{\partial out \ x^l_j}\frac{\partial E}{\partial net \ x^{l+1}_i}}=\sum_{i=1}^n w_{ji}^l \delta _i^{l+1}$$

Here $n$ is the number of neurons in layer $l+1$. We will use the error signal $\delta$, which is associated to every neuron in a layer and stores its derivative:

$$\delta _j^l=\frac{\partial E}{\partial net \ x^l_{j}}=\frac{\partial E}{\partial out \ x^l_{j}}\frac{\partial out \ x^l_{j}}{\partial net \ x^l_{j}}=\left(\sum_{i=1}^n {w_{ji}^l \delta _i^{l+1}}\right) \phi'(net \ x_j^l)$$

So now we can calculate the derivative with respect to every weight:

$$\frac{\partial E}{\partial w_{ij}^l}=\frac{\partial E}{\partial net \ x^{l+1}_{j}}\frac{\partial net \ x^{l+1}_{j}}{\partial w_{ij}^l}=\delta _j^{l+1} x_i^{l}$$
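A minimal sketch of this backward step, assuming the logistic activation (so that $\phi'(net)=out(1-out)$): each hidden error signal is the weighted sum of the error signals of the next layer, multiplied by the derivative of the activation. The function name `hidden_deltas` is illustrative, and the numbers are the rounded values of the worked example.

```python
def hidden_deltas(weights, next_deltas, out):
    """weights[j][i] is the weight from neuron j of this layer to neuron i of the next."""
    deltas = []
    for j, x in enumerate(out):
        # Sum of the error signals of the next layer, weighted by the connections.
        back = sum(weights[j][i] * next_deltas[i] for i in range(len(next_deltas)))
        # Multiply by phi'(net x_j) = out x_j * (1 - out x_j) for the logistic function.
        deltas.append(back * x * (1.0 - x))
    return deltas


W2 = [[0.90, -0.70], [0.50, 0.30]]   # weights between the hidden and the output layer
output_deltas = [0.148, -0.136]      # error signals of the output layer (rounded)
hidden_out = [0.53, 0.52]            # activated values of the hidden layer

d_hidden = hidden_deltas(W2, output_deltas, hidden_out)
# The gradient of a first-layer weight is the hidden delta times the input value.
grad_w11 = d_hidden[0] * 0.5
print(d_hidden, grad_w11)
```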

With all the derivatives calculated we can now update the weights and biases. We want to decrease (minimize) the error. $\eta$ will be the learning rate previously mentioned.

$$w_{ij}^l=w_{ij}^l-\eta\frac{\partial E(\vec{x})}{\partial w^l_{ij}}$$

The negative sign is because we want to decrease the error.

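Another way to understand the optimization is with the gradient vector, where $w_{.j}^l$ groups all the weights that arrive at neuron $j$ and the gradient is taken with respect to them:

$$w_{.j}^l=w_{.j}^l-\eta\nabla_{w_{.j}^l} E$$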

If we update the biases:

$$b_{j}^l=b_{j}^l-\eta\frac{\partial E(\vec{x})}{\partial b^l_{j}}$$

If we use matrices:

$$W_l=W_l-\eta J(\vec{f})=\begin{bmatrix}

w^l_{1,1} & w^l_{2,1} & \cdots& w^l_{n,1} \\\

w^l_{1,2} & w^l_{2,2} & \cdots & w^l_{n,2} \\\ \vdots & \vdots & \ddots &\vdots\\\ w^l_{1,m} & w^l_{2,m} & \cdots & w^l_{n,m}

\end{bmatrix}-\eta\begin{bmatrix}

\frac{\partial E}{\partial w^l_{1,1}} & \frac{\partial E}{\partial w^l_{2,1}} & \cdots& \frac{\partial E}{\partial w^l_{n,1}}\\\

\frac{\partial E}{\partial w^l_{1,2}} &\frac{\partial E}{\partial w^l_{2,2}} & \cdots & \frac{\partial E}{\partial w^l_{n,2}} \\\ \vdots & \vdots & \ddots &\vdots\\\ \frac{\partial E}{\partial w^l_{1,m}}& \frac{\partial E}{\partial w^l_{2,m}} & \cdots &\frac{\partial E}{\partial w^l_{n,m}}

\end{bmatrix}$$

$$B_l=B_l-\eta (\nabla(\phi _l)^t)=\begin{bmatrix}

b^l_1 \\\

b^l_2\\\ \vdots\\\ b^l_n

\end{bmatrix}-\eta \begin{bmatrix}

\frac{\partial E}{\partial b^l_{1}} \\\

\frac{\partial E}{\partial b^l_{2}}\\\ \vdots\\\ \frac{\partial E}{\partial b^l_{n}}

\end{bmatrix}$$

With the feed-forward and back-propagation techniques the network can learn a certain dataset and return accurate results.

We can update the parameters of the network at different moments. The default way is to adjust them after each data element. This is called Stochastic Gradient Descent (SGD). However, in practice it often works better to specify a minibatch size (32, 64, 128, …) and adjust the parameters once the whole minibatch has been processed.

The **training process** has some steps the network should follow:

**Create architecture of network**

The network layers, parameters and topology must be set

**Initialize**

The network should initialize all parameters, including the weights and biases. For the weights we can use a random initialization in [-k,k] (k is recommended to be 1) or a normal random distribution. The biases are usually set to zero.

**Load dataset**

We need to load the dataset and store it in RAM. However, if the dataset is too large we can split it into different batches to reduce the memory used.

We need to calculate the number of minibatches from the minibatch size we want to use and the dataset size.

- **For each epoch**
  - **Shuffle data of dataset** (randomize the position of the data)
  - **For each minibatch**
    - **For each iteration in minibatch**:
      - Assign the inputs of dataset in that iteration position to the input layer
      - Apply the feed-forward function to calculate outputs
      - Calculate error (optional)
      - Back-Propagation using the desired output in the dataset in that iteration position. This will calculate the derivatives of each weight and bias
    - Adjust each weight and bias with the gradient descent technique (the weight and bias derivatives are accumulated in a matrix over the iterations and finally divided by the minibatch size)
    - Reset the gradient matrices (set to zero)
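The loop above can be sketched as follows. To keep it short, this is a deliberately reduced version: a single logistic neuron instead of a full multilayer network, trained on a hypothetical toy dataset (the logical OR). The names `train`, `predict`, `mean_error` and `OR_DATA`, as well as the hyperparameter values, are assumptions made for the illustration.

```python
import math
import random


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, bias, inputs):
    return logistic(sum(w * x for w, x in zip(weights, inputs)) + bias)


def mean_error(weights, bias, dataset):
    return sum((d - predict(weights, bias, x)) ** 2 for x, d in dataset) / len(dataset)


def train(dataset, epochs=500, batch_size=2, eta=0.5, seed=0):
    rng = random.Random(seed)
    n_inputs = len(dataset[0][0])
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]  # random initialization
    bias = 0.0                                                   # biases start at zero
    data = list(dataset)
    for _ in range(epochs):
        rng.shuffle(data)                                # shuffle data of dataset
        for start in range(0, len(data), batch_size):    # for each minibatch
            batch = data[start:start + batch_size]
            grad_w = [0.0] * n_inputs                    # gradient accumulators, reset here
            grad_b = 0.0
            for inputs, desired in batch:
                out = predict(weights, bias, inputs)                # feed-forward
                delta = -2.0 * (desired - out) * out * (1.0 - out)  # back-propagation
                for i, x in enumerate(inputs):
                    grad_w[i] += delta * x
                grad_b += delta
            # Average the accumulated gradients and take one gradient descent step.
            weights = [w - eta * g / len(batch) for w, g in zip(weights, grad_w)]
            bias -= eta * grad_b / len(batch)
    return weights, bias


# Hypothetical toy dataset: the logical OR of two inputs.
OR_DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
start_w, start_b = train(OR_DATA, epochs=0)   # same seed, so same starting weights
w, b = train(OR_DATA)
print(mean_error(start_w, start_b, OR_DATA), mean_error(w, b, OR_DATA))
```

A network with hidden layers would follow the same loop; only the feed-forward and back-propagation steps inside it would grow.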

**Optional**: we can also graph the error to see the performance in a 2D graph.

In this case we will have another dataset different from the training dataset to test the network.

- **For each iteration in dataset**:
  - Assign the inputs of dataset in that iteration position to the input layer
  - Apply the feed-forward function to calculate outputs
  - Calculate error with outputs and desired values of dataset
  - Accumulate error (optional)
- **Statistics**
  - Output the mean error, dividing the accumulated error by the dataset size

When the network has been trained successfully, with a low error, we can start to take advantage of it without using desired output values. At this moment we don't know the output and we want the network to predict it.

To give a prediction of the network:

- Assign the inputs to analyze to the input layer
- Apply the feed-forward function to calculate outputs

With this post you now know how a neural network works internally and the maths behind it. You will be able to use AI libraries and apply the feed-forward neural network or implement your own neural network from scratch.

I hope this post has helped you. If so, you can give feedback.

The Taylor series of a function $f(x)$ centered at $a$ is:

$ f(x)=P(a)=\displaystyle\sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x-a)^n $

However, computers can't do an infinite sum, so the Taylor series will be truncated to an approximation of the function:

$ f(x) \approx P(a,n)=f(a)+f'(a)(x-a)+\frac{f''(a)}{2}(x-a)^2+\dots+\frac{f^{(n)}(a)}{n!}(x-a)^n \quad n\in{ \mathbb{N}}$

When $a=0$ the series is called the Maclaurin series.

It's also very important to control the error of the Taylor series. We can bound it with the Lagrange theorem, where $b$ is the point $x$ at which we want to calculate the error:

$|ERROR|\leq \sup_{c\in(a,b)}{\left|\frac{f^{(n+1)}(c)}{(n+1)!}(b-a)^{n+1}\right|}$
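As a small numerical illustration, the sketch below approximates $e^x$ with its Taylor polynomial and compares the real error with the Lagrange bound. The exponential is chosen only because all its derivatives are $e^x$; the function names are illustrative.

```python
import math


def taylor_exp(x, n, a=0.0):
    """Degree-n Taylor polynomial of e^x centred at a (every derivative at a is e^a)."""
    return sum(math.exp(a) * (x - a) ** k / math.factorial(k) for k in range(n + 1))


def lagrange_bound_exp(b, n, a=0.0):
    """Upper bound of the truncation error at b for f(x) = e^x."""
    # e^c is increasing, so its supremum on the interval is at the larger endpoint.
    sup = math.exp(max(a, b))
    return sup * abs(b - a) ** (n + 1) / math.factorial(n + 1)


approx = taylor_exp(1.0, 10)
error = abs(math.e - approx)
bound = lagrange_bound_exp(1.0, 10)
print(approx, error, bound)
```

With 11 terms the approximation of $e$ is already correct to about seven decimal places, and the real error stays below the bound.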

I have developed a simulation about the Taylor series where you can visualize the Taylor series of different functions such as cosine, sine, logarithm and exponential.

Therefore, this is done with complex algorithms that are based on the **convolution operation**. This operation consists of editing an image with some **filters** (kernels) that are applied to get a transformed image.

The input image will usually be an **RGB image** (three channels, so a depth of 3) but it can also have one channel (black and white image, depth of 1). RGB images can also be reduced to a depth of 1 depending on their amount of colour.

Each filter will be a three dimensional array (width, height, depth) with some values.

These values will vary depending on the purpose. They can be assigned in two different ways depending on their use:

- **Predefined filters**
- **Learning filters** for Machine Learning (Convolutional Neural Networks)

Now we will analyze each of them

This type of filter has been in use for much longer than the modern Convolutional Neural Networks in machine learning. They are very useful to reconstruct and process an image with different patterns.

Here I will show some examples of filters for different purposes:

With this filter we can apply the convolution operation to get the most important pixels that represent the edges of the image.

This is a popular effect that creates cool results in images. It is usually combined with Artificial Intelligence algorithms to blur the background and not the front object or person. It is also commonly known as Portrait Mode.

The blur effect also has a predefined filter:

Finally we have the sharpness effect, which is the opposite of the blur:
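As a rough illustration of how such predefined filters are applied, the sketch below slides a 3×3 kernel over a single-channel image. The three kernels are commonly used examples and are an assumption here; they are not necessarily the exact values used in this post. The kernel is not flipped, as is usual in image processing and CNN code, which makes no difference for these symmetric kernels.

```python
def convolve2d(image, kernel):
    """Slide the kernel over a single-channel image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    result = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the pixels covered by the kernel at position (r, c).
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        result.append(row)
    return result


# Assumed example kernels (common choices, not taken from the post).
EDGE = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]
BLUR = [[1 / 9] * 3 for _ in range(3)]
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]

flat = [[10] * 4 for _ in range(4)]   # an image with no edges at all
print(convolve2d(flat, EDGE))         # [[0, 0], [0, 0]]
print(convolve2d(flat, SHARPEN))      # [[10, 10], [10, 10]]
```

A flat image has no edges, so the edge filter returns zeros, while the sharpen filter leaves it unchanged.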

Convolutional Neural Networks are widely used for complex problems of image classification and detection. With some layers we can get output values that will be the prediction of the network. These networks also contain convolutional layers. These layers have filters that are initialized randomly. However, with the backpropagation training method we will update these filters to obtain better ones that simplify the images and highlight their most important parts.

When we train the network we can know whether the gradient descent is working well or not by analyzing the error.

The error is very important to analyze the performance of our neural model. We have two main formulas to calculate the error.

The error will be calculated from the output neuron values in different ways:

**Mean Squared Error (MSE)**

$$E(x)=\frac{1}{2}\sum_o{(x-d)^2}$$

Where $x$ is the obtained value and $d$ the desired value.

The square is only used to obtain positive values. We can also use the absolute value of the difference:

$$E(x)=\frac{1}{2}\sum_o{\mid x-d\mid}$$

**Cross Entropy**

$$E(x)=-\sum_o{d\log{x}}$$
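The three error formulas can be compared on the same output with a short sketch. The function names are illustrative, and the cross entropy is written in its conventional form, with a leading minus sign so that the value is non-negative.

```python
import math


def squared_error(outputs, desired):
    return 0.5 * sum((x - d) ** 2 for x, d in zip(outputs, desired))


def absolute_error(outputs, desired):
    return 0.5 * sum(abs(x - d) for x, d in zip(outputs, desired))


def cross_entropy(outputs, desired):
    # Terms with d = 0 contribute nothing, so they are skipped.
    return -sum(d * math.log(x) for x, d in zip(outputs, desired) if d != 0)


outputs, desired = [0.68, 0.45], [0, 1]
print(squared_error(outputs, desired))
print(absolute_error(outputs, desired))
print(cross_entropy(outputs, desired))
```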

Also depending on the learning we apply to the model, we will have three posibilities:

- **UNDERFITTING**: the model is not accurate and is very flexible. When we train with a dataset, the error is always high. This happens when the **learning rate** of the network is very low.
- **OVERFITTING**: this situation comes up when we train in a very accurate way. We learn with a high learning rate, almost 1, so we descend the full gradient on a single training set.
- **DESIRED**: the best way to learn is the middle ground between the two situations explained before. We want to be flexible with the training sets to learn from every one of them, but also to decrease the error remarkably.

Also we can understand the behavior of the network with the error function during the epochs. When we are training we will have a function that decreases (**training loss**). In the test of the neural network (**valid loss**) we will have another function. If they are very similar like the example here the network will have a good performance.

Small fluctuations of the error are usually a good sign for the performance of the network. The error should also decrease as the epochs pass.

To get very similar results for the two functions, valid loss and training loss, we can use different optimization algorithms:

- **Gradient Descent Variants**
- **Gradient Descent Optimization Algorithms**

The first type were explained in the Feed-Forward Neural Network Tutorial where the mini-batch, batch and stochastic gradient descents were analyzed.

We have a great variety of optimization algorithms. Some of them increase the speed of approach to the minimum, others have a dynamic learning rate to reach the best minimum and others do both.

In this animation you can watch the different optimization algorithms, which ones reach the best minimum and the epochs each one takes.

The gradient descent value calculated in the back-propagation will be $g_t$ that will be equivalent to all of the $\frac{\partial E}{\partial w}$ where $E$ is the error. $\theta$ will be the matrix with all the weights.

This optimization algorithm helps to accelerate the Stochastic Gradient Descent in the relevant direction, reducing the oscillations in the wrong directions and stimulating the direction to the minimum.

$$v_t=\lambda v_{t-1}+\mu g_t$$

Finally, to update each weight:

$$\theta =\theta - v_t$$

The momentum term $\lambda$ will be 0.9 or a similar value, and $\mu$ is the learning rate.

With momentum the descent follows the slope blindly. We want smarter movements, so that when the error is approaching the minimum the speed decreases. To do so, the gradient is evaluated at the point where the momentum is about to take us:

$$v_t=\lambda v_{t-1}+\mu g_t(\theta-\lambda v_{t-1})$$
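Both update rules can be tried on a toy problem. The sketch below minimizes the hypothetical error $E(\theta)=\theta^2$, whose gradient is $2\theta$; the starting point, the number of steps and the value of $\mu$ are assumptions chosen only for the illustration.

```python
def momentum_step(theta, v, grad, lam=0.9, mu=0.1):
    # v_t = lambda * v_{t-1} + mu * g_t(theta)
    v = lam * v + mu * grad(theta)
    return theta - v, v


def lookahead_step(theta, v, grad, lam=0.9, mu=0.1):
    # Same rule, but the gradient is evaluated at the look-ahead point.
    v = lam * v + mu * grad(theta - lam * v)
    return theta - v, v


def minimize(step, grad, theta=5.0, steps=300):
    v = 0.0
    for _ in range(steps):
        theta, v = step(theta, v, grad)
    return theta


def grad(t):
    return 2.0 * t   # gradient of the toy error E(theta) = theta^2


theta_momentum = minimize(momentum_step, grad)
theta_lookahead = minimize(lookahead_step, grad)
print(theta_momentum, theta_lookahead)
```

Both variants end very close to the minimum at $\theta=0$; on this toy problem the look-ahead version oscillates less on the way there.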

There are a lot of different activation functions that can be used for different purposes.

An activation function is used to separate active and inactive neurons depending on a rule or function. Depending on their status they will modify their values or not.

We will see a wide range of activation functions:

- Identity
- Binary Step
- Logistic (Soft Step)
- Hyperbolic Tangent
- ReLU
- ELU
- SoftPlus
- SoftMax

The identity activation function will not produce any change to the input value. The output will be the same.

$$f(x)=x$$

The derivative of the identity is:

$$f'(x)=1$$

The binary step is a simple piecewise function.

$$f(x)= \left\{ \begin{array}{lcc}

0 & if & x < 0 \\

\\ 1 & if & x \geq 0

\end{array}

\right.$$

The logistic function is a smooth step function with a lower and an upper limit. This function is widely used for **Feed-Forward Neural Networks**

$$f(x)=\frac{1}{1+e^{-x}}$$

The derivative of the logistic function is:

$$f'(x)=f(x)(1-f(x))$$

The hyperbolic tangent is very similar to the logistic function but it tends to -1 at $-\infty$ and to 1 at $\infty$.

$$f(x)=\frac{2}{1+e^{-2x}}-1$$

The derivative of the hyperbolic tangent is:

$$f'(x)=1-f(x)^2 $$

ReLU is used in almost every network. On the negative X axis the result will be 0 and on the positive X axis the value will be maintained.

$$f(x)= \left\{ \begin{array}{lcc}

0 & if & x < 0 \\

\\ x & if & x \geq 0

\end{array}

\right. $$

There is also a variant of this function called **PReLU**:

$$f(x)= \left\{ \begin{array}{lcc}
\alpha x & \text{if} & x < 0 \\
x & \text{if} & x \geq 0
\end{array}
\right.$$

Where $\alpha$ is a parameter that sets the slope of the function for negative inputs.
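Both ReLU and PReLU translate directly into code. In this Python sketch the default value of `alpha` is an arbitrary assumption for illustration, not a recommended setting:

```python
def relu(x):
    """ReLU: 0 for negative inputs, the input itself otherwise."""
    return x if x >= 0 else 0.0


def prelu(x, alpha=0.01):
    """PReLU: like ReLU, but negative inputs keep a small slope alpha."""
    return x if x >= 0 else alpha * x
```

With `alpha = 0` PReLU reduces to ReLU, and with `alpha = 1` it reduces to the identity function.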

ELU is similar to ReLU, but with an exponential curve on the negative X axis.

$$f(x)= \left\{ \begin{array}{lcc}
\alpha(e^x-1) & \text{if} & x < 0 \\
x & \text{if} & x \geq 0
\end{array}
\right. $$

Where $\alpha$ is a parameter that shapes the curve. A value close to 1 is recommended.

SoftPlus is a smooth function whose Y value always increases.

$$f(x)=\log_e{(1+e^x)}$$
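ELU and SoftPlus can be sketched in Python as follows. The function names are hypothetical, and SoftPlus is written in an algebraically equivalent form, $\max(x,0)+\log(1+e^{-|x|})$, which avoids overflow for large inputs:

```python
import math


def elu(x, alpha=1.0):
    """ELU: alpha * (e^x - 1) for negative inputs, the input itself otherwise."""
    return x if x >= 0 else alpha * math.expm1(x)


def softplus(x):
    """SoftPlus log(1 + e^x), rearranged so that exp() never overflows."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

For large positive inputs SoftPlus approaches the input itself, and for large negative inputs it approaches 0, so it behaves like a smooth version of ReLU.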

Finally, we have the softmax function, which is mainly used in output layers, especially in Convolutional Neural Networks.

It is explained in the Output layer section of the Convolutional Neural Network tutorial.

To get more information about activation functions I recommend this tutorial: https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6

Apart from its visual effects, this autonomous driving simulation has many complex algorithms and neural networks behind it.

It is developed in C# and Unity 3D and relies on a few key ideas to make it work. You can also learn some of the basic ideas and create a simple Self Driving Simulation in 3D here.

These are the most important ideas behind the project:

- Track System
- Car Sensors
- Neural Network
- Output movements (acceleration and rotation)
- Training: back-propagation and genetic algorithm
- Car DNA and JSON importer
- Environment and camera

The track of the demo is made in Blender with Bézier curves. In Unity 3D it is a mesh with a collider component.

When the track is imported into Unity, track points are added in game along the entire track. These points are in charge of localizing each car.

Now we will see how a car is tracked:

The track system is similar to GPS localization and will help to create navigation systems as a new feature of the application.

- We take the initial track point where the car was spawned, shown in purple. With the next two points we can trilaterate the position and calculate the point that is at the same distance from all three points (the center of a circumference).
- These centers are calculated from the track points at the start of the simulation.

```
// Circumcenter of the three track points (x1, y1), (x2, y2), (x3, y3)
float v1 = x1 * x1 + y1 * y1;
float v2 = x2 * x2 + y2 * y2;
float v3 = x3 * x3 + y3 * y3;
// a is twice the signed area of the triangle; it is 0 when the points are collinear
float a = x1 * (y2 - y3) - y1 * (x2 - x3) + x2 * y3 - x3 * y2;
float b = v1 * (y3 - y2) + v2 * (y1 - y3) + v3 * (y2 - y1);
float c = v1 * (x2 - x3) + v2 * (x3 - x1) + v3 * (x1 - x2);
float xCenter = -b / (2f * a);
float yCenter = -c / (2f * a);
```

Here (x1, y1), (x2, y2) and (x3, y3) are the positions of the three selected track points.

Once all the center points have been calculated, on each update of the simulation we calculate the intersection of the line that joins the car and the center with the line through the current track point and the next one.

```
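// Intersection of the line through (x1, y1)-(x2, y2) with the line through (x3, y3)-(x4, y4)
public Vector2 intersection(float x1, float y1, float x2, float y2, float x3, float y3, float x4, float y4)
{
    float pxN = (x1 * y2 - y1 * x2) * (x3 - x4) - (x1 - x2) * (x3 * y4 - y3 * x4);
    float pyN = (x1 * y2 - y1 * x2) * (y3 - y4) - (y1 - y2) * (x3 * y4 - y3 * x4);
    // pD is 0 when the two lines are parallel, so there is no single intersection point
    float pD = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4);
    return new Vector2(pxN / pD, pyN / pD);
}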
```

With the intersection and the initial and current track points, we calculate the distance travelled.

In the image, this distance is the red path.

Each car has front and back sensors that are the equivalent of real LiDAR sensors. Each sensor raycasts the distance to the first object it hits, which can be the track or another car.

The output of the sensor is the distance from the car to the collision point divided by the maximum distance of the sensor, so the output is a value in [0, 1]. If the sensor has not detected any object, it returns 1.
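A sketch of that normalization in Python (not the project's C#), assuming the reading is the hit distance divided by the sensor's maximum range so that it stays within [0, 1]. The function and parameter names are hypothetical:

```python
def sensor_output(hit_distance, max_distance):
    """Normalize a raycast reading to [0, 1].

    hit_distance is None when the ray hit nothing within range (assumed convention).
    """
    if hit_distance is None or hit_distance >= max_distance:
        return 1.0
    # Clamp negative readings to 0 before scaling by the sensor range
    return max(hit_distance, 0.0) / max_distance
```

With this convention, values close to 0 mean that an obstacle is very near, and 1 means that the path is clear up to the sensor's range.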

These are the car sensor parameters in the editor:

Each car has a Feed-Forward Neural Network with many editable parameters, such as the architecture or the initialization range.

The architecture shown is similar to the neural network of a car. Its input is the sensor data. Each sensor feeds one input neuron if it only detects the track. However, if the simulation also detects collisions with other cars, each sensor feeds two input neurons. The output layer always has 2 neurons, for the acceleration and the rotation of the car. The hidden layers can be customized by the user.

The architecture of the hidden layers can be edited:

```
[Header("Architecture Network")]
public int[] hiddenArchitecture;
```

The activation function can also be selected: sigmoid, hyperbolic tangent or ReLU:

```
[Header("Activation")]
public Activation activationFunction = Activation.Sigmoid;
```

The values of the weights, biases, neurons and errors are stored in different arrays:

```
//Arrays
private List<float[]> neurons;
private List<float[,]> weights;
private List<float[]> bias;
private List<float[]> deltas;
private List<float[,]> gradient;
private List<float[,]> previousVariation;
```

The Neural Network class has a feed-forward update method that is in charge of calculating the outputs from the input data and the weight values.

This is the feed-forward algorithm implemented in code:

```
// Copy the sensor inputs into the input layer
for (int i = 0; i < neurons[0].Length; i++)
{
    neurons[0][i] = inputs[i];
}
// Propagate the values layer by layer
for (int i = 0; i < getLayers() - 1; i++)
{
    float[,] weightsLayer = weights[i];
    float[] layerNeurons = neurons[i];
    float[] layerNextNeurons = neurons[i + 1];
    for (int j = 0; j < layerNextNeurons.Length; j++)
    {
        float sum = 0;
        for (int k = 0; k < layerNeurons.Length; k++)
        {
            sum += weightsLayer[k, j] * layerNeurons[k];
        }
        // Add the bias once per target neuron
        // (assumes bias[i][j] belongs to neuron j of layer i + 1)
        layerNextNeurons[j] = applyActivationFunction(sum + bias[i][j]);
    }
}
```

Depending on the outputs, each car accelerates, decelerates and steers. Each car can move in every direction and at different speeds. Cars also have a rigid body to trigger collisions with other objects in the game.

The output layer produces values in [0, 1], so we need to remap them to get right and left rotations and gradual speeds. The simulation also enforces a speed limit on the cars.

```
//Rotation
float rotation = outputs[0] * 2 - 1;
transform.Rotate(new Vector3(0, (netConstantRotation * rotation) * Time.deltaTime, 0));
//Acceleration
if (applyAcceleration)
{
    float accelerate = outputs[1] * 2 - 1;
    if ((speed) >= maxSpeed)
    {
        if (accelerate >= 0)
        {
            accelerate = 0;
        }
    }
    else if ((speed) <= minSpeed)
    {
        if (accelerate <= 0)
        {
            accelerate = 0;
        }
    }
    acceleration += netConstantAcceleration * accelerate;
}
```

Each car has two different modes: the autonomous mode, where the movements are predicted by the neural network, and the manual mode, where the car is steered with the WASD keys of the keyboard. The manual mode lets the network learn from the movements of the human-driven car. We will see this in the Supervised Learning part.

Training is the most important part. When the neural network is initialized it is completely random, and the weights need to be modified to get good movements and to prevent the car from colliding.

There are different kinds of artificial intelligence learning algorithms. In this simulation, Supervised Learning and Reinforcement Learning are implemented.

One strategy for the car to learn is to use Darwin's law of evolution. You can get a deep explanation of this in this post about Genetic Algorithms. The overall idea is to obtain better individuals with different methods. The weights in this type of learning are called DNA. This DNA holds all the important information of the car (the learnable parameters of the neural network).

In this image you can see the steps the Genetic Algorithm follows to create new child cars from the parents (the best cars):

The best cars are selected according to the accuracy and the diversity of each car. The accuracy is proportional to the distance travelled, which is calculated by the track system of the car. With the accuracy we calculate the fitness of each car, which is then used to pick the best cars.

Then some DNA operations (crossover and mutation) are executed to create new children that share the DNA of their parents and also carry small mutations.

Here you can see the code for the selection of the cars:

```
GameObject[] highestCars = new GameObject[selectedCount];
if (controller.cars.Count > 0)
{
    for (int a = 0; a < selectedCount; a++)
    {
        //Get highest fitness of the cars that are not selected
        GameObject carMaxFitness = null;
        for (int i = 0; i < controller.cars.Count; i++)
        {
            if (!controller.cars[i].GetComponent<CarController>().selected)
            {
                if (carMaxFitness != null)
                {
                    //If a car has better fitness swap and set selected
                    if (controller.cars[i].GetComponent<CarController>().getFitness() > carMaxFitness.GetComponent<CarController>().getFitness())
                    {
                        carMaxFitness.GetComponent<CarController>().selected = false;
                        controller.cars[i].GetComponent<CarController>().selected = true;
                        carMaxFitness = controller.cars[i];
                    }
                }
                else
                {
                    controller.cars[i].GetComponent<CarController>().selected = true;
                    carMaxFitness = controller.cars[i];
                }
            }
        }
        highestCars[a] = carMaxFitness;
    }
    for (int j = 0; j < highestCars.Length; j++)
    {
        selectedCars[j] = highestCars[j];
    }
    //Iterate backwards so that removing a car does not skip the next one
    for (int i = controller.cars.Count - 1; i >= 0; i--)
    {
        if (!controller.cars[i].GetComponent<CarController>().selected && !controller.cars[i].activeSelf)
        {
            Destroy(controller.cars[i]);
            controller.cars.RemoveAt(i);
        }
    }
    //Clear the selection flags for the next generation
    for (int i = 0; i < controller.cars.Count; i++)
    {
        controller.cars[i].GetComponent<CarController>().selected = false;
    }
}
```

Supervised learning is another way for the car to learn, in this case with gradient descent algorithms. It is a much more complex kind of learning, because the network tries to learn from the movements of a human-driven car. Because Gradient Descent is a multidimensional problem, this method does not guarantee that every training run gives good results. It also depends on the mistakes we make while controlling the car.

The idea of the back-propagation algorithm is to adjust the weights of the neural network to decrease the overall error of its output.

This error is calculated from the movements of the human-controlled car. Then we run the Gradient Descent algorithm to try to decrease it.

Here you can see the calculation of the error signal:

```
//Output Layer error signal
float[] outputs = getOutputs();
for (int i = 0; i < outputs.Length; i++)
{
    float delta = -(desired[i] - outputs[i]) * applyDerivativeActivationFunction(outputs[i]);
    deltas[getLayers() - 1][i] = delta;
}
//Hidden layers error signal
for (int i = getLayers() - 2; i >= 0; i--)
{
    for (int j = 0; j < neurons[i].Length; j++)
    {
        float sumDelta = 0;
        for (int k = 0; k < neurons[i + 1].Length; k++)
        {
            sumDelta += deltas[i + 1][k] * weights[i][j, k];
        }
        float delta = sumDelta * applyDerivativeActivationFunction(neurons[i][j]);
        deltas[i][j] = delta;
    }
}
//Calculate gradient with this error signals.
calcGradients();
```

The adjustment of the weights of the neural network with this error signal is implemented here:

```
public void adjustWeights()
{
    //Adjust weights and biases
    for (int i = 0; i < weights.Count; i++)
    {
        for (int j = 0; j < neurons[i + 1].Length; j++)
        {
            //bias[i][j] -= biasLearningRate * deltas[i + 1][j];
            for (int k = 0; k < neurons[i].Length; k++)
            {
                float variation = weightLearningRate * gradient[i][k, j];
                weights[i][k, j] -= variation;
            }
        }
    }
    //Reset array of previous gradients
    gradient = createGradientArray();
}
```

To improve the performance of these algorithms, I have implemented some optimization algorithms: minibatch gradient descent and momentum.

To get better results with this algorithm, the initialization of the weights matters a lot, because it determines which local minimum Gradient Descent moves toward. It is important first to obtain cars that drive well on the track with the Genetic Algorithm, and then to refine their movements with backpropagation.

The DNA contains all the weight data of the neural network. It can be written to a JSON file to analyze the weights, the accuracy of the cars and other variables. It also makes it possible to reuse a DNA to create cars in another simulation.

In this code fragment you can see how to import and export the DNA:

```
public void exportJson(DNA dna)
{
    string s = JsonUtility.ToJson(dna);
    Debug.Log(s);
    using (StreamWriter streamWriter = File.CreateText(Path.Combine(Application.persistentDataPath, fileName)))
    {
        streamWriter.Write(s);
    }
}
public DNA importJson()
{
    using (StreamReader streamReader = File.OpenText(Path.Combine(Application.persistentDataPath, fileName)))
    {
        string jsonString = streamReader.ReadToEnd();
        return JsonUtility.FromJson<DNA>(jsonString);
    }
}
```

View of the DNA in a chart:

Apart from the cars, the main game objects in the scene are the track and the camera.

For the visual effects, the simulation includes glow effects, post-processing, lighting and shaders.

The camera follows the best car. You can turn this functionality off by locking the camera.

**INSTRUCTIONS FOR DOWNLOAD AND RESEARCH**

The project will be available for download soon.

**KEYS**

**Tab** – Switch between autonomous mode and manual mode

**R/T** – Manage lighting

**WASD** – Manual car movement

**Shift** – Genetic algorithm generation

**Control** – Random generation

**C** – Lock/unlock camera

**Z** – Switch car

**F** – Save DNA in a JSON file in AppData

Self-driving cars are a main focus area for companies that want to apply Artificial Intelligence algorithms to their cars or other means of transport.

Nvidia, Google, Tesla and most of the automobile industry are investing in these advances and creating autonomous prototypes that can reduce the number of accidents and improve the comfort of the passengers and the driver.

Self-driving cars should also predict the behavior of human-driven cars, because they will share the same environment.

A self-driving car needs sophisticated systems in different areas.

Firstly, on the hardware side, cars need sophisticated sensors and cameras to detect the different agents that surround the car. These agents can be other cars, traffic lights, road lines…

These sensors help the car to understand the environment and represent it in a 3D computational space, so that it can better decide how to move.

The car includes LiDAR sensors that measure distances with light pulses, several HD cameras to detect objects, a 360-degree camera, a radar for long distances and a GPS to get the current position and map data.

It also needs a powerful CPU that executes all the algorithms with high speed and accuracy, to respond almost instantly and prevent accidents.

The software is the most complex part of a self-driving car. With so much data coming from the sensors, the CPU must process it in an optimal way and give the car outputs such as the steering and the acceleration.

Now we will subdivide the software into different models and technologies.

Neural Networks are the main tool for Artificial Intelligence. Because of the amount of data the car must handle, the neural networks tend to be very large and use different architectures. For object detection and lane detection, a Convolutional Neural Network is typically used. It is widely used for image classification and can reach a high accuracy. A car can also have a Recurrent Neural Network to make better predictions. All of these models build on the Feed-Forward Neural Network.

The entire network is a complex Modular Network. This type of architecture combines simpler Neural Networks with different layers: convolutional layers, pooling layers, fully connected layers…

They also use sophisticated optimization algorithms to get better results.

The inputs of the Neural Network are all the images, distances and positions given by the sensors, and the outputs are all the possible movements and behaviors of the car.

One important part of self-driving cars is computer vision. There are libraries for these specific tasks, such as OpenCV. The car should detect everything around it and represent it in a 3D environment in which it can plan its movements more easily.

In conclusion, a lot of work must still be done to get a completely autonomous car. The car must handle many different situations and also respond to unpredictable ones. Nowadays, commercial cars can drive autonomously when the conditions are favorable, but the driver should stay alert and be prepared to take the steering wheel.

You can create your own Self Driving 3D simulation in this tutorial series with Unity 3D.

The simulation of a finite Universe built in Unity 3D can be watched here:

Because the real Universe is immense and its distances are enormous, I have built the project in a bounded space where every object is randomly generated. To make the simulation more interesting, I have also implemented some technological inventions that are currently being researched but have not been used yet.

The project can be subdivided into 4 main areas:

- Random Universe generation: create different systems and environments
- Orbiting: planets and satellites orbiting around a star or a planet
- Rotation: planets and other entities must move depending on the planet position
- Space Elevator: a technological invention to quickly lift objects out of the atmosphere of a planet

Firstly, the simulation must generate different stars in space that keep a minimum distance from each other. These stars can have different colours depending on their age.
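One simple way to respect the minimum distance is rejection sampling: draw a random position and keep it only if it is far enough from every star placed so far. The following Python sketch illustrates that idea under stated assumptions (a cubic space of side `size`, hypothetical names); it is not the project's Unity code:

```python
import math
import random


def place_stars(count, size, min_distance, rng=random, max_attempts=10000):
    """Place up to `count` stars in a cube of side `size`, at least `min_distance` apart."""
    stars = []
    attempts = 0
    while len(stars) < count and attempts < max_attempts:
        attempts += 1
        candidate = tuple(rng.uniform(0.0, size) for _ in range(3))
        # Keep the candidate only if it is far enough from every accepted star
        if all(math.dist(candidate, star) >= min_distance for star in stars):
            stars.append(candidate)
    return stars
```

The `max_attempts` limit keeps the loop from running forever when the space is too crowded to fit all the requested stars.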

Secondly, a random number of planets is created for each star, with different textures, positions and orbit radii around their star.

Finally, the satellites and the space elevator are initialized. Satellites orbit around each planet along different 3D orbits.

In these images you can see how satellites orbit around planets along different trajectories, and planets around stars.

Planets also rotate around their own axis, which is tilted with respect to the XYZ axes.

Finally, a space elevator is implemented on every planet. This lift rotates synchronously with the planet and goes from the surface to a specific height.

This incredible invention is possible thanks to the rotation of the Earth, which keeps the cable taut, but the tension is also very high. The only solution is a material that can withstand these tensions, such as carbon nanotubes. This structure is the best candidate for the cable of the space elevator.

Artificial intelligence is currently used by major companies such as Amazon and Google, and it will soon extend to our personal environment. Scientists from all over the world try to understand how our prodigious brains work. Artificial intelligence is not easy to understand, and teaching people to use it is not a simple task. Therefore, in this project I set out to develop an artificial intelligence simulator and editor with a graphical interface that makes it easier to develop, understand and innovate in this technological sector.

**Keywords:**

Artificial intelligence, future, algorithms, neural network, autonomous learning, programming, AI, machine learning, deep learning, education


In these tutorials you will learn how to create a self-driving simulator where the cars learn to drive using a genetic algorithm inspired by Darwin’s law of evolution.

The project is created in Unity 3D using C# as the programming language. You will learn how to design the map in Unity and how the algorithms we implement work. You will also be able to program these algorithms in C#.

The project can be divided into different aspects that we must focus on.

- Neural Network: we will be creating a multilayer neural network
- Genetic Algorithm with its respective crossover and mutation functions
- Visual simulator in 3D with Unity: create your track
- Lasers: input data of the neural network
- Car movement: output actions of the network
- Camera: viewing the simulation in an intelligent way
- Controller: handles all the executions and the algorithms

This video explains how all the components work: the Neural Network, the Genetic Algorithm, the Camera and the Cars.

I explain the necessary maths and algorithms for the simulator.

Neural Networks are computing systems that relate different stimuli to different actions or possible solutions. A neural network is organized in layers, and each layer has a certain number of neurons. These neurons are connected to others by weights. Every neuron and every weight stores a value. In our project, the weight values stay fixed while the network is being updated. The neuron values, however, change during the simulation because they depend on the input layer and the weights of the network.

In our project each car has its own neural network, which is updated continuously. This update algorithm is called feed-forward. The input layer of each car receives the distances to the walls that surround it. With these inputs and the weights we get the outputs: the rotation and the acceleration.
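The feed-forward update can be sketched in a few lines of Python. This is an illustration rather than the tutorial's C# class: the sigmoid activation, the bias terms and the nested-list layout `weights[layer][target][source]` are assumptions made for the example:

```python
import math


def feed_forward(inputs, weights, biases):
    """Propagate `inputs` through the network and return the output layer values.

    weights[l][j][k] is the weight from neuron k of layer l to neuron j of layer l + 1,
    and biases[l][j] is the bias of neuron j of layer l + 1 (hypothetical layout).
    """
    activations = list(inputs)
    for layer_weights, layer_biases in zip(weights, biases):
        next_activations = []
        for row, bias in zip(layer_weights, layer_biases):
            total = sum(w * a for w, a in zip(row, activations)) + bias
            next_activations.append(1.0 / (1.0 + math.exp(-total)))  # sigmoid
        activations = next_activations
    return activations
```

For a car, `inputs` would hold the normalized laser distances, and the two values returned would be interpreted as rotation and acceleration.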

We need to put this into practice and create a class in C# called Neural Network that contains as variables the List of float[][] of weights and the List of neurons organized in layers. It has two constructors: one for the first generation of cars, which creates random weights, and a second one that assigns a DNA to the weights. It also implements the feed-forward algorithm. You can learn how to program the Neural Network in the next tutorial:

For a deeper explanation of neural networks, see Neural Networks and Feed-Forward Neural Networks.

**GENETIC ALGORITHM**

Instead of using backpropagation for the Neural Network to learn, we will be using reinforcement learning. We will apply the Genetic Algorithm to every car. As Darwin’s law says, the entity (car) with the best genes survives and propagates its genes to the next generation. The cars evolve for the better and survive longer.

To express this mathematically for our cars, we start by creating a list of genes for each car. There are as many genes as there are weights in the neural network of each car. In the first generation, this array is created randomly.

We can divide the Genetic Algorithm into different sections:

- Random Initialization of DNAs
- Selection of best cars depending on the fitness function
- Crossover of the genes of the parents
- Mutation of the new DNA

**INITIALIZATION**

The random initialization of the DNAs is executed only in the first generation, when the Neural Networks of the cars are created. We create the DNA as a List whose length is the number of weights in the Neural Network. The genes are created in the interval [-1, 1].

We create a different random DNA for each car. Then we assign this DNA to the weights of the car's Neural Network.
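The initialization step is small enough to sketch directly. This Python version (the tutorial itself uses C#) draws every gene uniformly from [-1, 1]; the function name and the optional `rng` argument are hypothetical:

```python
import random


def random_dna(weight_count, rng=random):
    """Create one DNA: a list with one random gene in [-1, 1] per network weight."""
    return [rng.uniform(-1.0, 1.0) for _ in range(weight_count)]


def first_generation(car_count, weight_count, rng=random):
    """Create an independent random DNA for every car of the first generation."""
    return [random_dna(weight_count, rng) for _ in range(car_count)]
```

Passing a seeded generator such as `random.Random(0)` makes the first generation reproducible, which helps when comparing different parameter settings.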

**SELECTION**

When the simulation finishes because all the cars have collided with a wall, we must choose which cars have the best genes. You can do this in many different ways: depending on the distance travelled, the average speed or the time driving. For this project I chose the driving time, so we take the genes of the first and second cars that survive longest on the track.

All these factors could be combined to obtain a better selection of parents. It can also depend on the diversity we want to achieve.

We could create a function that depends on the factors mentioned above to get a specific score for each car, called **fitness**.
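A fitness function of that kind could be sketched as a weighted sum. In this Python illustration the weights are hypothetical tuning parameters, and the defaults score only the driving time, which is the criterion chosen for this project:

```python
def fitness(distance, average_speed, time_alive, weights=(0.0, 0.0, 1.0)):
    """Weighted sum of the selection factors; the default counts only time alive."""
    w_distance, w_speed, w_time = weights
    return w_distance * distance + w_speed * average_speed + w_time * time_alive


def select_parents(scores, count=2):
    """Return the names of the `count` cars with the highest fitness.

    `scores` maps a car name to its fitness (hypothetical data structure).
    """
    return sorted(scores, key=scores.get, reverse=True)[:count]
```

Changing `weights` to, for example, `(1.0, 0.0, 0.0)` would select the cars by distance travelled instead.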

**CROSSOVER**

With the parents selected, we swap genes to create a new DNA that mixes both parents, taking random fragments from one parent or the other. The probability of taking a gene from each parent can be adjusted depending on whether the parents have very different or very similar fitness. The more similar their fitness scores, the more similar the probability that the genes of each parent are chosen.
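One possible reading of this step is a gene-by-gene (uniform) crossover in which each gene comes from parent A with a probability proportional to A's share of the total fitness. The Python sketch below follows that assumption; the names are hypothetical and the tutorial's own C# code may mix the genes differently:

```python
import random


def crossover(dna_a, dna_b, fitness_a, fitness_b, rng=random):
    """Build a child DNA by picking each gene from parent A or parent B."""
    total = fitness_a + fitness_b
    # Equal chances when no fitness information is available
    p_a = 0.5 if total <= 0 else fitness_a / total
    return [a if rng.random() < p_a else b for a, b in zip(dna_a, dna_b)]
```

With equal fitness scores both parents contribute about half of the genes, and a much fitter parent contributes most of them.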

Here you can see an example of the genes being manipulated:

**MUTATION**

After creating the mixture of both parents for each car of the new generation, we apply mutations. This is needed because the parents are the best cars of their generation, but not the best possible drivers. Small mutations in the DNA of each new car give it the chance to become better than its parents.
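A mutation step along these lines can be sketched in Python. The mutation rate, the mutation strength and the clamping to [-1, 1] are assumptions chosen for illustration, not values taken from the project:

```python
import random


def mutate(dna, rate=0.05, strength=0.1, rng=random):
    """Return a copy of `dna` where each gene is nudged with probability `rate`."""
    mutated = []
    for gene in dna:
        if rng.random() < rate:
            gene += rng.uniform(-strength, strength)
        # Keep genes inside the initialization interval [-1, 1] (assumed convention)
        mutated.append(max(-1.0, min(1.0, gene)))
    return mutated
```

A low `rate` keeps most of the inherited genes intact, while a higher one explores more but can destroy what the parents had already learned.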

For more information about genetic algorithms, see Genetic Algorithms.

The tutorial on how to program the genetic algorithm and implement the genes in code is in the next video:

Apart from the algorithms, we need to create all the meshes for the cars, terrain, effects, track and camera.

Each car must be able to move and rotate. Each car has its own lasers that raycast the distances to the track walls and output a value from 0 to 1.

The camera behaves like a drone that follows one car and moves smoothly when switching to another car. It rotates around the car it is looking at.

The terrain, track and effects can be chosen by the creator. Nevertheless, I have created a video explaining one simple terrain with a track and water:

The neural networks need input data to work and to control the movement and rotation of the car. In this case we store the distances measured by different rays that raycast to the walls of the track. The lasers have some parameters, such as the angle of view and the number of laser sensors.

The outputs of the neural network are the rotation and the acceleration. We need to update the position and rotation of the car depending on these values and on the time elapsed in the simulation. The car follows a uniformly accelerated linear motion.
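The position update for uniformly accelerated motion over one time step `dt` can be sketched as follows. This Python illustration assumes a flat ground plane with Unity-style axes (forward along +z, heading measured around the vertical axis); all names are hypothetical:

```python
import math


def update_car(x, z, heading, speed, acceleration, turn_rate, dt):
    """Advance the car by one time step and return its new state."""
    heading += turn_rate * dt
    # Distance covered with constant acceleration: s = v * t + a * t^2 / 2
    distance = speed * dt + 0.5 * acceleration * dt * dt
    x += distance * math.sin(heading)
    z += distance * math.cos(heading)
    speed += acceleration * dt
    return x, z, heading, speed
```

Calling this function once per frame with the frame time as `dt` keeps the motion independent of the frame rate.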

The camera moves automatically from one car to another. It follows one of the cars that have not collided yet. When the car it is following collides, it switches smoothly to another car. The camera also rotates slowly around the car.

Finally, we need to join everything created before and make it interact. A class called Controller is in charge of controlling every object and algorithm in the simulator.

The laser readings of each car are the inputs of its Neural Network and are assigned to its input layer. Then we apply the feed-forward algorithm and get the outputs of the Neural Network. These outputs are the rotation and acceleration of each car, which are used in the physics of the car. This sequence is repeated on every update for every car.

In the first generation, the DNA of every car is random. In the following generations, the weights are adjusted by the genetic algorithm.

The next video covers the Car Controller. It is the final video, in which you finish all the code and fix some bugs so that the project works. I also explain how the parameters work and how the Car Controller is coded.

If you like the projects, you can leave a comment, and if you have any questions, just ask me.

The code is available here: Github Repository
