Neural Networks III: How would I implement one?

rwurl=https://imgur.com/FC1QvBY

In this third article of the series, I try to keep everything fairly detailed and explain each step as I dive into the actual implementation of a Feed-Forward Neural Network. To make the implementation process easier to follow, the article is divided into 5 sub-segments:

- Making a simple, Feed-Forward Neural Network structure
- Fixing the Neural Network’s bias/weight initial values
- Adding a learning algorithm to the Neural Network
- Multiple Input and Output Sets for our Neural Network
- Training for handwriting recognition with MNIST data set

Let’s jump right in!

I will use Java in this case, but any other programming language will follow the exact same route of ideas. I’m not going to use any exotic language specific solution, but will try to keep everything as generic as possible.

Making a simple, Feed-Forward Neural Network structure:

The structure of our entire Neural Network is supposed to be very simple, just as the theoretical example was in the previous article. I presume the entire codebase will cap out at around 150-200 lines of code, plus the helper utility classes.

First, let’s create a Network.java class, which represents our NN object.

We need a few integer constant attributes here, which define our network and don’t need to change over the program’s lifetime.

“NETWORK_LAYER_SIZES” contains the number of neurons in each of our layers.

“NETWORK_SIZE” contains the number of layers in our NN. We set this number from the length of the NETWORK_LAYER_SIZES array.

“INPUT_SIZE” contains the number of input neurons. The input layer is the first in the network, so the first number in NETWORK_LAYER_SIZES represents this value.

“OUTPUT_SIZE” contains the number of output neurons in our NN. The output is the last layer, so it is represented by the last element of the NETWORK_LAYER_SIZES array.

Now we will declare a couple of variables to work with:

“output” contains the calculated output value of every neuron in our entire network. This value needs to be as precise as possible to give us accurate results, so we use double as the datatype. A two dimensional array is sufficient here to store the given layer and the given neuron position.

“weights” stores all the weight data of the network. Note that this needs to be a three dimensional array to store all the necessary positions. The first index is the given layer, the second is the given neuron, and the third is the previous neuron the weight is connected to. We need this previous neuron index because, as we learned in the previous article, a single neuron on the given layer is connected to every neuron of the adjacent previous layer.

“bias” is a two dimensional array, similar to the output, because every neuron has one bias value as well.

public class Network {
    public final int[] NETWORK_LAYER_SIZES;
    public final int NETWORK_SIZE;
    public final int INPUT_SIZE;
    public final int OUTPUT_SIZE;

    private double[][] output;    //layer, neuron
    private double[][][] weights; //layer, neuron, previousNeuron
    private double[][] bias;      //layer, neuron
}

Our constructor will receive the NETWORK_LAYER_SIZES value from the initialization method, and all the rest of the constant and variable data can be calculated from it.

To initialize the output, weight and bias arrays, we use NETWORK_SIZE as the size of the first dimension. We also need to iterate through a FOR loop to initialize the second dimension for every layer. Note that while every neuron has an output and a bias, the very first layer doesn’t have weights (being the input layer), so we start assigning weight arrays from the second layer in this loop.

public Network(int[] NETWORK_LAYER_SIZES) {
    this.NETWORK_LAYER_SIZES = NETWORK_LAYER_SIZES;
    this.NETWORK_SIZE = NETWORK_LAYER_SIZES.length;
    this.INPUT_SIZE = NETWORK_LAYER_SIZES[0];
    this.OUTPUT_SIZE = NETWORK_LAYER_SIZES[NETWORK_SIZE - 1];

    this.output = new double[NETWORK_SIZE][];
    this.weights = new double[NETWORK_SIZE][][];
    this.bias = new double[NETWORK_SIZE][];

    for (int i = 0; i < NETWORK_SIZE; i++) {
        this.output[i] = new double[NETWORK_LAYER_SIZES[i]];
        this.bias[i] = new double[NETWORK_LAYER_SIZES[i]];
        if (i > 0) {
            weights[i] = new double[NETWORK_LAYER_SIZES[i]][NETWORK_LAYER_SIZES[i - 1]];
        }
    }
}

We now have a basic initialization constructor, but we need a method that calculates the feed-forward values as well. Let’s call it “calculate”. This method takes an array of doubles as an input parameter, and returns an array of doubles as an output.

public double[] calculate(double input[]) {
    if (input.length != this.INPUT_SIZE) {
        return null;
    }
    this.output[0] = input;
    for (int layer = 1; layer < NETWORK_SIZE; layer++) {
        for (int neuron = 0; neuron < NETWORK_LAYER_SIZES[layer]; neuron++) {
            double sum = bias[layer][neuron];
            for (int prevNeuron = 0; prevNeuron < NETWORK_LAYER_SIZES[layer - 1]; prevNeuron++) {
                sum += output[layer - 1][prevNeuron] * weights[layer][neuron][prevNeuron];
            }
            output[layer][neuron] = sigmoid(sum);
        }
    }
    return output[NETWORK_SIZE - 1];
}

The very first IF block just makes sure that the input array’s size matches our network’s previously set INPUT_SIZE constant. If it doesn’t, we cannot do any calculations.
The next line just copies these input values into the output array’s first element, since the input layer doesn’t need to do any calculations with them.
After that, we have a nested FOR loop that iterates through all the remaining layers and through all the neurons in the given layer. Here, each neuron starts its sum with the bias, then adds every previous neuron’s output multiplied by the connecting weight, iterating through yet another FOR loop. Finally, we apply the sigmoid function to this summed value.

The math for the sigmoid function can be represented by this in Java:

private static double sigmoid(double x) { return 1d / (1 + Math.exp( - x)); }

We can quickly make a main method to test our current version of the network with some arbitrary values. Let’s instantiate our network so that the input and output layers contain 5 neurons each, with two hidden layers containing 4 and 3 neurons. We can feed some arbitrary values as inputs, like 0.2, 0.3, 0.1, 0.2, 0.5. Java has a good number of built-in helper methods to make our life easier as programmers, and “Arrays.toString” can print out all the values of the given array as a nicely formatted, comma separated list.

//requires: import java.util.Arrays;
public static void main(String[] args) {
    Network net = new Network(new int[]{5,4,3,5});
    double[] output = net.calculate(new double[]{0.2,0.3,0.1,0.2,0.5});
    System.out.println(Arrays.toString(output));
}

When we run this program, we notice an issue right away. No matter what input values we enter, all five of the output values are always exactly 0.5. This occurs because Java initializes every weight and bias value to 0, which nullifies all of our calculated sums and causes the sigmoid function to return sigmoid(0) = 0.5 every time.
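To see why zero-valued parameters pin every output at 0.5, here is a minimal sketch of a single neuron's feed-forward step. The class and method names (ZeroInitSketch, neuronOutput) are made up for this illustration and are not part of the Network class; only the weighted sum plus bias and the sigmoid follow the code above.

```java
class ZeroInitSketch {

    // Same sigmoid as in the Network class.
    static double sigmoid(double x) {
        return 1d / (1 + Math.exp(-x));
    }

    // Output of one neuron: sigmoid(bias + sum of input * weight).
    static double neuronOutput(double[] inputs, double[] weights, double bias) {
        double sum = bias;
        for (int i = 0; i < inputs.length; i++) {
            sum += inputs[i] * weights[i];
        }
        return sigmoid(sum);
    }

    public static void main(String[] args) {
        double[] inputs = {0.2, 0.3, 0.1, 0.2, 0.5};
        double[] zeroWeights = new double[inputs.length]; // Java fills new arrays with 0.0
        // With all weights and the bias at 0 the sum is 0, so the result is sigmoid(0) = 0.5.
        System.out.println(neuronOutput(inputs, zeroWeights, 0.0)); // prints 0.5
    }
}
```

Since every neuron in every layer behaves this way, the inputs never get a chance to influence the result.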

Fixing the Neural Network’s bias/weight initial values:

We can change the weight/bias initialization lines at our constructor to start with 1, but I will go one step further and make those values randomized within a certain range. This will give us more flexible control over the network’s behavior right from the start.

I’m creating a helper class called “NetworkTools” to store the array-creating, randomizing and related utility methods. These methods are going to come in handy as we go on, and are very straightforward to understand. I’ve commented each method’s purpose above its header:

public class NetworkTools {

    //every value in this generated array will be the init_value
    public static double[] createArray(int size, double init_value) {
        if (size < 1) {
            return null;
        }
        double[] ar = new double[size];
        for (int i = 0; i < size; i++) {
            ar[i] = init_value;
        }
        return ar;
    }

    //every value in this generated 1 dimensional array will be a random number, within a lower and upper bound
    public static double[] createRandomArray(int size, double lower_bound, double upper_bound) {
        if (size < 1) {
            return null;
        }
        double[] ar = new double[size];
        for (int i = 0; i < size; i++) {
            ar[i] = randomValue(lower_bound, upper_bound);
        }
        return ar;
    }

    //every value in this generated 2 dimensional array will be a random number, within a lower and upper bound
    public static double[][] createRandomArray(int sizeX, int sizeY, double lower_bound, double upper_bound) {
        if (sizeX < 1 || sizeY < 1) {
            return null;
        }
        double[][] ar = new double[sizeX][sizeY];
        for (int i = 0; i < sizeX; i++) {
            ar[i] = createRandomArray(sizeY, lower_bound, upper_bound);
        }
        return ar;
    }

    //returns a random double within the desired lower and upper bound
    public static double randomValue(double lower_bound, double upper_bound) {
        return Math.random() * (upper_bound - lower_bound) + lower_bound;
    }

    //returns a specific amount of random unique (cannot appear more than once) integers,
    //from the desired lower and upper bound (both inclusive)
    public static Integer[] randomValues(int lowerBound, int upperBound, int amount) {
        int range = upperBound - lowerBound + 1;
        if (amount > range) {
            return null;
        }
        Integer[] values = new Integer[amount];
        for (int i = 0; i < amount; i++) {
            //scale first, then shift, so every integer in the range is equally likely
            int n = (int) (Math.random() * range) + lowerBound;
            while (containsValue(values, n)) {
                n = (int) (Math.random() * range) + lowerBound;
            }
            values[i] = n;
        }
        return values;
    }

    //receives any datatype array as a first parameter, and checks if the provided second parameter value is contained in it
    public static <T extends Comparable<T>> boolean containsValue(T[] ar, T value) {
        for (int i = 0; i < ar.length; i++) {
            if (ar[i] != null) {
                if (value.compareTo(ar[i]) == 0) {
                    return true;
                }
            }
        }
        return false;
    }

    //returns the highest value's index within the provided double array.
    public static int indexOfHighestValue(double[] values) {
        int index = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[index]) {
                index = i;
            }
        }
        return index;
    }
}

So let’s go back to the Network constructor and change the weight and bias initialization lines to produce some random values. The exact values don’t matter right now; they can be positive or negative:

public Network(int[] NETWORK_LAYER_SIZES) {
    …
    for (int i = 0; i < NETWORK_SIZE; i++) {
        …
        this.bias[i] = NetworkTools.createRandomArray(NETWORK_LAYER_SIZES[i], -0.5, 0.7);
        if (i > 0) {
            weights[i] = NetworkTools.createRandomArray(NETWORK_LAYER_SIZES[i], NETWORK_LAYER_SIZES[i - 1], -1, 1);
        }
    }
}

Now every time we run the program and make one pass of feed-forwarding calculations, it will give us random output values, proving that the network works as intended. Without a learning algorithm however, the network is fairly useless in this state, so let’s tackle that issue as well.

Adding a learning algorithm to the Neural Network:

So for every given input value combination, we need a “targeted” output value combination as well. With these values we can measure how far the currently calculated output values are from the target. Let’s say that for the previously declared input values (0.2,0.3,0.1,0.2,0.5) we ideally want the network to output (1,0,0,0,0) instead of any other seemingly random numbers.

As explained in the previous article, this is where Backpropagation, our learning mechanism, comes in and tries to predict the weight and bias values of each previous adjacent layer that would produce this desired final output. It starts from the last layer and tries to predict which weight and bias combination could produce a value closer to the desired output. Once that is figured out, it jumps to the previous adjacent layer and makes these modifications again, and so on until it gets to the starting layer. We start from the last layer because its weight and bias values have the greatest influence over the final output. Note that by the rules of the network, we can only change the bias and weight values if we want to influence the output values; we cannot change any other values directly.

The differences between the desired output and the current output are called the “error signal”. Before making any changes to the weight/bias values, we finish the backpropagation completely and store this error signal for each layer (except the very first input layer).

Intuitively this seems like a very easy task. Just subtract the targeted value from the current value, try nudging the weights/biases by some positive amount, and measure whether we got closer to or further from the desired output. If we got closer, keep adding positive amounts until we reach the desired output; if we got further away, start applying negative amounts instead. This would be exactly true for a simple input range, representing a straightforward curve:

rwurl=https://imgur.com/W77WOyG

But unfortunately, as we get more and more input values, the whole Neural Network function becomes significantly more complex as well, and predicting the exact “right” position and the path towards it becomes less and less obvious:

rwurl=https://imgur.com/3LgjejG

As you can see, having multiple local minima can easily “fool” the algorithm into thinking that it is going the right way, while in reality it may just be pursuing a local minimum, which is never going to produce the desired output. You can think of the algorithm as a “heavy ball” for weights. This ball will run down the slope from where it started, and stop at the bottom that it happens to find. For instance, if the initial weight value is somewhere between 0 and 0.5, then no matter where exactly it starts to adjust, a naïve “heavy ball” approach would slide down to around 0.65 and stop there, while we can clearly see that this would always produce a wrong result. This is the primary reason we use randomized values each time we start the training process instead of setting them to 1, so every run gives these weights a new chance to reach the proper global minimum.
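The “heavy ball” behaviour can be reproduced with a small sketch. The error curve used here, f(w) = w^4 - 3w^2 + w, is an invented stand-in with two valleys and is not the curve from the figure; the class and method names are likewise made up for this illustration.

```java
class LocalMinimumSketch {

    // Hypothetical error curve with two valleys: f(w) = w^4 - 3w^2 + w.
    static double error(double w) {
        return w * w * w * w - 3 * w * w + w;
    }

    // Its derivative: f'(w) = 4w^3 - 6w + 1.
    static double slope(double w) {
        return 4 * w * w * w - 6 * w + 1;
    }

    // Plain gradient descent: always step downhill from the starting point.
    static double descend(double start, double learningRate, int steps) {
        double w = start;
        for (int i = 0; i < steps; i++) {
            w -= learningRate * slope(w);
        }
        return w;
    }

    public static void main(String[] args) {
        double right = descend(0.5, 0.01, 10_000);  // settles in the shallower valley near w = 1.13
        double left = descend(-0.5, 0.01, 10_000);  // settles in the deeper valley near w = -1.30
        System.out.printf("start  0.5 -> w = %.3f, error = %.3f%n", right, error(right));
        System.out.printf("start -0.5 -> w = %.3f, error = %.3f%n", left, error(left));
    }
}
```

Both runs stop at a valley floor, but only the one that happened to start on the left side ends up at the lower error. A different random starting point is the only thing that separates the two outcomes.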

Furthermore, to successfully tackle backpropagation, each neuron needs to have an “error signal” and an “output derivative” value besides its regular output value. Our backpropagation error function, which calculates how far we are from the target output values, looks like this:
E = ½ (target-output)^2.
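As a quick illustration of this error function, the sketch below evaluates it for a guess that is far from the target and for one that is close. Summing the per-neuron errors over the whole output layer is an assumption made here for the example, and the class and method names are hypothetical.

```java
class SquaredErrorSketch {

    // E = 1/2 * (target - output)^2 for a single output neuron.
    static double error(double target, double output) {
        double diff = target - output;
        return 0.5 * diff * diff;
    }

    // Sum of the per-neuron errors over a whole output layer (assumed aggregation).
    static double totalError(double[] target, double[] output) {
        double sum = 0;
        for (int i = 0; i < target.length; i++) {
            sum += error(target[i], output[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] target = {1, 0, 0, 0, 0};
        double[] far = {0.5, 0.5, 0.5, 0.5, 0.5};
        double[] near = {0.9, 0.1, 0.1, 0.1, 0.1};
        System.out.println("far guess:  " + totalError(target, far));
        System.out.println("near guess: " + totalError(target, near));
    }
}
```

The closer the outputs get to the target, the smaller the value becomes, reaching 0 only for a perfect match.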

I am not going to go into the details of all the related math in this article, because it’s a fairly large subject of its own. But for anyone interested, I can refer you to Ryan Harris on YouTube. He has an excellent tutorial series on backpropagation algorithms that helps you comprehend the whole concept more easily:

rwurl=https://www.youtube.com/watch?v=aVId8KMsdUU

There are many well-written articles about it on the net, and Wikipedia is a great detailed source as well:
https://en.wikipedia.org/wiki/Backpropagation

Alright, going back to coding. We need to declare these two additional variables and initialize them at the constructor before we can use them:

public class Network {
    private double[][] errorSignal;
    private double[][] outputDerivative;
}

public Network(int[] NETWORK_LAYER_SIZES) {
    …
    this.errorSignal = new double[NETWORK_SIZE][];
    this.outputDerivative = new double[NETWORK_SIZE][];
    for (int i = 0; i < NETWORK_SIZE; i++) {
        …
        this.errorSignal[i] = new double[NETWORK_LAYER_SIZES[i]];
        this.outputDerivative[i] = new double[NETWORK_LAYER_SIZES[i]];
        …
    }
}

The feed-forwarding calculation needs to be updated with these variables as well:

public double[] calculate(double input[]) {
    …
    for (int layer = 1; layer < NETWORK_SIZE; layer++) {
        for (int neuron = 0; neuron < NETWORK_LAYER_SIZES[layer]; neuron++) {
            …
            for (int prevNeuron = 0; prevNeuron < NETWORK_LAYER_SIZES[layer - 1]; prevNeuron++) {
                …
            }
            output[layer][neuron] = sigmoid(sum);
            outputDerivative[layer][neuron] = output[layer][neuron] * (1 - output[layer][neuron]);
        }
    }
    …
}
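The expression output * (1 - output) is the derivative of the sigmoid function written in terms of its own output. The sketch below cross-checks that identity against a numerical slope; the class name and the finite-difference helper are hypothetical and exist only for this check.

```java
class SigmoidDerivativeSketch {

    static double sigmoid(double x) {
        return 1d / (1 + Math.exp(-x));
    }

    // Derivative expressed through the output, as stored in outputDerivative.
    static double derivativeFromOutput(double x) {
        double out = sigmoid(x);
        return out * (1 - out);
    }

    // Numerical slope via a central finite difference, used only as a cross-check.
    static double numericalDerivative(double x) {
        double h = 1e-6;
        return (sigmoid(x + h) - sigmoid(x - h)) / (2 * h);
    }

    public static void main(String[] args) {
        for (double x : new double[]{-2.0, 0.0, 0.5, 3.0}) {
            System.out.printf("x = %5.2f  analytic = %.6f  numeric = %.6f%n",
                    x, derivativeFromOutput(x), numericalDerivative(x));
        }
    }
}
```

Because the derivative only needs the output that the feed-forward pass has already computed, storing it during “calculate” costs almost nothing.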

Calculating the error needs another method. We will call it “backpropError”; it receives the target output array and does the error calculations for each layer, starting from the last one:

public void backpropError(double[] target) {
    //for output layer neurons
    for (int neuron = 0; neuron < NETWORK_LAYER_SIZES[NETWORK_SIZE - 1]; neuron++) {
        errorSignal[NETWORK_SIZE - 1][neuron] = (output[NETWORK_SIZE - 1][neuron] - target[neuron])
                * outputDerivative[NETWORK_SIZE - 1][neuron];
    }
    //for hidden layer neurons
    for (int layer = NETWORK_SIZE - 2; layer > 0; layer--) {
        for (int neuron = 0; neuron < NETWORK_LAYER_SIZES[layer]; neuron++) {
            double sum = 0;
            for (int nextNeuron = 0; nextNeuron < NETWORK_LAYER_SIZES[layer + 1]; nextNeuron++) {
                sum += weights[layer + 1][nextNeuron][neuron] * errorSignal[layer + 1][nextNeuron];
            }
            this.errorSignal[layer][neuron] = sum * outputDerivative[layer][neuron];
        }
    }
}
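For readers who prefer formulas, the two loops of backpropError can be restated as below. This is only the code rewritten in math form; the symbols (o for a neuron's output, t for its target, w for a weight, δ for the error signal, L for the last layer, j for the current neuron and k for a neuron of the next layer) are notation chosen here, not taken from the article.

```latex
% Output layer L: the derivative of E = 1/2 (t - o)^2 with respect to o is (o - t),
% multiplied by the sigmoid derivative o (1 - o)
\delta^{L}_{j} = \left(o^{L}_{j} - t_{j}\right)\, o^{L}_{j}\left(1 - o^{L}_{j}\right)

% Hidden layer l: the errors of the next layer flow back through the connecting weights
\delta^{l}_{j} = \left(\sum_{k} w^{l+1}_{kj}\, \delta^{l+1}_{k}\right) o^{l}_{j}\left(1 - o^{l}_{j}\right)
```

The index order of w matches the weights array: layer, then the neuron the weight feeds into, then the neuron it comes from.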

Once we have these values, we can finally update the weights and biases of our network. We will need another method for this. Let’s call it “updateWeightsAndBiases”. It receives one parameter, the “learning rate”. The learning rate is just a ratio that indicates how boldly the learning algorithm should nudge those values in a positive or negative direction. Setting this number too low produces much slower learning, but setting it too high may make the updates overshoot and produce errors or anomalies in the calculations, which slows the whole learning process down again.

public void updateWeightsAndBiases(double learningRate) {
    for (int layer = 1; layer < NETWORK_SIZE; layer++) {
        for (int neuron = 0; neuron < NETWORK_LAYER_SIZES[layer]; neuron++) {
            //for bias
            double delta = -learningRate * errorSignal[layer][neuron];
            bias[layer][neuron] += delta;
            //for weights
            for (int prevNeuron = 0; prevNeuron < NETWORK_LAYER_SIZES[layer - 1]; prevNeuron++) {
                //weights[layer, neuron, prevNeuron]
                weights[layer][neuron][prevNeuron] += delta * output[layer - 1][prevNeuron];
            }
        }
    }
}
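The effect of the learning rate is easiest to see on a toy problem. The sketch below applies the same kind of update (value += -learningRate * slope) to a single weight on the invented error curve E(w) = w^2; the curve, the class name and the chosen rates are assumptions made purely for illustration.

```java
class LearningRateSketch {

    // Repeats the update w += -learningRate * slope on the toy error curve E(w) = w^2,
    // whose slope is 2w and whose minimum sits at w = 0.
    static double descend(double start, double learningRate, int steps) {
        double w = start;
        for (int i = 0; i < steps; i++) {
            double delta = -learningRate * 2 * w;
            w += delta;
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(descend(1.0, 0.01, 10)); // still far from 0: learning is slow
        System.out.println(descend(1.0, 0.4, 10));  // very close to 0
        System.out.println(descend(1.0, 1.1, 10));  // overshoots more each step and moves away from 0
    }
}
```

On a real network the safe range depends on the data and the layer sizes, so the rate usually has to be found by experimenting.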

Next, let’s have a method that makes our life easier and connects all these learning functionalities together. Let’s call it “train”. It receives the input array, the target output array and a learning rate, and runs all of the previously mentioned calculations:

public void train(double[] input, double[] target, double learningRate) {
    if (input.length != INPUT_SIZE || target.length != OUTPUT_SIZE) {
        return;
    }
    calculate(input);
    backpropError(target);
    updateWeightsAndBiases(learningRate);
}

We are now set to use our learning algorithm! Let’s change the main method to do so. We can use a similar network setup, for instance with an input layer of 3 neurons holding the values 0.1, 0.5, 0.2 and an output layer of 5 neurons, where the expected output for this given input combination is 0, 1, 0, 0, 0. The FOR loop sets the number of times we run the learning algorithm and apply the changes to the weights/biases.

public static void main(String[] args) {
    Network net = new Network(new int[]{3,4,3,5});
    double[] input = new double[]{0.1, 0.5, 0.2};
    double[] expected = new double[]{0, 1, 0, 0, 0};
    for (int i = 0; i < 1; i++) {
        net.train(input, expected, 1); //input, target, learningRate
    }
    double[] output = net.calculate(input);
    System.out.print(" current output neuron values: ");
    for (double neuronValue: output) {
        System.out.printf("%02.3f ", neuronValue);
    }
    System.out.printf("\n expected output neuron values: ");
    for (double neuronValue: expected) {
        System.out.printf("%02.3f ", neuronValue);
    }
}

Running the program gives us a seemingly far away result from desired:

output:
current output neuron values: 0.744 0.667 0.566 0.576 0.388
expected output neuron values: 0.000 1.000 0.000 0.000 0.000

This is reasonable again, we ran the learning algorithm only once, and as we know already, it’s virtually impossible to “guess” the right weight/bias combination. The network needs many tries and measuring over and over, until it can get closer and closer. Let’s try running the learning algorithm 10 times for instance by changing the FOR loop value:

output:
current output neuron values: 0.143 0.791 0.226 0.224 0.242
expected output neuron values: 0.000 1.000 0.000 0.000 0.000

We can see that the output this time is getting actually closer to the desired values. The ones that should be zero are ~0.2, and the one that should be 1, is almost ~0.8. Ok, let’s try running the learning algorithm, say 10,000 times:

output:
current output neuron values: 0.004 0.996 0.004 0.004 0.004
expected output neuron values: 0.000 1.000 0.000 0.000 0.000

We can see that the values are getting really close to the desired ones, and the more we train the network, the more accurate it becomes. It is up to us how close we want to get to the desired values before we can safely say that the network knows the right output for the right input, and how much processing power we want to trade in for the training. You can imagine that with a large network and a large amount of input data, a couple of million iterations can take hours or even days.

Multiple Input and Output Sets for our Neural Network:

In most cases, we have a large number of different input sets, and each of them needs to produce a given targeted output set. We could have different variable names for each selected input and desired output array, but you can imagine that this would get tedious even for a couple of hundred values, let alone millions.

To tackle this issue, we need to be as efficient as possible and create a new class, which can contain and work with a great many input values and their corresponding expected output values. Let’s call it “TrainSet”. I’m only going to talk briefly about a few of its methods, because most of them are straightforward to understand just by looking at them.

So we have a constructor that can accept the input and output size, this will represent the number of neurons at the network’s input and output layer.

“addData(input[], expected[])” expects two parameters, the first being the input array to insert and the second being the expected output array for it. You can call this method as many times as you need, and add as many input/expected array combinations to the set as you like, for instance with a simple FOR loop.

“getInput(index)” and “getOutput(index)” will get you back these input/expected array values from the given index point.

“extractBatch” gives us the ability to extract only a given number of randomly chosen entries from the preloaded set, instead of all of them. This can be handy, for instance, if we have 7000 entries in the set but would like to work with only 20 for a given task.

The main method just generates some random input/expected values, stores them in the set with the help of the FOR loop, and outputs them in the end as a demonstration.

import java.util.ArrayList;
import java.util.Arrays;

public class TrainSet {
    public final int INPUT_SIZE;
    public final int OUTPUT_SIZE;

    //double[][] <- index1: 0 = input, 1 = output || index2: index of element
    private ArrayList<double[][]> data = new ArrayList<>();

    public TrainSet(int INPUT_SIZE, int OUTPUT_SIZE) {
        this.INPUT_SIZE = INPUT_SIZE;
        this.OUTPUT_SIZE = OUTPUT_SIZE;
    }

    //adds new data to the data set
    public void addData(double[] in, double[] expected) {
        if (in.length != INPUT_SIZE || expected.length != OUTPUT_SIZE) return;
        data.add(new double[][] { in, expected });
    }

    //returns a new set with "size" randomly picked entries, or this set if size is out of range
    public TrainSet extractBatch(int size) {
        if (size > 0 && size <= this.size()) {
            TrainSet set = new TrainSet(INPUT_SIZE, OUTPUT_SIZE);
            Integer[] ids = NetworkTools.randomValues(0, this.size() - 1, size);
            for (Integer i: ids) {
                set.addData(this.getInput(i), this.getOutput(i));
            }
            return set;
        } else return this;
    }

    public static void main(String[] args) {
        TrainSet set = new TrainSet(3, 2);
        for (int i = 0; i < 8; i++) {
            double[] a = new double[3];
            double[] b = new double[2];
            for (int k = 0; k < 3; k++) {
                a[k] = (double)((int)(Math.random() * 10)) / (double) 10;
                if (k < 2) {
                    b[k] = (double)((int)(Math.random() * 10)) / (double) 10;
                }
            }
            set.addData(a, b);
        }
        System.out.println(set);
    }

    public String toString() {
        String s = "TrainSet [" + INPUT_SIZE + " ; " + OUTPUT_SIZE + "]\n";
        int index = 0;
        for (double[][] r: data) {
            s += index + ": " + Arrays.toString(r[0]) + " >-||-< " + Arrays.toString(r[1]) + "\n";
            index++;
        }
        return s;
    }

    //how many data sets we got
    public int size() {
        return data.size();
    }

    //gets the input set from a certain index on the data set
    public double[] getInput(int index) {
        if (index >= 0 && index < size()) return data.get(index)[0];
        else return null;
    }

    //gets the output set from a certain index on the data set
    public double[] getOutput(int index) {
        if (index >= 0 && index < size()) return data.get(index)[1];
        else return null;
    }

    public int getINPUT_SIZE() {
        return INPUT_SIZE;
    }

    public int getOUTPUT_SIZE() {
        return OUTPUT_SIZE;
    }
}
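Picking a random batch boils down to choosing a handful of distinct indices. The article’s helper does this by drawing numbers and retrying on duplicates; the sketch below shows a different, commonly used approach that shuffles the full index list and keeps the first few entries. It is an alternative illustration rather than the article’s method, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BatchIndexSketch {

    // Returns `amount` distinct indices from 0 (inclusive) to size (exclusive) in random order.
    static List<Integer> randomIndices(int size, int amount) {
        if (amount < 0 || amount > size) {
            throw new IllegalArgumentException("amount must be between 0 and size");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices);
        // Copy the first `amount` shuffled indices so the result does not depend on the full list.
        return new ArrayList<>(indices.subList(0, amount));
    }

    public static void main(String[] args) {
        // 20 distinct entry indices out of a set of 7000, as in the example above.
        System.out.println(randomIndices(7000, 20));
    }
}
```

Shuffling never has to retry, which matters when the batch is almost as large as the set; drawing with retries is cheaper when the batch is tiny compared to the set.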

Going back to our Network class, let’s create a method called “trainWithSet”. This method will accept a whole trainset to work with, a number of training loops we would like to go through the whole set, and the batchSize we would like to work with:

public void trainWithSet(TrainSet set, int loops, int batchSize) {
    if (set.INPUT_SIZE != INPUT_SIZE || set.OUTPUT_SIZE != OUTPUT_SIZE) {
        return;
    }
    for (int i = 0; i < loops; i++) {
        TrainSet batch = set.extractBatch(batchSize);
        //extractBatch returns the whole set when batchSize is out of range, so loop over the batch's real size
        for (int b = 0; b < batch.size(); b++) {
            this.train(batch.getInput(b), batch.getOutput(b), 0.3);
        }
    }
}

We need a new main method to handle training sets, so let’s make one:

public static void main(String[] args) {
    Network net = new Network(new int[] { 5, 3, 3, 2 });

    TrainSet set = new TrainSet(5, 2);
    set.addData(new double[]{0.1,0.2,0.3,0.4,0.5}, new double[]{0.9,0.1});
    set.addData(new double[]{0.9,0.8,0.7,0.6,0.2}, new double[]{0.1,0.9});
    set.addData(new double[]{0.3,0.8,0.7,0.4,0.1}, new double[]{0.3,0.7});
    set.addData(new double[]{0.9,0.3,0.4,0.5,0.6}, new double[]{0.7,0.3});
    set.addData(new double[]{0.2,0.9,0.4,0.2,0.4}, new double[]{0.2,0.4});
    set.addData(new double[]{0.1,0.1,0.9,0.9,0.9}, new double[]{0.5,0.5});

    net.trainWithSet(set, 1, 6);

    for (int i = 0; i < 6; i++) {
        System.out.println(Arrays.toString(net.calculate(set.getInput(i))));
    }
}

I’ve made a network with 5 neurons at the input layer, 2 at the output and 3 in each of the two hidden layers. For this example this will be sufficient, but this is the time when we need to think about the hidden layers’ size. If we define too few neurons here, the network won’t have enough space to “store” a very large number of data combinations, because new input values that adjust the weights and biases can override the already properly set ones, resulting in never-ending trial and error iterations that will never produce accurate results for all the desired values.

On the other hand, making the network too large will make it extremely slow to work with, and significantly slower to learn as well.
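One rough way to get a feel for this trade-off is to count how many adjustable values a given layout has. The sketch below does that for a layer-size array; the class and method names are hypothetical. It counts biases only for layers after the input layer, since the feed-forward loop never reads the input layer’s biases.

```java
class ParameterCountSketch {

    // One weight per connection between adjacent layers.
    static int countWeights(int[] layerSizes) {
        int total = 0;
        for (int layer = 1; layer < layerSizes.length; layer++) {
            total += layerSizes[layer] * layerSizes[layer - 1];
        }
        return total;
    }

    // One bias per neuron that is actually used, i.e. every layer except the input layer.
    static int countBiases(int[] layerSizes) {
        int total = 0;
        for (int layer = 1; layer < layerSizes.length; layer++) {
            total += layerSizes[layer];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] small = {5, 3, 3, 2};
        // 15 + 9 + 6 weights plus 3 + 3 + 2 biases
        System.out.println(countWeights(small) + countBiases(small)); // prints 38
    }
}
```

Every training step touches all of these values, so the count grows quickly as the hidden layers get wider.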

So we instantiated a new trainset in the main method, having the same number of input and output neuron numbers as our network does. We added 6 data sets, each containing the input set, and the expected output set for it.

Finally, we loop through every input entry (in our example, that’s 6 entries) and verify whether our network produces the expected output values after the training.

If we run our program, we can see that the values are all random numbers all over the place:

output:
[0.5804082848557195, 0.611540224159821]
[0.5793365763365534, 0.6152768910621293]
[0.5792965979397438, 0.6135348171779963]
[0.5798916531162135, 0.6149473277301244]
[0.5799083251987178, 0.6139233850164606]
[0.5825736971539314, 0.6127042166996858]

This is fully expected now that we know how the training process works. Let’s notch up the training to run 1000 times:

output:
[0.8269183006326297, 0.1488344744465759]
[0.13914142176572553, 0.7727892885721617]
[0.16591764093195452, 0.7372080691272994]
[0.6711814111500592, 0.27195100535567884]
[0.3606707137762615, 0.5284685374836402]
[0.4670048361628152, 0.43512742680336075]

We can see that the more training we do before testing, the closer the numbers converge to the expected output values. This is fully expected again. Let’s try 100,000 training iterations anyway:

output:
[0.9000000328097575, 0.10000019245068295]
[0.10000119473600338, 0.9000012484104274]
[0.29999933445396226, 0.6999994676152578]
[0.6999999906432556, 0.29999997748811974]
[0.2000002150797167, 0.39999998997508834]
[0.5000000439132515, 0.500000037083972]

Yep, as expected, all the tested output values are getting extremely close to the expected values, after this many training iterations.

You can see where we are going with this. Yes, again referring back to the previous article, we can use this to train the network with large number of written single digit numbers, to “guess” our uniquely handwritten sample that we will provide to it.

Finally, training for handwriting recognition with MNIST data set:

MNIST is a large, open and free database of handwritten digits and their corresponding labels. It has a training set of 60,000 examples and a test set of 10,000 examples:
http://yann.lecun.com/exdb/mnist/

Let’s download the training set of images and the training set of labels and store them in the /res folder. We can make another file here called number.png, a 28×28 pixel image that will eventually contain our own handwritten test digit.

We will make several classes to work with the MNIST dataset values and connect them with our network. First the “MnistDbFile.java” to help us work with the database files:

import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * MNIST database file containing entries that can represent image or label
 * data. Extends the standard random access file with methods for navigating
 * over the entries. The file format is basically idx with specific header
 * information. This includes a magic number for determining the type of stored
 * entries, count of entries.
 */
public abstract class MnistDbFile extends RandomAccessFile {
    private int count;

    /**
     * Creates new instance and reads the header information.
     *
     * @param name
     *            the system-dependent filename
     * @param mode
     *            the access mode
     * @throws IOException
     * @throws FileNotFoundException
     * @see RandomAccessFile
     */
    public MnistDbFile(String name, String mode) throws IOException {
        super(name, mode);
        if (getMagicNumber() != readInt()) {
            throw new RuntimeException("This MNIST DB file " + name + " should start with the number " + getMagicNumber() + ".");
        }
        count = readInt();
    }

    /**
     * MNIST DB files start with unique integer number.
     *
     * @return integer number that should be found in the beginning of the file.
     */
    protected abstract int getMagicNumber();

    /**
     * The current entry index (the first entry has index 1).
     *
     * @return long
     * @throws IOException
     */
    public long getCurrentIndex() throws IOException {
        return (getFilePointer() - getHeaderSize()) / getEntryLength() + 1;
    }

    /**
     * Set the required current entry index (the first entry has index 1).
     *
     * @param curr
     *            the entry index
     */
    public void setCurrentIndex(long curr) {
        try {
            // entries are 1-based, so index 0 would point in front of the first entry
            if (curr < 1 || curr > count) {
                throw new RuntimeException(curr + " is not in the range 1 to " + count);
            }
            seek(getHeaderSize() + (curr - 1) * getEntryLength());
        } catch(IOException e) {
            throw new RuntimeException(e);
        }
    }

    public int getHeaderSize() {
        return 8; // two integers
    }

    /**
     * Number of bytes for each entry.
     * Defaults to 1.
     *
     * @return int
     */
    public int getEntryLength() {
        return 1;
    }

    /**
     * Move to the next entry.
     *
     * @throws IOException
     */
    public void next() throws IOException {
        if (getCurrentIndex() < count) {
            skipBytes(getEntryLength());
        }
    }

    /**
     * Move to the previous entry.
     *
     * @throws IOException
     */
    public void prev() throws IOException {
        // stay on the first entry instead of seeking back into the header
        if (getCurrentIndex() > 1) {
            seek(getFilePointer() - getEntryLength());
        }
    }

    public int getCount() {
        return count;
    }
}
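The header check in the constructor relies on two big-endian integers at the start of the file: the magic number and the entry count. The sketch below parses such a header from an in-memory byte array, so it runs without the real MNIST files. The class name and the fake header bytes are made up for this illustration; 2049 is the value the label file class of this article returns from getMagicNumber().

```java
import java.nio.ByteBuffer;

class IdxHeaderSketch {

    // Reads the two leading integers: [0] = magic number, [1] = entry count.
    static int[] readHeader(byte[] fileStart) {
        // ByteBuffer is big-endian by default, the same byte order RandomAccessFile.readInt() uses.
        ByteBuffer buffer = ByteBuffer.wrap(fileStart);
        int magic = buffer.getInt();
        int count = buffer.getInt();
        return new int[]{magic, count};
    }

    public static void main(String[] args) {
        // Fake 8-byte label-file header: magic number 2049 followed by 60,000 entries.
        byte[] header = ByteBuffer.allocate(8).putInt(2049).putInt(60000).array();
        int[] parsed = readHeader(header);
        System.out.println(parsed[0] + " " + parsed[1]); // prints 2049 60000
    }
}
```

If the first integer does not match the expected magic number, the file is either the wrong kind (images instead of labels) or not an MNIST file at all, which is exactly what the constructor guards against.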

Next is the “MnistImageFile.java” to work with the images in the database:

import java.io.FileNotFoundException;
import java.io.IOException;

/**
 * MNIST database image file. Contains additional header information for the
 * number of rows and columns per each entry.
 */
public class MnistImageFile extends MnistDbFile {
    private int rows;
    private int cols;

    /**
     * Creates new MNIST database image file ready for reading.
     *
     * @param name the system-dependent filename
     * @param mode the access mode
     * @throws IOException
     * @throws FileNotFoundException
     */
    public MnistImageFile(String name, String mode) throws FileNotFoundException, IOException {
        super(name, mode);
        // read header information
        rows = readInt();
        cols = readInt();
    }

    /**
     * Reads the image at the current position.
     *
     * @return matrix representing the image
     * @throws IOException
     */
    public int[][] readImage() throws IOException {
        int[][] dat = new int[getRows()][getCols()];
        // The first index is the row and the second the column, matching the
        // array dimensions above (this also works for non-square images).
        for (int i = 0; i < getRows(); i++) {
            for (int j = 0; j < getCols(); j++) {
                dat[i][j] = readUnsignedByte();
            }
        }
        return dat;
    }

    /**
     * Move the cursor to the next image.
     *
     * @throws IOException
     */
    public void nextImage() throws IOException {
        super.next();
    }

    /**
     * Move the cursor to the previous image.
     *
     * @throws IOException
     */
    public void prevImage() throws IOException {
        super.prev();
    }

    @Override
    protected int getMagicNumber() {
        return 2051;
    }

    /**
     * Number of rows per image.
     *
     * @return int
     */
    public int getRows() {
        return rows;
    }

    /**
     * Number of columns per image.
     *
     * @return int
     */
    public int getCols() {
        return cols;
    }

    @Override
    public int getEntryLength() {
        return cols * rows;
    }

    @Override
    public int getHeaderSize() {
        return super.getHeaderSize() + 8; // two more integers - rows and columns
    }
}

Next is the “MnistLabelFile.java” to help us work with the labels in the database:

import java.io.FileNotFoundException;
import java.io.IOException;

/**
 * MNIST database label file.
 */
public class MnistLabelFile extends MnistDbFile {

    /**
     * Creates new MNIST database label file ready for reading.
     *
     * @param name the system-dependent filename
     * @param mode the access mode
     * @throws IOException
     * @throws FileNotFoundException
     */
    public MnistLabelFile(String name, String mode) throws IOException {
        super(name, mode);
    }

    /**
     * Reads the integer at the current position.
     *
     * @return integer representing the label
     * @throws IOException
     */
    public int readLabel() throws IOException {
        return readUnsignedByte();
    }

    /** Read the specified number of labels from the current position */
    public int[] readLabels(int num) throws IOException {
        int[] out = new int[num];
        for (int i = 0; i < num; i++) {
            out[i] = readLabel();
        }
        return out;
    }

    @Override
    protected int getMagicNumber() {
        return 2049;
    }
}

And finally, the “Mnist.java” file, which contains our main method. It runs the training algorithm, connects it with the training set, and finally tests our handwritten number and tries to guess its value:

import java.awt.Color;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class Mnist {

    public static void main(String[] args) {
        Network network = new Network(new int[]{784, 70, 35, 10});
        TrainSet set = createTrainSet(0, 5000);
        trainData(network, set, 10, 20, 5000);
        testMyImage(network);
    }

    public static void testMyImage(Network net) {
        BufferedImage img = null;
        try {
            img = ImageIO.read(new File("res/number.png"));
        } catch (IOException e) {
            e.printStackTrace();
        }
        if (img == null) {
            // Nothing to classify if the image could not be loaded.
            return;
        }

        // Flatten the 28*28 image row by row, using the red channel as brightness.
        double[] input = new double[784];
        for (int i = 0; i < 28; i++) {
            for (int n = 0; n < 28; n++) {
                input[n * 28 + i] = (float) (new Color(img.getRGB(i, n)).getRed()) / 256f;
            }
        }

        System.out.print("output neuron values: ");
        double[] output = net.calculate(input);
        for (double neuronValue : output) {
            System.out.printf("%02.3f ", neuronValue);
        }
        System.out.println();
        System.out.print("corresponding number: 0 1 2 3 4 5 6 7 8 9");
        System.out.println();
        System.out.println("I think, that the handwritten number is: "
                + NetworkTools.indexOfHighestValue(output) + "!");
    }

    public static TrainSet createTrainSet(int start, int end) {
        TrainSet set = new TrainSet(28 * 28, 10);
        String path = new File("").getAbsolutePath();
        // Read-only access ("r") is enough here; try-with-resources closes both files.
        try (MnistImageFile m = new MnistImageFile(path + "/res/trainImage.idx3-ubyte", "r");
             MnistLabelFile l = new MnistLabelFile(path + "/res/trainLabel.idx1-ubyte", "r")) {
            for (int i = start; i <= end; i++) {
                if (i % 100 == 0) {
                    System.out.println("prepared: " + i);
                }
                double[] input = new double[28 * 28];
                double[] output = new double[10];
                output[l.readLabel()] = 1d;
                for (int j = 0; j < 28 * 28; j++) {
                    // Raw pixel bytes are 0-255, but we want roughly 0-1 for our network to learn.
                    input[j] = (double) m.read() / (double) 256;
                }
                set.addData(input, output);
                // Reading already moved both files past this entry, so no extra
                // next() call is needed (it would skip every second sample).
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return set;
    }

    public static void trainData(Network net, TrainSet set, int epochs, int loops, int batch_size) {
        for (int e = 0; e < epochs; e++) {
            net.trainWithSet(set, loops, batch_size);
            System.out.println(">>>>>>>>>>>>>>>>>>>>>>>>> epoch: " + e + " <<<<<<<<<<<<<<<<<<<<<<<<<<");
        }
    }
}

Let’s take a look at the main method in the “Mnist.java” class. The input layer has 784 neurons, each one representing a single pixel value of a written number (from the 28*28 images). The output layer has 10 neurons, each one representing a single digit from 0 to 9. The two hidden layers are set to 70 and 35 neurons in this case.

The “createTrainSet” method lets us load only a range of the 60,000 training samples. Keeping this range reasonably small reduces the time needed to load the training values. In this example, we are using the samples from 0 to 5000 out of the 60,000.

The “trainData” method does all the training steps. We pass it the created neural network, then the created trainSet. After that, we pass the number of “epochs” we want to loop through, then the number of “iterations” we would like to run in each epoch. Finally, we pass the batch size we would like to work with. We preloaded 5000 images, so we might as well pass all of those, but we can always choose a smaller or bigger number.

By “iteration” we mean the number of times we loop through the training methods, as we did in the previous examples. An “epoch”, on the other hand, here means one complete rerun of all the iteration loops over all the working datasets. You can think of the two terms as nested loops: “iterations” being the inner loop and “epochs” the outer loop. The more training iterations and epochs we run, and the more unique datasets we use, the more accurate the final results tend to be. Naturally, the larger these numbers get, the slower the whole training process becomes.
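The nesting can be sketched in a few lines. This is only an illustration: `trainOnce` is a hypothetical stand-in for a single training pass, not a method of the Network class, and the sketch simply counts how many passes the two loops produce together.

```java
// Sketch of how "epochs" (outer loop) and "iterations" (inner loop) nest.
class EpochLoopSketch {

    static int stepsPerformed = 0;

    // Hypothetical placeholder for one training pass over a batch.
    static void trainOnce() {
        stepsPerformed++;
    }

    // Runs the nested loops and returns the total number of training passes.
    static int run(int epochs, int iterations) {
        stepsPerformed = 0;
        for (int e = 0; e < epochs; e++) {          // outer loop: epochs
            for (int i = 0; i < iterations; i++) {  // inner loop: iterations
                trainOnce();
            }
        }
        return stepsPerformed;
    }

    public static void main(String[] args) {
        // Same values as in the main method above: 10 epochs of 20 iterations each.
        System.out.println("training passes: " + run(10, 20));
    }
}
```

The total work grows with the product of the two numbers, which is why raising either one slows training down.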

The “testMyImage” method loads our custom handwritten image and tries to guess its digit value, based on the network’s trained knowledge. Let’s write any number with the mouse, using a white brush over a completely black background. The pixels will represent input values ranging from 0 (completely black) to 1 (completely white). In my case, I wrote the number 3:

rwurl=https://imgur.com/00CJKGH

Let’s run the program. As you can see, this takes significantly longer than our previous small examples, since we are working with a larger amount of data over a larger network. This is a good time to mention that training normally only needs to happen once, even if it takes hours or days. Once we have properly trained our network, all the weight and bias values can be serialized and saved to a file, for instance, so whenever we want to read a new handwritten image and ask the network for its guess, it can answer almost instantaneously. I will not discuss the serialization process in detail in this article, but will assume that the reader at this level knows what I’m talking about, or can figure it out very easily.

output:
prepared: 0
prepared: 100
prepared: 200
prepared: 300
prepared: 400
prepared: 500
prepared: 600
prepared: 700
prepared: 800
prepared: 900
prepared: 1000
prepared: 1100
prepared: 1200
prepared: 1300
prepared: 1400
prepared: 1500
prepared: 1600
prepared: 1700
prepared: 1800
prepared: 1900
prepared: 2000
prepared: 2100
prepared: 2200
prepared: 2300
prepared: 2400
prepared: 2500
prepared: 2600
prepared: 2700
prepared: 2800
prepared: 2900
prepared: 3000
prepared: 3100
prepared: 3200
prepared: 3300
prepared: 3400
prepared: 3500
prepared: 3600
prepared: 3700
prepared: 3800
prepared: 3900
prepared: 4000
prepared: 4100
prepared: 4200
prepared: 4300
prepared: 4400
prepared: 4500
prepared: 4600
prepared: 4700
prepared: 4800
prepared: 4900
prepared: 5000
>>>>>>>>>>>>>>>>>>>>>>>>> epoch: 0 <<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>> epoch: 1 <<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>> epoch: 2 <<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>> epoch: 3 <<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>> epoch: 4 <<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>> epoch: 5 <<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>> epoch: 6 <<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>> epoch: 7 <<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>> epoch: 8 <<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>> epoch: 9 <<<<<<<<<<<<<<<<<<<<<<<<<<
output neuron values: 0.030 0.001 0.000 0.256 0.000 0.002 0.000 0.001 0.000 0.000
corresponding number: 0 1 2 3 4 5 6 7 8 9
I think, that the handwritten number is: 3!

Did the network produce an accurate result? If yes, excellent! If not, don’t give up! Keep fiddling with the parameter values: give the network more capacity in the form of additional neurons, more training data, or more iteration/epoch loops, and retry until you manage to get it right.
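Coming back to the idea of saving a trained network: for readers who want a starting point, here is a minimal sketch using standard Java serialization. It assumes the trained parameters are available as a `double[][][]` of weights and a `double[][]` of biases; the `WeightStore` and `Params` names are hypothetical and not part of the Network class, so the field layout would need to be adapted to your own implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.Arrays;

// Sketch: persist trained parameters with standard Java serialization.
class WeightStore {

    // Hypothetical container for the trained values (assumed array layout).
    static final class Params implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[][][] weights;
        final double[][] biases;

        Params(double[][][] weights, double[][] biases) {
            this.weights = weights;
            this.biases = biases;
        }
    }

    // Writes the parameters to any stream, e.g. a FileOutputStream.
    static void save(OutputStream target, Params params) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(target)) {
            out.writeObject(params);
        }
    }

    // Reads the parameters back, e.g. from a FileInputStream.
    static Params load(InputStream source) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(source)) {
            return (Params) in.readObject();
        }
    }

    // In-memory round trip for trying it out; wraps the checked exceptions.
    static Params roundTrip(Params params) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            save(buffer, params);
            return load(new ByteArrayInputStream(buffer.toByteArray()));
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Params original = new Params(
                new double[][][]{{{0.1, -0.2}, {0.3, 0.4}}},
                new double[][]{{0.5, -0.6}});
        Params restored = roundTrip(original);
        System.out.println("weights match: " + Arrays.deepEquals(original.weights, restored.weights));
        System.out.println("biases match: " + Arrays.deepEquals(original.biases, restored.biases));
    }
}
```

With something like this in place, training would run once, and later program starts would only load the saved values before calling the network.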

Do you have any ideas where else you could use this technology?

Neural Networks II: How do they work, where can I use them?

rwurl=https://imgur.com/FC1QvBY
In the second article in the series, I am attempting to:
  • Very briefly mention a few examples of all the Neural Network types and branches, since there are many. 
  • Focus on the oldest and simplest one, the “Fully Connected, Feed Forward Neural Network”.
  • Explain in great detail how it works using intuition and graphs, rather than math, to make it as easy as possible to understand.
  • Explain the commonly used related terminology.
  • Show a real-life example of where and how you could use it.
 
The first steps towards artificial neural networks were made 75 years ago, and they have become one of the hottest emerging technologies in recent years. The original idea was to produce a working mathematical abstraction of how a biological brain would function in theory, as I've mentioned in the previous article.
 
You don't have to be a neuroscientist to have at least a very basic understanding of how a biological brain works. It has a large number of brain cells called "neurons", which can form connections called "synapses" between each other, based on the various signals they receive from our body over our lifetime. When you have a similar experience again, similar neurons fire along those connections, so you remember the given situation more easily and react to it faster and more accurately.
 
There are many, many branches and sub-branches of Neural Networks nowadays, all of them trying to get as close as possible to the "perfect" solution for the given idea. The search is still ongoing: we still don't know exactly how the biological brain works, and we don't even know whether that is the best way to achieve intelligence. We may come up with an even more efficient way than our current biological solution, like we did in many other areas of the modern industrial world.
 
Some of the main ANN branches include the "Feed Forward Neural Networks", which are sometimes referred to as "Conventional" neural networks. This is the earliest and oldest solution, based on the idea that neuron connections are "fed forward" between neurons, so the information can travel through them in a simple, intuitive way, usually starting from the leftmost and ending up in the rightmost positions.
 
The most well-known sub-branches here include the "Convolutional Neural Networks", where the connections between neurons are filtered and grouped, to simplify and scale down large amounts of information into abstracted representations. This is generally used for image recognition nowadays. Another well-known sub-branch is the "Fully Connected Neural Networks". Here, each neuron in a given layer is connected to every single neuron in the previous layer.
 
More modern main branches include the "Recurrent Neural Networks", where connections can form cycles or similar non-conventional links between neurons. Sub-branch examples include "Bi-directional NN" and "Long Short-Term Memory NN". The latter is generally used for speech recognition.
 
"Spiking Neural Networks" are sometimes referred as the third generation of NN, which can activate neuron connection in a seemingly randomly "spiking" way, and are probably the closest real representations of the biological brain solutions nowadays.
 
In this article we are going to deal with (you guessed it), the oldest and simplest one to tackle: the Fully Connected, Feed Forward Neural Networks.
 
Let’s first understand step by step what they consist of and how they work; later on we can talk about how we can use them.
 
What is a Fully Connected, Feed Forward Neural Network?
 
From the highest level, think of it as a calculating box where on one side you can feed in some information, and on the other side you can receive the calculated results:
 
rwurl=https://imgur.com/A0LWkLq
 
You can have more than one input and output value: technically any number of inputs or outputs you require, even a very large number:
 
rwurl=https://imgur.com/subBMJW
 
If you open the box, you will see all the neurons, organized into layers. The very first layer is the “input layer”, and each neuron there stores an input value. Similarly, the very last layer is the “output layer”, and each neuron there stores a final output value:
 
rwurl=https://imgur.com/s5iHctX
 
The layers in between are referred to as “hidden layers”. They are called "hidden" because we never see (nor do we really care) what happens in them; we just want them to help figure out the right results for our final “output layer”. There can be several of these hidden layers, but usually a few are enough, since the more of them there are, the slower all the calculations become.
 
As I’ve said before, in an FCNN each neuron in a given layer is connected to all the neurons in the previous, adjacent layer. A single connection has to be between adjacent layers, so we cannot skip over a layer. One connection between two neurons would be represented like this:
 
rwurl=https://imgur.com/x6Wk5VI
 
Connecting one neuron to all from the previous layer can be represented like this:
 
rwurl=https://imgur.com/qzTJiqO
 
After finishing populating all the rest of the connections, the network will look like this, hence the name “Fully connected”:
 
rwurl=https://imgur.com/0RfKlUy
 
Let’s break down this some more. Probably the most interesting component here is the “Neuron”. What would that be and how does it work?
 
This can get fairly “mathy”, but I will try to spare you by avoiding referring to math, and just giving the intuitive explanation whenever I can.
 
If we focus on one neuron, we can see that it can receive many values from one side, apply a summary function that adds these values up, and lastly it will apply a “Sigmoid” function to this sum, before releasing the neuron’s calculated output.
 
rwurl=https://imgur.com/IKKutPg
 
The sigmoid is an “S” shaped function, as you can see in this graph, and its purpose is to transform the sum into a value between 0 and 1. Even if the sum turns out to be a crazily large or crazily small number, it will always be “corrected” back to somewhere between 0 and 1 by this function. We are doing this to simplify working with the data in the network. It’s much simpler to understand numbers close to 1 as “perhaps yes”, and numbers close to 0 as “perhaps no”.
 
rwurl=https://imgur.com/Lz82eVY
 
What do I mean by “perhaps”? As I’ve said in the first article, neural networks by design are not meant for super precise calculations like we would expect from normal computers, but to do good approximations, and they will do better and better approximations as they train more.
 
Going back to our example, let’s assume we have 3 neurons with output values between 0 and 1 somewhere: 0.8, 0.3, 0.5:
 
rwurl=https://imgur.com/HpiYEUE
 
The sum function will add all the received values up.
 
sum(0.8, 0.3, 0.5) = 0.8 + 0.3 + 0.5 = 1.6
 
After that, the neuron will apply the Sigmoid function to this value, squeezing the result back to somewhere between 0 and 1, which gives 0.832 as the output value of this neuron:
 
sigmoid(1.6) = 0.832
 
This would be the formula for the Sigmoid function, for those who would like to see the math as well:
 
rwurl=https://imgur.com/p3Su53a
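For those who prefer code to formulas, the same two steps can be sketched in Java. The sketch uses the standard logistic sigmoid, sigmoid(x) = 1 / (1 + e^(-x)); the class and method names are only illustrative.

```java
// Sketch of a single neuron: sum the incoming values, then apply the sigmoid.
class NeuronSketch {

    // Standard logistic sigmoid: the result is always strictly between 0 and 1.
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Adds up all incoming values (no weights or bias yet).
    static double sum(double... inputs) {
        double total = 0.0;
        for (double value : inputs) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        double total = sum(0.8, 0.3, 0.5);
        System.out.println("sum = " + total);
        System.out.println("sigmoid(sum) = " + sigmoid(total));
    }
}
```

Feeding in the three example values should reproduce the numbers from the text, up to rounding.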
 
 
If we continue doing this for every neuron until we reach the final output layer, we will get our final calculated values. But as you have perhaps realized, we would get the same output every time for the same input values. In many practical cases we cannot modify the input values, since we receive them from some other source, and the behavior of the summary and sigmoid functions is fixed as well, yet we would still like to influence and shape the output values somehow. Because of this need, the idea of “Weights” was invented: these are basically custom numbers stored at the connections between the neurons. People often refer to the connections between neurons simply as “Weights”.
 
So how do “Weights” come into play?
Each weight is multiplied with the neuron output before that value is added to the rest in the summary function. So, for example, if all the weights are 1, nothing changes:
 
rwurl=https://imgur.com/yPchuhO
 
sum (0.8, 0.3, 0.5) = 0.8*1 + 0.3*1 + 0.5*1 = 1.6
 
But if we turn these weight values up or down somewhere, the outputted value can be very different:
 
rwurl=https://imgur.com/idwKgeJ
 
sum (0.8, 0.3, 0.5) = 0.8*-0.5 + 0.3*2.2 + 0.5*0.4 = -0.4 + 0.66 + 0.2 = 0.46
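The weighted version of the summary function can be sketched like this. The method name `weightedSum` is just an illustrative choice; the values are the ones from the example above.

```java
// Sketch: each input is multiplied by the weight of its connection before summing.
class WeightedSumSketch {

    static double weightedSum(double[] inputs, double[] weights) {
        if (inputs.length != weights.length) {
            throw new IllegalArgumentException("need exactly one weight per input");
        }
        double total = 0.0;
        for (int i = 0; i < inputs.length; i++) {
            total += inputs[i] * weights[i];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] inputs = {0.8, 0.3, 0.5};

        // All weights set to 1: same result as the plain sum.
        System.out.println(weightedSum(inputs, new double[]{1.0, 1.0, 1.0}));

        // The adjusted weights from the second example.
        System.out.println(weightedSum(inputs, new double[]{-0.5, 2.2, 0.4}));
    }
}
```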
 
This solution would be almost perfect, but people found out over time that there may still be cases where, even with heavy Weight modifications all around the network, the final output values are still not close to the desired numbers, because of the design of the Sigmoid function. This is how the concept of “Bias” was born.
 
A “Bias” is very similar to a Weight in being a single, arbitrary, modifiable number. The difference is that it is applied only once per neuron, in the Sigmoid function, to translate it left or right.
 
Imagine a situation where your final values after applying the Summary function with Weights are converging to near 0. After applying the Sigmoid function, however, the output value gets bumped back to somewhere around 0.5, while you would rather keep that value indicating 0. This is where a Bias can be applied: it basically translates the whole sigmoid function in one direction, modifying the output greatly. Let’s see the difference with a bias of -5 or +5:
 
rwurl=https://imgur.com/Lj0Rk3N
 
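As we can see, translating the Sigmoid curve by -5 (red graph, shifted to the left) makes the neuron output very close to 1 for a sum near 0, while translating it by +5 (blue graph, shifted to the right) makes the output very close to 0. Keep in mind that shifting the curve to the left by 5 is the same as adding 5 to the sum before applying the Sigmoid, so the sign of the bias depends on which of the two conventions you use.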
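Here is a small sketch of the effect in code. It assumes the common convention where the bias is added to the sum before the Sigmoid is applied; under that convention a positive bias pushes the output toward 1 (equivalent to sliding the curve to the left) and a negative bias pushes it toward 0. If a graph is labeled by how far the curve itself is translated, the signs appear flipped. The `activate` name is only illustrative.

```java
// Sketch: how a bias changes a neuron's output for the same (near zero) sum.
class BiasSketch {

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Assumed convention: the bias is added to the sum before the sigmoid.
    static double activate(double weightedSum, double bias) {
        return sigmoid(weightedSum + bias);
    }

    public static void main(String[] args) {
        double weightedSum = 0.0; // a sum that has converged to near 0

        System.out.println("no bias: " + activate(weightedSum, 0.0));  // exactly 0.5
        System.out.println("bias +5: " + activate(weightedSum, 5.0));  // pushed toward 1
        System.out.println("bias -5: " + activate(weightedSum, -5.0)); // pushed toward 0
    }
}
```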
 
So we’re happy now: with all this flexibility we really could achieve any desired final output values!
 
The basic concept of “Fully Connected, Feed Forward Neural Network” is established. How or where could we use it?
 
Let’s have a nice practical example: we want it to read written numbers from 0 to 9. How can we approach this problem with our newly set up Neural Network?
 
First of all, let’s make our objective clear: to turn any of these written “three” images, or any similar ones, into the result “3”:
 
rwurl=https://imgur.com/lUsf7X9
 
 
The same goes for all these written “four” images, which should result in “4”:
 
rwurl=https://imgur.com/iecL0HO
 
… and so on.
 
We would need to turn all these images to input values first.
Let’s take a closer look at one of them. We can see that it’s been made of individual pixels. 28 rows * 28 columns of them:
 
rwurl=https://imgur.com/zAKEqpT
 
Each of these pixels has a brightness value: some of them are very bright, and some of them are darker. Let’s represent the brightest, “unused” pixels with 0.0 and, as they get darker, use a number closer and closer to 1.0, indicating that they hold some sort of “activated” value:
 
rwurl=https://imgur.com/CeYu7a6
 
If we convert all the remaining pixels to number representations as well, and write these values down in one long row, we have all the input neuron values ready to be processed by our NN, all 784 (28*28) of them!
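Writing the pixels down "in one long row" could look like the sketch below. It assumes the image is already available as a 28*28 array of activation values from 0 (unused) to 255 (fully activated); the `flatten` helper is hypothetical, and dividing by 255 is just one reasonable way to scale the values into the 0 to 1 range.

```java
// Sketch: turn a 2D grid of pixel values into one long row of inputs in 0..1.
class PixelFlattenSketch {

    // Assumes a non-empty rectangular grid with values from 0 to 255.
    static double[] flatten(int[][] pixels) {
        int rows = pixels.length;
        int cols = pixels[0].length;
        double[] inputs = new double[rows * cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                // Rows are written one after another, left to right.
                inputs[r * cols + c] = pixels[r][c] / 255.0;
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        int[][] image = new int[28][28]; // all pixels "unused" to start with
        image[1][2] = 255;               // one fully "activated" pixel

        double[] inputs = flatten(image);
        System.out.println("number of input values: " + inputs.length);
        System.out.println("value at row 1, column 2: " + inputs[1 * 28 + 2]);
    }
}
```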
 
As for the output neurons, the most straightforward is to have one for each desired number (0-9). So 10 neurons in total.
 
rwurl=https://imgur.com/KkJUhGQ
 
If we plug the digitized values from the image of the written number three into the input layer, we would like to receive 0.0 on all of the output neurons except the fourth one, which would ideally be 1.0, to clearly represent the number “3”. (This assumes the first neuron represents “0”, the second “1”, and so on until the 10th neuron, which represents “9”.)
 
rwurl=https://imgur.com/CyWDBrz
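This output encoding can be sketched in a few lines. Both helpers are hypothetical names chosen for the example: `targetFor` builds the ideal output for a digit, and `indexOfHighest` reads a digit back from the output neurons by picking the strongest one.

```java
// Sketch: one output neuron per digit, and reading the answer back from them.
class OutputEncodingSketch {

    // Ideal output for a digit: 1.0 on its own neuron, 0.0 everywhere else.
    static double[] targetFor(int digit) {
        double[] target = new double[10];
        target[digit] = 1.0;
        return target;
    }

    // The network's "answer" is the index of the neuron with the highest value.
    static int indexOfHighest(double[] outputs) {
        int best = 0;
        for (int i = 1; i < outputs.length; i++) {
            if (outputs[i] > outputs[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] ideal = targetFor(3);
        System.out.println("ideal output decodes to: " + indexOfHighest(ideal));

        // A less clear-cut output still decodes to the strongest neuron.
        double[] actual = {0.030, 0.001, 0.000, 0.256, 0.000, 0.002, 0.000, 0.001, 0.000, 0.000};
        System.out.println("actual output decodes to: " + indexOfHighest(actual));
    }
}
```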
 
But if we do that, we will find out that the output layer’s neuron values are nowhere near this but show some utter garbage:
 
rwurl=https://imgur.com/30oMUWC
 
That’s because the network hasn’t been “Trained” yet.
 
“Training” the network means (re)adjusting all the Weights and Biases across the network to certain values, so that if we plug in the said input values, the network produces a calculated output as close as possible to the desired ideal output.
 
We could try to manually adjust each Weight or Bias to some positive or negative value, but we would quickly realize that even with a modest number of neurons there are so many combinations that it’s not humanly feasible to do so.
 
This is where the concept of “Backpropagation” comes in extremely handy.
 
Backpropagation is one of the key features of neural networks and is a form of learning algorithm. It’s probably one of the most confusing concepts as well. Simplifying it as much as possible, the basic idea is to take that utter garbage output from the neural network, compare it to our initially desired output, and see how far each of those output values is from the desired one.
 
This number is called the “Error Bias”. Once we have it, the algorithm tries to adjust the weights and biases accordingly, starting from the rightmost layer and working its way back until it reaches the input layer. We start from the back because the final output is at the back, and the Weights and Biases that affect that layer directly are in the previous layer; we then apply the same idea to each layer in turn.
 
After the Backpropagation is finished, we redo the Feed-Forward step and see if we got closer to the given value by comparing the actual and the desired numbers again. Proper training can take hundreds, or even millions, of Feed-Forward and Backpropagation steps, until the network is conditioned to give us numbers as close as possible to the desired ones. On top of that, we need to repeat this training process for every new set of input values while making sure that the network still gives valid results for the previous input values as well. You can begin to understand that properly training a network over a large number of input values, so that it always produces outputs accurately close to the desired ones, is extremely hard to achieve and usually takes a very large number of training steps. This is where the whole industry is working hard to discover creative and different ways to approach this difficult issue.
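To make the "feed forward, compare, adjust, repeat" cycle a bit more tangible, here is a heavily simplified sketch. It is not full backpropagation: it trains a single neuron with one input using plain gradient descent on the squared error, which is the one-neuron special case of the same idea. The starting values, the learning rate and all names are assumptions made for the example.

```java
// Sketch: repeatedly feed forward, measure the error, and nudge weight and bias.
class SingleNeuronTrainingSketch {

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Trains one neuron with one input towards `target`; returns its final output.
    static double train(double input, double target, int steps, double learningRate) {
        double weight = 0.5; // arbitrary starting values
        double bias = 0.0;

        for (int s = 0; s < steps; s++) {
            // Feed-forward step.
            double output = sigmoid(weight * input + bias);

            // How far is the actual output from the desired one?
            double error = output - target;

            // Gradient of the squared error through the sigmoid:
            // the sigmoid's slope at this point is output * (1 - output).
            double delta = error * output * (1.0 - output);

            // Adjust weight and bias a small step against the error.
            weight -= learningRate * delta * input;
            bias -= learningRate * delta;
        }
        return sigmoid(weight * input + bias);
    }

    public static void main(String[] args) {
        double before = train(0.8, 1.0, 0, 0.5);    // untrained output
        double after = train(0.8, 1.0, 1000, 0.5);  // after 1000 adjust steps
        System.out.println("output before training: " + before);
        System.out.println("output after training:  " + after);
    }
}
```

In a real network the same kind of adjustment is applied to every weight and bias, layer by layer from the output back to the input, which is where the name Backpropagation comes from.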
 

 

Neural Networks: Why do we care and what are they?

rwurl=https://imgur.com/FC1QvBY
Neural Networks, along with similarly high-tech, sci-fi sounding terms, are used more and more commonly in articles around the Internet.

In this article I am attempting to:
  • Give a few examples of why we should care about this technology at all.
  • Demystify terms like Neural Networks, Artificial Intelligence, Machine Learning and Deep Learning.
  • Classify them in simple terms: where they belong and how they relate to each other.


Let's have a quick overview of the current state of the technology:

Amazon Go
rwurl=https://www.youtube.com/watch?v=vorkmWa7He8
Last Monday, Amazon opened Amazon Go, a convenience store in Seattle. Its selling point is a cashier-less, checkout-line-free experience that greatly speeds up the whole shopping process. You enter the store by launching their app and scanning the displayed QR code at the gate. When you walk out of the store, all the items you took are charged to your Amazon account after a few moments.

The magic of this technology is in the store. They've installed hundreds of cameras on the ceiling, so they can track and process every item's position whenever you pick it up or put it back. Behind this technology is heavy processing power and a machine learning algorithm that can track and understand what happens in the store at any moment.

Amazon has used similar machine learning technologies to suggest relevant products to potential customers, based on their previous buying or browsing behavior. This approach helped make Amazon the number 1 e-commerce retailer in the world.

Twitter
rwurl=https://www.youtube.com/watch?v=64gTjdUrDFQ
Project Veritas, an undercover journalist activist group, presented claims to the public that Twitter may be using machine learning algorithms that can suppress articles, stories and tweets with certain political views and promote ones with different political views. Along similar lines, Facebook announced that it will battle so-called "fake news" stories and suppress them in our feeds, preventing them from spreading around.

YouTube
rwurl=https://www.youtube.com/watch?v=9g2U12SsRns
YouTube uses its own machine learning implementation, called Content ID, to scan the content of every uploaded video and find the ones that break its Terms of Service or copyright laws. By the way, Google uses machine learning in almost all of its services (search results, speech recognition, translation, maps, etc.) with great success.

Self-Driving Cars
rwurl=https://www.youtube.com/watch?v=aaOB-ErYq6Y
Self-driving cars are another emerging market for Artificial Intelligence. A large number of companies are pushing out their own versions of self-driving algorithms, so they can save time and money for many people and companies around the world. Tesla, BMW, Volvo, GM, Ford, Nissan, Toyota, Google, and even Apple are working on their solutions, and most of them aim to be street ready around 2020-2021.

Targeting ads using ultrasound + microphone
Ad targeting in general is a huge field nowadays, and every ad company is trying to introduce more and more creative approaches to get ahead of the competition. One lesser-known idea is built around the fact that an installed application can access most of the mobile phone's hardware, so theoretically it can easily listen to microphone input. Retail stores can emit ultrasound signals from certain products, and if a signal gets picked up by the app (for instance, because the person spent more than a few seconds in front of a certain item), it can automatically report to ad companies that the user was interested in the product, so a little extra push in the form of a carefully targeted ad may get the person to buy it.

Blizzard
Blizzard announced that they may ban Overwatch players for "toxic comments" on social media like YouTube, Facebook and similar places. Gathering and processing data of this size, and making the required connections within it, certainly needs its own machine learning strategies and processing power.

Facebook
Facebook patented a technology that can track dust or fingerprint smudges on camera lenses, so image recognition algorithms can tell whether any of the presented pictures were taken with the same camera. They claimed that they never put this patented technology to use, but it’s nevertheless a great idea, with many different application possibilities from a development perspective.

Boston Dynamics
rwurl=https://www.youtube.com/watch?v=rVlhMGQgDkY
Boston Dynamics is one of the leaders in robotics, building some of the most advanced robots on earth. They use efficient machine learning technologies to teach their robots to do certain tasks and overcome certain problems.

Ok… Artificial Intelligence, Machine Learning, and Neural Networks... what exactly do they mean, and how do these terms relate to each other?

We have seen that these technologies are popping up almost everywhere and are becoming more and more relevant to our everyday lives, aiding or controlling them in one way or another. Reading all these “buzzwords” in technical articles around the Internet, you have probably noticed that many of these terms are used interchangeably, or without any explanatory context. So let’s demystify their meaning and properly categorize them for future reference.

First of all, let’s clarify their meaning:

Artificial Intelligence, or AI, has the broadest meaning of the three.

It usually attempts to mimic "cognitive" functions in humans and other beings, for example learning, adapting, judgment, evaluation, logic, problem solving.

Generally speaking, an AI usually does:
  • Learn - by observing, sensing, or any ways that it can gather data.
  • Process - by logic, adapting, evaluating, or judging the data.
  • Apply - by solving the given problem.
     
AI can be as simple as, for instance, the ghosts in Pacman. Each has its own objective and tries to accomplish it by observing the player's behavior and position, so it can process that data and react to it.

AI can be a chess player that tries to outsmart a human player.

AI can also be a search engine that gives you more relevant results to any of your search terms than any human could ever do, given the amount of constantly changing data and human behavior around the whole Internet.

Machine Learning, or ML, again has many implementations and a fairly broad meaning.

We can usually generalize the ideas behind it by stating: Machine Learning is a subset of Computer Science, and its objective is to create systems that are programmed and controlled by the processed data, rather than specifically instructed by human programmers. In other words, Machine Learning algorithms attempt to program themselves, rather than relying on human programmers to do so.

Neural Networks, or more accurately Artificial Neural Networks, are a subset of Computer Science, and their objective is to create systems that resemble natural neural networks, like our human brains for instance, so they can produce similar cognitive capabilities. Again, there are many implementations of this idea, but generally it’s based on a model of artificial neurons spread across three or more layers.

We will get into the details of the "how exactly" in the next article.

Neural networks are a great approach for identifying non-linear patterns (for linear patterns, classical computing is better), that is, patterns where there is no clear one-to-one relation between the output and the input values. Neural networks are also excellent for approximations.

We also hear a lot about Deep Learning, which is just one more complex implementation of the idea of Neural Networks, involving many more layers. All of that can create a much greater level of abstraction than we would normally use for simpler tasks. Think of the complexity required for image recognition, search engines, or translation.

We have now learned the general meaning behind these few terms, but how do they relate to each other?

Artificial Intelligence has been around for quite some time now, and some implementations of Machine Learning are used to create much more efficient Artificial Intelligences that just weren't possible before. In the same combining spirit, Machine Learning can use Neural Network technology to implement its learning algorithms.

So, as we can see, each of these technologies can work on its own, but they can also be combined with each other to create more efficient solutions to certain problems. Nowadays the latter is most often the case: all three technologies are combined and used together as the most efficient and effective solution currently available. Our most advanced Artificial Intelligences today are created with Machine Learning algorithms that use Neural Networks as their learning and data processing mechanism.
 
rwurl=https://imgur.com/oIVNOqB
 
In summary:
  • We saw a few examples of why we should care about this technology at all.
  • We demystified terms like Neural Networks, Artificial Intelligence, Machine Learning and Deep Learning.
  • We classified them in simple terms, explained where they belong, and how they relate to each other.

In the next articles I will explain in simplified steps how Neural Networks work, and will provide a programming example that any reader can implement and try out. Furthermore, I will talk about the relations and differences between Artificial Neural Networks and Natural Neural Networks (our human brain, for example), and about the concept of consciousness, a question that typically follows from these ideas.
 
