Hi. In the supervised learning episode, we taught John Green-bot to learn using a perceptron, a program that imitates one neuron. But our brains make decisions with 100 billion neurons, which have trillions of connections between them! We can actually do a lot more with AI if we connect a bunch of perceptrons together to create what’s called an artificial neural network.
Neural networks are better than other methods for certain tasks, like image recognition. The secret to their success is their hidden layers, and they’re mathematically very elegant. Both of these reasons are why neural networks are one of the most dominant machine learning technologies used today. [INTRO] Not that long ago, a big challenge in AI was real-world image recognition, like telling a dog from a cat, or a car from a plane from a boat.
Even though we do it every day, it’s really hard for computers. That’s because computers are good at literal comparisons, like matching 0s and 1s, one at a time. It’s easy for a computer to tell that these images are the same by matching the pixels. But before AI, a computer couldn’t tell that these images are of the same dog, and had no hope of telling that all of these different images are dogs. So, a professor named Fei-Fei Li and a group of other machine learning and computer vision researchers wanted to help the research community develop AI that could recognize images.
The first step was to create a huge public dataset of labeled real-world photos. That way, computer scientists around the world could come up with and test different algorithms. They called this dataset ImageNet. It has 3.2 million labeled images, sorted into 5,247 nested categories of nouns. For example, the “dog” label is nested under “domestic animal,” which is nested under “animal.” Humans are the best at reliably labeling data. But if one person did all this labeling, taking 10 seconds per label, without any sleep or snack breaks, it would take them over a year! So ImageNet used crowd-sourcing and leveraged the power of the Internet to cheaply spread the work between thousands of people.
Once the data was in place, the researchers started an annual competition in 2010 to get people to contribute their best solutions to image recognition. Enter Alex Krizhevsky, who was a graduate student at the University of Toronto. In 2012, he decided to apply a neural network to ImageNet, even though similar solutions hadn’t been successful in the past. His neural network, called AlexNet, had a couple of innovations that set it apart. He used a lot of hidden layers, which we’ll get to in a minute. He also used faster computation hardware to handle all the math that neural networks do. AlexNet outperformed the next best approaches by over 10 percentage points. It only got 3 out of every 20 images wrong. In grade terms, it was getting a solid B while other techniques were scraping by with a low C.
Since 2012, neural network solutions have taken over the annual competition, and the results keep getting better and better. Plus, AlexNet sparked an explosion of research into neural networks, which we started to apply to lots of things beyond image recognition. To understand how neural networks can be used for these classification problems, we have to understand their architecture first. All neural networks are made up of an input layer, an output layer, and any number of hidden layers in between.
There are many different arrangements, but we’ll use the classic multi-layer perceptron as an example. The input layer is where the neural network receives data represented as numbers. Each input neuron represents a single feature, which is some characteristic of the data. Features are straightforward if you’re talking about something that’s already a number, like grams of sugar in a donut. But, really, just about anything can be converted to a number. Sounds can be represented as the amplitudes of the sound wave.
So each feature would have a number that represents the amplitude at a moment in time. Words in a paragraph can be represented by how many times each word appears. So each feature would have the frequency of one word. Or, if we’re trying to label an image of a dog, each feature would represent information about a pixel. So for a grayscale image, each feature would have a number representing how bright a pixel is. But for a color image, we can represent each pixel with three numbers: the amount of red, green, and blue, which can be combined to make any color on your computer screen. Once the features have data, each one sends its number to every neuron in the next layer, called the hidden layer. Then, each hidden layer neuron mathematically combines all the numbers it gets.
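To make the idea of "features as numbers" concrete, here is a minimal Python sketch. The function names, the tiny vocabulary, and the 0 to 255 brightness scale are assumptions chosen for illustration; they are not from the episode.

```python
from collections import Counter

def word_count_features(text, vocabulary):
    """One feature per vocabulary word: how many times it appears in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def grayscale_features(image_rows):
    """One feature per pixel: brightness scaled from 0-255 (assumed) down to 0-1."""
    return [pixel / 255 for row in image_rows for pixel in row]

def color_features(image_rows):
    """Three features per pixel: the red, green, and blue amounts, each scaled to 0-1."""
    return [channel / 255
            for row in image_rows
            for pixel in row
            for channel in pixel]

# Hypothetical inputs, just to show the shape of the resulting feature lists.
vocabulary = ["dog", "cat", "the"]
word_features = word_count_features("The dog chased the other dog", vocabulary)
gray_features = grayscale_features([[0, 255], [51, 102]])       # a 2x2 grayscale image
rgb_features = color_features([[(255, 0, 0), (0, 0, 255)]])     # a 1x2 color image
```

Each list is what the input layer would receive: one number per input neuron.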
The goal is to measure whether the input data has certain components. For an image recognition problem, these components may be a certain color in the center, a curve near the top, or even whether the image contains eyes, ears, or fur. Instead of answering yes or no, like the simple perceptron from the previous episode, each neuron in the hidden layer does some slightly more complicated math and outputs a number.
And then, each neuron sends its number to every neuron in the next layer, which could be another hidden layer or the output layer. The output layer is where the final hidden layer outputs are mathematically combined to answer the problem. So, let’s say we’re just trying to label an image as a dog. We might have a single output neuron representing a single answer: that the image is of a dog or not. But if there are many possible answers, like if we’re labeling a bunch of different images, we’ll need a lot of output neurons.
Each output neuron will correspond to the probability of one label, like dog, car, spaghetti, and more. And then we can pick the answer with the highest probability. The key to neural networks, and really all of AI, is math. And I get it. A neural network kind of seems like a black box that does math and spits out an answer. I mean, those middle layers are even called hidden layers! But we can understand the gist of what’s happening by working through an example.
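Before that example, here is a rough Python sketch of the "pick the label with the highest probability" step. It assumes the output neurons produce raw scores that are turned into probabilities with a softmax, which is one common choice and not something the episode specifies. The labels and scores are made up.

```python
import math

def softmax(scores):
    """Turn raw output-neuron scores into probabilities that sum to 1."""
    largest = max(scores)
    # Subtracting the largest score keeps exp() from overflowing.
    exps = [math.exp(score - largest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]

# Hypothetical labels and raw scores from three output neurons.
labels = ["dog", "car", "spaghetti"]
raw_scores = [2.0, 0.5, -1.0]

probabilities = softmax(raw_scores)
best_label = labels[probabilities.index(max(probabilities))]
```

With these made-up scores, `best_label` comes out as `"dog"`, because the first output neuron has the largest score and therefore the largest probability.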
Oh, John Green-bot? Let’s give John Green-bot a program with a neural network that’s been trained to recognize a dog in a grayscale photo. When we show him this photo, first, every feature will contain a number between 0 and 1 corresponding to the brightness of one pixel. And it’ll pass this information to the hidden layer. Now, let’s focus on one hidden layer neuron.
Since the neural network is already trained, this neuron has a mathematical formula to look for a particular component in the image, like a specific curve in the center. The curve at the top of the nose. If this neuron is focused on this specific shape and spot, it may not really care what’s happening everywhere else. So it would multiply, or weight, the pixel values from most of those features by 0 or something close to 0. Because it’s looking for bright pixels here, it would multiply these pixel values by a positive weight.
But this curve is also defined by a darker part below. So the neuron would multiply these pixel values by a negative weight. This hidden neuron will add all the weighted pixel values from the input neurons and squish the result so that it’s between 0 and 1. The final number basically represents the guess of this neuron thinking that a specific curve, aka a dog nose, appeared in the image.
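Here is a small Python sketch of what that one hidden neuron is doing. The "squish" is assumed to be a sigmoid function, which is a common choice but not one the episode names, and the six pixel values and weights are invented for illustration. Real neurons usually also add a bias term, which is left out here to match the description.

```python
import math

def sigmoid(x):
    """Squish any number into the range between 0 and 1."""
    return 1 / (1 + math.exp(-x))

def neuron_output(inputs, weights):
    """Multiply each input by its weight, add everything up, then squish."""
    weighted_sum = sum(value * weight for value, weight in zip(inputs, weights))
    return sigmoid(weighted_sum)

# Hypothetical weights: positive where the curve should be bright,
# negative where it should be dark, and 0 for pixels the neuron ignores.
weights = [3.0, 3.0, -3.0, -3.0, 0.0, 0.0]

# Two bright pixels on the curve, two dark pixels below, two pixels elsewhere.
curve_pixels = [0.9, 0.8, 0.1, 0.2, 0.5, 0.7]
# A flat gray patch with no curve at all.
flat_pixels = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]

curve_activation = neuron_output(curve_pixels, weights)
flat_activation = neuron_output(flat_pixels, weights)
```

The curve patch pushes the neuron's output close to 1, while the flat patch cancels out to a weighted sum of 0 and lands at 0.5, the neuron's "no evidence either way" value in this setup.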
Other hidden neurons are looking for other components, like a different curve in another part of the image, or a fuzzy texture. When all of these neurons pass their estimates on to the next hidden layer, those neurons may be trained to look for more complex components. Like, one hidden neuron may check whether there’s a shape that might be a dog nose.
It probably doesn’t care about data from previous layers that looked for furry textures, so it weights those by 0 or close to 0. But it may really care about neurons that looked for the “top of the nose” and “bottom of the nose” and “nostrils”. It weights those by large positive numbers. Again, it would add up all the weighted values from the previous layer neurons, squish the value to be between 0 and 1, and pass this to the next layer. That’s the gist of the math, but we’re simplifying a bit.
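Stacking that same step layer after layer gives the whole network. The sketch below is a simplified illustration under the same assumptions as before: a sigmoid squish, no bias terms, and invented weights. The "nose" and "fur" neurons and all the numbers are hypothetical, not a trained model.

```python
import math

def sigmoid(x):
    """Squish any number into the range between 0 and 1."""
    return 1 / (1 + math.exp(-x))

def layer_forward(inputs, weight_rows):
    """Each row of weight_rows holds one neuron's weights for every input."""
    return [
        sigmoid(sum(value * weight for value, weight in zip(inputs, row)))
        for row in weight_rows
    ]

def network_forward(values, layers):
    """Pass the values through each layer in turn and return the last outputs."""
    for weight_rows in layers:
        values = layer_forward(values, weight_rows)
    return values

# Hypothetical outputs of an earlier hidden layer:
# top of nose, bottom of nose, nostrils, furry texture.
earlier_outputs = [0.9, 0.8, 0.7, 0.6]

next_hidden_layer = [
    [2.0, 2.0, 2.0, 0.0],  # "nose" neuron: cares about nose parts, ignores fur
    [0.0, 0.0, 0.0, 4.0],  # "fur" neuron: cares only about the furry texture
]
output_layer = [
    [3.0, 1.0],            # single output neuron combining nose and fur evidence
]

dog_estimate = network_forward(earlier_outputs, [next_hidden_layer, output_layer])[0]
```

The final `dog_estimate` is a single number between 0 and 1, which plays the role of the output neuron's guess that the image is a dog.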
It’s important to know that neural networks don’t actually understand ideas like “nose” or “eyelid.” Each neuron is doing a calculation on the data it’s given and just flagging specific patterns of light and dark. After a few more hidden layers, we reach the output layer with one neuron! So after one more weighted addition of the previous layer’s data, which happens in the output neuron, the network should have a good estimate of whether this image is a dog. Which means John Green-bot should have a decision.
John Green-bot: Output neuron value: 0.93. Probability that this is a dog: 93%! Hey, John Green-bot, nice job! Thinking about how a neural network would process just one image makes it clearer why AI needs fast computers. Like I mentioned before, each pixel in a color image will be represented by 3 numbers: how much red, green, and blue it has. So to process a 1000 by 1000 pixel image, which in comparison is a small 3 by 3 inch photo, a neural network needs to look at 3 million features! AlexNet needed more than 60 million adjustable weights, spread across roughly 650,000 neurons, to achieve this, which is a ton of math and could take a lot of time to compute.
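The feature count above is simple multiplication, and the same arithmetic shows why the math piles up so fast. In this quick Python sketch, the 100-neuron first hidden layer is a made-up size chosen only to illustrate the scale, and it assumes every feature connects to every neuron, as in the multi-layer perceptron described earlier.

```python
def feature_count(width, height, channels):
    """Number of input features for an image: one per pixel per color channel."""
    return width * height * channels

def fully_connected_weight_count(num_inputs, num_neurons):
    """Every input connects to every neuron, so the weights multiply."""
    return num_inputs * num_neurons

# A 1000 by 1000 color image has 3 numbers (red, green, blue) per pixel.
color_image_features = feature_count(1000, 1000, 3)

# Hypothetical first hidden layer with only 100 neurons.
first_layer_weights = fully_connected_weight_count(color_image_features, 100)
```

That works out to 3,000,000 features and 300,000,000 weights for just that one hypothetical layer, each needing a multiplication for every image.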
Which is something we should keep in mind when designing neural networks to solve problems. People are really excited about using deeper neural networks, which are networks with more hidden layers, to do deep learning. Deep networks can combine input data in more complex ways to look for more complex components, and solve trickier problems.
But we can’t make all networks, like, a billion layers deep, because more hidden layers means more math, which again would mean that we need faster computers. Plus, as a network gets deeper, it gets harder for us to make sense of why it’s giving the answers it does. Each neuron in the first hidden layer is looking for some specific component of the input data.
But in deeper layers, those components get more abstract and further from how humans would describe the same data. Now, this may not seem like a big deal, but if a neural network was used to deny our loan request, for example, we’d want to know why. Which features made the difference? How were they weighted towards the final answer? In many countries, we have the legal right to understand why these kinds of decisions were made.
And neural networks are being used to make more and more decisions about our lives. Most banks, for example, use neural networks to detect and prevent fraud. Many cancer tests, like the Pap test for cervical cancer, use a neural network to look at an image of cells under a microscope and decide whether there’s a risk of cancer. And neural networks are how Alexa understands what song you’re asking her to play and how Facebook suggests tags for our photos.
Understanding how all this happens is really important to being a human in the world right now, whether or not you want to build your own neural network. So this was a lot of big-picture stuff, but the program we gave John Green-bot had already been trained to recognize dogs. The neurons already had algorithms that weighted inputs.