A Conceptual Look at GANs
How Neural Networks Creatively Generate Data
GANs are an incredibly interesting evolution of neural networks. Before getting into what a GAN is, let’s start with the basics and build an understanding of what a neural network is.
A Conceptual Look at Neural Networks
Neural networks can be very intimidating. Below is a representation of a neural network and it demonstrates only a fraction of the complexity they encapsulate.
Don’t worry about trying to understand that!
For the purposes of this post there isn’t a need to drill down to that level of detail. Instead, let’s keep things simple. Let us consider a neural network as a black box.
This black box takes something as input and it returns some output. That’s it. Ignore the calculus, the algorithmic lingo, and the intimidating diagrams. A neural network takes one or more inputs, it does some math behind the scenes, and it returns one or more outputs.
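To make the black-box idea concrete, here is a minimal sketch in plain Python. The function and its weights are hypothetical stand-ins for whatever math a real network performs internally; the point is only the shape: inputs in, output out.

```python
# A hypothetical "black box": inputs in, output out. The weights stand in
# for the internal dials a real neural network would learn.

def black_box(inputs, weights):
    """Combine inputs with internal weights to produce an output."""
    return sum(i * w for i, w in zip(inputs, weights))

# Example: two inputs, two internal dials.
print(black_box([1.0, 2.0], [0.5, 0.25]))  # 1.0
```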
You are likely more familiar with a related type of neural network, one that is used to perform facial recognition. Facial recognition neural networks look something like this.
For a given photo this neural network will return three outputs indicating who it thinks is in the photo. If we peel back one layer of detail we can reveal additional information about the outputs.
Under the hood the neural network is performing a bit of math to produce those outputs. Because of this, the outputs are numeric rather than text. The ordered outputs correlate to a specific label and they represent a confidence of assigning those respective labels. Above we see an untrained neural network that is 57% confident that it was given a photo of Albert Einstein, 12% confident that it is Stephen Hawking, and 31% confident that it is Isaac Newton.
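One common way those raw numeric outputs become confidences that sum to 100% is the softmax function. Here is a minimal sketch; the raw scores are made up, chosen only so the result matches the 57% / 12% / 31% example above.

```python
import math

# A sketch of turning raw scores into confidences with softmax.
# The scores are invented so the output matches the example in the text.

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

confidences = softmax([2.0, 0.44, 1.39])
for label, c in zip(["Einstein", "Hawking", "Newton"], confidences):
    print(f"{label}: {c:.0%}")
# Einstein: 57%
# Hawking: 12%
# Newton: 31%
```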
Training
Training a neural network consists of giving it a lot of sample data, letting it see how far off its guesses were, and having it adjust accordingly.
In the case of facial recognition, a neural network is given tens of thousands of photos to train on. The above image represents a single training run where the neural network was given a photo of Albert Einstein. The outputs are 57%, 12%, and 31% respectively, but more importantly, the amount of error is each output’s difference from its expected value (100%, 0%, and 0% for a photo of Einstein): 43%, 12%, and 31% respectively.
The neural network takes the error value for each output and behind the scenes it does some math and adjusts its internals such that if it were given the same photo again it should have slightly more accurate outputs. This slight adjustment is one of the reasons for needing to repeat this process many, many times for a lot of different photos.
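The adjust-and-repeat loop can be sketched with a single dial. This is a toy gradient step, not actual backpropagation, and the learning rate of 0.1 is an arbitrary choice; it only illustrates why many small corrections are needed.

```python
# A toy "adjust the dials" step: nudge one weight in the direction that
# shrinks the error. Real networks do this for millions of weights via
# backpropagation; the learning rate here is an arbitrary choice.

def training_step(weight, inp, target, learning_rate=0.1):
    prediction = weight * inp
    error = target - prediction                   # how far off the guess was
    return weight + learning_rate * error * inp   # small correction

weight = 0.57  # start at the 57% guess from the example
for _ in range(20):
    weight = training_step(weight, inp=1.0, target=1.0)
print(round(weight, 3))  # 0.948
```

Each pass closes only a fraction of the remaining error, which is why even this single dial needs many repetitions to approach the target.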
After enough training runs the neural network should have fine-tuned its internal dials such that it can generally make correct guesses for photos it has never seen before!
A conceptual look at the GAN architecture
Now that we have a general understanding of neural networks, let’s take a look at a GAN. A GAN is a Generative Adversarial Network. “Generative” meaning that once it’s trained it will be able to generate data similar to the data it was trained with. If it was trained with photos with human faces it will eventually be capable of producing fake photos with faces. The “adversarial” part reveals how this is accomplished. A GAN is two neural networks that are competing against each other.
A GAN consists of a discriminator neural network and a generator neural network. Each neural network has a single input and a single output.
Similar to the facial recognition example above, the discriminator takes a single photo as input. However, this time it only returns a single numeric output between 0 and 1. When the output is near 1 (or 100%) then the discriminator thinks the photo is from the real training data. Conversely, when the output is near 0 then it thinks the photo is fake, meaning it was created by the generator.
The generator is something we haven’t seen before. It takes what is effectively a random number as input and it also has a single output, except this time the output is an image. These are the fake images that the discriminator is attempting to detect as indicated by a discriminator output near 0.
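The two interfaces can be sketched as plain functions. Everything inside them is a made-up stand-in; only the shapes of the inputs and outputs match the description above.

```python
import random

# Hypothetical sketches of the two interfaces. The internals are stubs;
# only the input/output shapes match the text.

def discriminator(image):
    """Image in, single score in [0, 1] out: near 1 means "looks real"."""
    brightness = sum(image) / len(image)  # stand-in for real internals
    return max(0.0, min(1.0, brightness))

def generator(seed):
    """Random number in, image out (here: a list of 4 pixel values)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(4)]  # white noise to begin with

fake = generator(seed=42)
print(0.0 <= discriminator(fake) <= 1.0)  # True
```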
Training
It may not be immediately apparent how these two adversarial neural networks compete against one another. Let’s take a look at a training run and understand the implications of this architecture.
The first step is to give the discriminator a real photo; as output, it will try to guess whether the photo it received is real.
This untrained discriminator guessed with 65% confidence that it was given a real photo. It then takes the amount of error, 35%, and backpropagates that error inside the black box, slightly turning its dials so that the next time it sees photos generally like this one it’ll be a little more accurate.
The next step is to bring the generator into the mix.
As input the generator is given a randomly generated number and as output the generator produces a fake image that is meant to trick the discriminator. Because this generator is untrained at the moment it is essentially white noise in / white noise out and does not produce a particularly convincing fake photo to begin with, but it has to start somewhere.
The generator training run is not over yet. This fake photo is now provided to the discriminator for it to make yet another guess at whether it received real or fake input.
This time the discriminator leans toward it being a fake image, but it hasn’t gone through many training cycles and it has room to improve. The 39% error is back propagated into the neural network and it adjusts its internals to make better predictions in the future.
However, the generator is also observing the discriminator’s output. The generator takes the discriminator’s prediction and backpropagates what it considers to be the error, which is 61%. The generator adjusts its internal dials based on this 61% error such that the next fake image it generates is slightly more likely to trick the discriminator. The goal of the generator is to trick the discriminator by producing images similar to the training photos.
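The tug-of-war over a single fake image can be sketched with the 39% / 61% numbers from the example. Real GANs compute these signals with a loss function such as binary cross-entropy, but the two errors pull in opposite directions the same way.

```python
# A toy sketch of the opposing error signals from one fake image, using the
# 39% / 61% numbers from the example above.

def training_errors(discriminator_output_on_fake):
    # The discriminator wants to output 0 ("fake") for this image, so its
    # error is how far the output sits above 0.
    d_error = discriminator_output_on_fake
    # The generator wants the discriminator to output 1 ("real"), so its
    # error is how far the output falls short of 1.
    g_error = 1.0 - discriminator_output_on_fake
    return d_error, g_error

d_error, g_error = training_errors(0.39)
print(round(d_error, 2), round(g_error, 2))  # 0.39 0.61
```

The same discriminator output produces two different error signals because the two networks want opposite outcomes, which is the "adversarial" dynamic in a nutshell.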
This is the adversarial part of a Generative Adversarial Network. Over tens of thousands of training runs the discriminator is being trained to get better at discerning which input photos are real and which are fake. At the same time, the generator is cheating by looking at the discriminator’s output and turning its internal dials to produce ever more convincing fake photos.
This leads to an arms race where, after a sufficient number of training runs, we end up with a single artifact, the generator, that can be given any random number and produce a convincing fake photo of a human face.
Let’s take a moment and reflect on what is happening in the above animation. The generator’s earliest attempts at producing fake images look nothing like real faces. However, when looking at how the discriminator is making predictions we see that the generator improves by evolving different strategies.
With more convincing fakes, the discriminator now has to improve, and it trains to get better at discerning real photos from fakes. But then the generator has to improve to keep up. This arms race ultimately culminates in a generator that can be given any random number and produce a convincing fake image as output.
Above is the final output. The coolest part about this is none of these are of real people. Each and every one of them is entirely made up by providing a random number to a trained neural network!
Additional Resources
I hope you found this as intriguing as I did when I first learned about GANs. Here are a few resources to help you learn more.
- A PyTorch tutorial that produced the above GAN example
- The original paper on GANs by Ian Goodfellow
- A lightning talk I gave on this topic