Generative Adversarial Networks (GANs) from basics to advanced

In this Generative Adversarial Networks (GANs) series, we will work our way up from the basic vanilla GAN to advanced GANs such as CycleGAN, StackGAN, and more. Before diving into Generative Adversarial Networks (GANs), let us first understand the difference between discriminative and generative models.


Differences between discriminative and generative models

Given some data points, a discriminative model learns to classify them into their respective classes by learning the decision boundary that separates the classes in an optimal way. A generative model can also classify given data points, but instead of learning the decision boundary, it learns the characteristics of each class. For instance, consider an image classification task for predicting whether a given image is an apple or an orange. As shown in the following figure, to classify between apples and oranges, the discriminative model learns the optimal decision boundary that separates the apple and orange classes, while the generative model learns the distribution of each class by learning the characteristics of apples and oranges:


To put it simply, discriminative models learn to find the decision boundary that separates the classes in an optimal way, while the generative models learn about the characteristics of each class.

Discriminative models predict the labels conditioned on the input p(y | x) , whereas generative models learn the joint probability distribution p(x,y) . Examples of discriminative models include logistic regression, Support Vector Machine (SVM), and so on, where we can directly estimate p(y|x) from the training set. Examples of generative models include Markov random fields and naive Bayes, where first we estimate p(x,y) to determine p(y | x) :
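To make this distinction concrete, here is a minimal sketch of how a generative model that stores the joint distribution p(x, y) can recover p(y | x) through Bayes' rule. The numbers below are made up purely for illustration:

```python
import numpy as np

# Toy joint distribution p(x, y) over a binary feature x and two classes y.
# rows: x in {0, 1}; columns: y in {apple, orange}
p_xy = np.array([[0.30, 0.10],
                 [0.20, 0.40]])

# A generative model stores p(x, y); to classify, we derive p(y | x) via Bayes:
# p(y | x) = p(x, y) / p(x), where p(x) = sum over y of p(x, y)
p_x = p_xy.sum(axis=1, keepdims=True)
p_y_given_x = p_xy / p_x

print(p_y_given_x)               # each row sums to 1
print(p_y_given_x[1].argmax())   # most likely class when x = 1
```

A discriminative model such as logistic regression would instead fit p(y | x) directly and never represent p(x) at all.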


Say hello to GANs!

GAN was first introduced by Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio in their 2014 paper, Generative Adversarial Networks. GANs are used extensively for generating new data points. They can be applied to any type of dataset, but they are popularly used for generating images. Some applications of GANs include generating realistic human faces, converting grayscale images to colored images, translating text descriptions into realistic images, and many more.

Yann LeCun described GANs and their variations as “the most interesting idea in the last 10 years in machine learning.”

GANs have evolved so much in recent years that they can now generate highly realistic images. The following figure shows the evolution of GAN-generated images over the course of five years:


Excited about GANs already? Now, we will see how exactly they work. Before going ahead, let’s consider a simple analogy. Let’s say you are a police officer whose task is to find counterfeit money, while the counterfeiter’s role is to create fake money and cheat the police. The counterfeiter constantly tries to create fake money that is so realistic it cannot be differentiated from real money, and the police have to identify whether the money is real or fake. So, the counterfeiter and the police essentially play a two-player game where each tries to defeat the other. GANs work something like this. They consist of two important components:

  • Generator
  • Discriminator

You can perceive the generator as analogous to the counterfeiter, while the discriminator is analogous to the police. That is, the role of the generator is to create fake money, and the role of the discriminator is to identify whether the money is fake or real. Without going into detail, first, we will get a basic understanding of GANs. Let’s say we want our GAN to generate handwritten digits. How can we do that? First, we take a dataset containing a collection of handwritten digits, say, the MNIST dataset. The generator learns the distribution of the images in our dataset; that is, it learns the distribution of handwritten digits in our training set. Once it has learned this distribution, we feed random noise to the generator, and it converts the noise into a new handwritten digit similar to those in our training set, based on the learned distribution:


The goal of the discriminator is to perform a classification task. Given an image, it classifies it as real or fake; that is, whether the image is from the training set or the image is generated by the generator:


The generator component of a GAN is basically a generative model, and the discriminator component is basically a discriminative model. Thus, the generator learns the distribution of the class and the discriminator learns the decision boundary between classes. As shown in the following figure, we feed random noise to the generator, and it converts this noise into a new image that is similar to the images in our training set, but not exactly the same as them. The image generated by the generator is called a fake image, and the images in our training set are called real images. We feed both the real and fake images to the discriminator, which tells us the probability of them being real; ideally, it returns 0 if the image is fake and 1 if the image is real:


Now that we have a basic understanding of generators and discriminators, we will study each of the components in detail.


Breaking down the generator

The generator component of a GAN is a generative model. Generative models come in two types: implicit density models and explicit density models. An implicit density model does not use an explicit density function to learn the probability distribution, whereas an explicit density model, as the name suggests, does. GANs fall into the first category; that is, they are implicit density models. Let’s study this in detail and understand how GANs are an implicit density model.

Let’s say we have a generator, G . It is basically a neural network parametrized by \theta_g . The role of the generator network is to generate new images. How does it do that? What should the input to the generator be?

We sample random noise, z , from a normal or uniform distribution, p_z . We feed this noise z as input to the generator, and it converts the noise into an image:

G\left(z ; \theta_{g}\right)
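To make this concrete, here is a minimal NumPy sketch of an untrained generator: a small feedforward network with randomly initialized parameters \theta_g that maps sampled noise to an image-shaped output. The layer sizes (100-dimensional noise, 784 output pixels, as in a flattened MNIST image) are illustrative assumptions, not something fixed by GANs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 100-d noise vector, 28x28 = 784-pixel output
n_z, n_hidden, n_out = 100, 128, 784

# theta_g: randomly initialized generator parameters (untrained)
W1 = rng.normal(0, 0.02, (n_z, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.02, (n_hidden, n_out))
b2 = np.zeros(n_out)

def G(z):
    """One forward pass z -> fake image, i.e. G(z; theta_g)."""
    h = np.maximum(0, z @ W1 + b1)   # ReLU hidden layer
    return np.tanh(h @ W2 + b2)      # pixel values squashed to (-1, 1)

z = rng.normal(size=(16, n_z))       # sample a batch of noise z ~ p_z (normal)
fake_images = G(z)
print(fake_images.shape)             # (16, 784)
```

At this point the weights are random, so the output is just noise; training (covered later) is what tunes \theta_g so that these outputs resemble the training set.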

Surprising, isn’t it? How does the generator convert random noise into a realistic image?

Let’s say we have a dataset containing a collection of human faces and we want our generator to generate a new human face. First, the generator learns all the features of the face by learning the probability distribution of the images in our training set. Once the generator learns the correct probability distribution, it can generate totally new human faces.

But how does the generator learn the distribution of the training set? That is, how does the generator learn the distribution of images of human faces in the training set?

A generator is nothing but a neural network. So, what happens is that the neural network learns the distribution of the images in our training set implicitly; let’s call this distribution the generator distribution, p_g . At the first iteration, the generator generates a really noisy image. But over a series of iterations, it learns the probability distribution of our training set and learns to generate realistic images by tuning its parameters, \theta_g .

It is important to note that we are not learning the noise distribution p_z ; it is only used for sampling the random noise that we feed as input to the generator. The generator network implicitly learns the distribution of our training set; we call this distribution the generator distribution, p_g , and that is why we call the generator network an implicit density model.

Breaking down the discriminator

As the name suggests, the discriminator is a discriminative model. Let’s say we have a discriminator, D . It is also a neural network and it is parametrized by \theta_d .

The goal of the discriminator is to discriminate between two classes. That is, given an image x , it has to identify whether the image comes from the real distribution or the fake distribution (the generator distribution). In other words, the discriminator has to identify whether the given input image is from the training set or is a fake image generated by the generator:

D\left(x ; \theta_{d}\right)
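As with the generator, we can sketch the discriminator as a small feedforward network in NumPy. The parameters \theta_d are randomly initialized here, and the layer sizes are illustrative assumptions chosen to match the generator sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: 784-pixel input image, one hidden layer
n_in, n_hidden = 784, 128

# theta_d: randomly initialized discriminator parameters (untrained)
W1 = rng.normal(0, 0.02, (n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.02, (n_hidden, 1))
b2 = np.zeros(1)

def D(x):
    """One forward pass x -> probability that x is real, i.e. D(x; theta_d)."""
    h = np.maximum(0, x @ W1 + b1)                 # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))    # sigmoid output in (0, 1)

x = rng.normal(size=(8, n_in))   # a batch of images (here just random pixels)
probs = D(x)
print(probs.shape)               # (8, 1)
```

The sigmoid output is what lets us read the result as a probability of the image being real.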

Let’s call the distribution of our training set the real data distribution, which is represented by p_r . We know that the generator distribution is represented by p_g .

So, the discriminator D essentially tries to discriminate whether the image x comes from p_r or p_g .

How do they learn though?

So far, we have studied the roles of the generator and the discriminator, but how exactly do they learn? How does the generator learn to generate new realistic images, and how does the discriminator learn to discriminate between images correctly?

We know that the goal of the generator is to generate images in such a way as to fool the discriminator into believing that the generated images come from the real distribution. In the first iteration, the generator produces a noisy image. When we feed this image to the discriminator, the discriminator can easily detect that it comes from the generator distribution. The generator takes this as a loss and tries to improve itself, since its goal is to fool the discriminator. That is, if the discriminator easily detects the generated image as fake, then the generator is not yet producing images similar to those in the training set, which implies that it has not yet learned the probability distribution of the training set. So, the generator tunes its parameters in such a way as to learn the correct probability distribution of the training set. Since the generator is a neural network, we simply update the parameters of the network through backpropagation. Once it has learned the probability distribution of the real images, it can generate images similar to the ones in the training set.

Okay, what about the discriminator? How does it learn? As we know, the role of the discriminator is to discriminate between real and fake images.

If the discriminator incorrectly classifies the generated image, that is, if it classifies a fake image as real, then it has not yet learned to differentiate between real and fake images. So, we update the parameters of the discriminator network through backpropagation so that it learns to classify between real and fake images.
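The "loss" both networks learn from is typically a binary cross-entropy on the discriminator's outputs. As a sketch (using hand-picked dummy probabilities rather than a trained network), the standard GAN losses look like this:

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Discriminator loss: push D(real) -> 1 and D(fake) -> 0."""
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def g_loss(d_fake):
    """Generator loss (non-saturating form): push D(fake) -> 1."""
    return -np.mean(np.log(d_fake))

# Dummy discriminator outputs early in training: D spots the fakes easily.
d_real = np.array([0.90, 0.80, 0.95])
d_fake = np.array([0.10, 0.20, 0.05])

print(d_loss(d_real, d_fake))  # low: the discriminator is doing well
print(g_loss(d_fake))          # high: the generator is failing to fool D
```

Each training step alternates between updating \theta_d to decrease the discriminator loss and updating \theta_g to decrease the generator loss, which is exactly the two-player game described above.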

So, basically, the generator is trying to fool the discriminator by learning the real data distribution, p_r , and the discriminator is trying to find out whether an image comes from the real or the fake distribution. Now the question is: given that the generator and discriminator are competing against each other, when do we stop training the network?

Basically, the goal of a GAN is to generate images similar to the ones in the training set. Say we want to generate a human face: we learn the distribution of the images in the training set and generate new faces from it. So, for the generator, we need to find the optimal discriminator. What do we mean by that?

We know that a generator distribution is represented by p_g and the real data distribution is represented by p_r . If the generator learns the real data distribution perfectly, then p_g equals p_r , as shown in the following plot:


When p_g = p_r , the discriminator cannot differentiate whether the input image is from the real or the fake distribution, so it simply returns 0.5 as the probability, as the discriminator becomes confused between the two distributions when they are the same.

So, for a generator, the optimal discriminator can be given as follows:

D(x)=\frac{p_{r}(x)}{p_{r}(x)+p_{g}(x)}=\frac{1}{2}

So, when the discriminator just returns the probability of 0.5 for any image, then we can say that the generator has learned the distribution of images in our training set and fooled the discriminator successfully.
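We can verify this numerically. The following sketch uses toy, unnormalized density values to evaluate the optimal discriminator formula, first for a mismatched p_g and then for the converged case p_g = p_r :

```python
import numpy as np

def d_star(p_r, p_g):
    """Optimal discriminator for a fixed generator: p_r(x) / (p_r(x) + p_g(x))."""
    return p_r / (p_r + p_g)

x = np.linspace(-3, 3, 101)
p_r = np.exp(-0.5 * x**2)             # real distribution (unnormalized Gaussian)

# Before convergence, p_g differs from p_r and D* moves away from 0.5.
p_g_bad = np.exp(-0.5 * (x - 1)**2)   # shifted generator distribution
print(d_star(p_r, p_g_bad)[:5])

# At convergence p_g = p_r, so D*(x) = 1/2 everywhere.
print(d_star(p_r, p_r))               # all 0.5
```

Notice that where the generator places too little mass relative to the real data, D* rises above 0.5, and where it places too much, D* falls below 0.5, which is precisely the signal the generator trains against.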

Architecture of a GAN


The architecture of a GAN is shown in the following diagram:


As shown in the preceding diagram, the generator G takes random noise, z , sampled from a uniform or normal distribution, as input and generates a fake image by implicitly learning the distribution of the training set.

We sample an image, x , either from the real data distribution, x \sim p_{r}(x) , or from the fake data distribution, x \sim p_{g}(x) , and feed it to the discriminator, D . We feed both real and fake images to the discriminator, and the discriminator performs a binary classification task; that is, it returns 0 when the image is fake and 1 when the image is real.
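Putting the pieces together, the architecture above can be sketched end to end with tiny untrained toy networks. All sizes and weights here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
n_z, n_img = 8, 16   # toy sizes purely for illustration

# Untrained toy generator and discriminator (single linear layer each)
Wg = rng.normal(0, 0.1, (n_z, n_img))
Wd = rng.normal(0, 0.1, (n_img, 1))

G = lambda z: np.tanh(z @ Wg)              # z -> fake image
D = lambda x: 1 / (1 + np.exp(-(x @ Wd)))  # image -> probability of being real

x_real = rng.normal(size=(4, n_img))       # stand-in for samples from p_r
x_fake = G(rng.normal(size=(4, n_z)))      # samples from p_g via the generator

# The discriminator sees both batches; targets are 1 for real, 0 for fake.
d_out = D(np.vstack([x_real, x_fake]))
targets = np.concatenate([np.ones(4), np.zeros(4)])
print(d_out.shape)                         # (8, 1)
```

Training would then alternate between updating Wd against the targets and updating Wg to push D's outputs on fakes toward 1, which is the loss function covered in the next post.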

In the next post, we will learn about the loss function of GAN.
