Generative Adversarial Networks (GANs) are one of the most interesting and powerful ideas in modern machine learning. Yann LeCun, Facebook’s AI research director, described adversarial training as ‘the most interesting idea in the last 10 years in ML.’
GANs pit two neural networks against each other (hence the term ‘adversarial’) to generate new, synthetic instances of data that can pass for real data. They are widely used in image generation, voice generation, and video generation.
GANs have indeed shown a lot of promise in image generation. A remarkable example is when an artist sold a GAN-generated portrait for $432,000. It’s great that GANs can do all of these amazing things, but how do they work? What’s the first step to take if you are interested in using a GAN to generate images for your project?
In this article, we discuss the GAN architecture by explaining the terms ‘generator’ and ‘discriminator.’ We also walk through an example that shows how you can use a GAN to generate images.
WHAT IS A GAN?
As stated earlier, a GAN is an algorithm in which two neural networks, the generator and the discriminator, compete with each other. The result is data that is fake but looks like the original data. To fully grasp how a GAN works, it is vital to understand how the generator and the discriminator work.
GENERATOR
The goal of the generator is to produce new, synthetic samples from an input, which is a random set of values (noise). For instance, if you trained it on images of cats, the generator would perform a series of computations on that input and then produce an image of a cat that isn’t real but looks real.
Ideally, the output won’t be the same cat on every run. To make sure the generator produces a different sample each time, its input is a random set of values known as the noise vector.
Because it is ‘competing’ with the discriminator, the generator does its best to produce fake images that the discriminator will consider authentic.
DISCRIMINATOR
The goal of the discriminator, in turn, is to classify the images it receives as either real or fake. It works as a binary classifier that sees two kinds of input: real images (from the training data) and images the generator produced.
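This competition is commonly written as a minimax objective. The formula below comes from the standard GAN formulation rather than from this article's code, so treat the notation as an assumption: G is the generator, D the discriminator, x a real image, and z a noise vector.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator tries to maximize V by scoring real images close to 1 and generated images close to 0, while the generator tries to minimize it by making D(G(z)) as close to 1 as possible.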
GENERATING IMAGES WITH GAN
Now that we understand how GANs work, let’s move into the details of how to implement one to generate images.
We implement the code in Google Colab using the TensorFlow library. You can access the full code here.
STEP 1: Import the necessary libraries
The relevant libraries must first be loaded.
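A minimal sketch of the imports this walkthrough relies on, assuming TensorFlow 2.x with its bundled Keras API. The original notebook may import more (for example, a plotting library).

```python
# Core libraries assumed for the rest of this walkthrough (TensorFlow 2.x).
import numpy as np                    # array handling for the raw images
import tensorflow as tf               # tensors, tf.data pipeline, training
from tensorflow.keras import layers   # Dense, Reshape and other layer classes

print("TensorFlow version:", tf.__version__)
```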
STEP 2: Load the data and conduct data preprocessing
In this example, we load the Fashion MNIST dataset using the ‘tf.keras.datasets’ module. We don’t need labels to solve this problem, so we only use the training images, x_train. Next, we reshape the images to (28, 28, 1) and, because the data is in uint8 format by default, cast them to float32.
Our preprocessing also involves normalizing the data from [0, 255] to [-1, 1]. Then we build the TensorFlow input pipeline. In summary, we pass the training data to ‘tf.data.Dataset.from_tensor_slices’, shuffle it, and split it into batches, which lets us access tensors of a defined batch size during training.
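The loading and preprocessing steps above could look roughly like this. It is a sketch, not the article's exact code: the batch size of 128 and the helper names `load_training_images` and `make_dataset` are assumptions made for illustration.

```python
import numpy as np
import tensorflow as tf

BATCH_SIZE = 128  # assumed value; the article does not state the batch size


def load_training_images():
    """Download Fashion MNIST and return only the training images (labels are dropped)."""
    (x_train, _), _ = tf.keras.datasets.fashion_mnist.load_data()
    return x_train  # uint8 array of shape (60000, 28, 28)


def make_dataset(images, batch_size=BATCH_SIZE):
    """Reshape, cast to float32, normalize to [-1, 1], then shuffle and batch."""
    images = images.reshape(images.shape[0], 28, 28, 1).astype("float32")
    images = (images - 127.5) / 127.5  # maps 0 -> -1.0 and 255 -> 1.0
    dataset = tf.data.Dataset.from_tensor_slices(images)
    return dataset.shuffle(buffer_size=images.shape[0]).batch(batch_size)
```

Calling `make_dataset(load_training_images())` would then give a dataset that yields float32 batches of shape (Batch Size, 28, 28, 1).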
STEP 3: Create the generator network
Here, we feed the generator a 100-D noise vector sampled from a normal distribution. We first define the input layer with the shape (100,). The dense layers use the ‘he_uniform’ weight initializer, which has to be set explicitly because the default for dense layers in TensorFlow (Keras) is ‘glorot_uniform’.
The final dense layer outputs a 784-D tensor, which we reshape with ‘tf.reshape’ to the shape (Batch Size, 28, 28, 1). The first parameter of ‘tf.reshape’ is the input tensor and the second is the new shape of the tensor.
Finally, we pass the generator’s input and output layers to the model constructor to create the model.
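The generator described above can be sketched as follows. The hidden layer sizes (256 and 512), the LeakyReLU slope, and the tanh output are assumptions, not values taken from the article. The sketch also swaps `tf.reshape` for a `Reshape` layer, which does the same job but works inside a Keras functional model across Keras versions.

```python
import tensorflow as tf
from tensorflow.keras import layers

NOISE_DIM = 100  # size of the noise vector, as stated in the text


def build_generator(noise_dim=NOISE_DIM):
    """Map a noise vector to a 28x28x1 image using dense layers only."""
    inputs = tf.keras.Input(shape=(noise_dim,))
    # Hidden sizes below are illustrative assumptions.
    x = layers.Dense(256, kernel_initializer="he_uniform")(inputs)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Dense(512, kernel_initializer="he_uniform")(x)
    x = layers.LeakyReLU(0.2)(x)
    # tanh keeps pixel values in [-1, 1], matching the normalized training data.
    x = layers.Dense(784, activation="tanh", kernel_initializer="he_uniform")(x)
    outputs = layers.Reshape((28, 28, 1))(x)  # 784-D vector -> 28x28x1 image
    return tf.keras.Model(inputs, outputs, name="generator")


generator = build_generator()
noise = tf.random.normal([4, NOISE_DIM])  # four noise vectors -> four images
fake_images = generator(noise, training=False)
```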
STEP 4: Create the discriminator network
The discriminator is a binary classifier and only has fully connected layers. It receives a tensor of shape (Batch Size, 28, 28, 1). Because the discriminator function only has dense layers, we first have to reshape that tensor to a vector of shape (Batch Size, 784). The last layer uses the sigmoid activation function, which produces an output value between 0 (fake) and 1 (real).
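A minimal sketch of such a discriminator is shown below. The hidden layer sizes and the LeakyReLU activations are assumptions chosen for illustration; only the input shape, the reshape to 784 values, and the sigmoid output come from the description above.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_discriminator():
    """Classify a 28x28x1 image as real (close to 1) or fake (close to 0)."""
    inputs = tf.keras.Input(shape=(28, 28, 1))
    x = layers.Reshape((784,))(inputs)  # image -> flat vector for the dense layers
    # Hidden sizes below are illustrative assumptions.
    x = layers.Dense(512, kernel_initializer="he_uniform")(x)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Dense(256, kernel_initializer="he_uniform")(x)
    x = layers.LeakyReLU(0.2)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # probability of "real"
    return tf.keras.Model(inputs, outputs, name="discriminator")


discriminator = build_discriminator()
scores = discriminator(tf.random.uniform([4, 28, 28, 1], -1.0, 1.0), training=False)
```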
STEP 5: Define the loss function
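The generator has its own individual loss, which measures how well its fake images fool the discriminator.
The discriminator’s loss measures how well it separates real images from fake ones.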
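The article's loss code is not reproduced here, so the following is a sketch under the assumption that both losses use binary cross-entropy, the usual choice when the discriminator ends in a sigmoid. The function names are illustrative.

```python
import tensorflow as tf

# from_logits defaults to False, which fits a discriminator with a sigmoid output.
bce = tf.keras.losses.BinaryCrossentropy()


def generator_loss(fake_output):
    """The generator wants the discriminator to label its fakes as real (1)."""
    return bce(tf.ones_like(fake_output), fake_output)


def discriminator_loss(real_output, fake_output):
    """The discriminator wants real images labeled 1 and fake images labeled 0."""
    real_loss = bce(tf.ones_like(real_output), real_output)
    fake_loss = bce(tf.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss
```

When the discriminator is confident and correct, its loss is small and the generator's loss is large, which is the signal that pushes the generator to improve.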
STEP 6: Optimize both the generator and discriminator
To optimize both the generator and the discriminator, we use the Adam optimizer, which takes two arguments: the learning rate and the beta coefficients.
During training, the beta coefficients control how quickly the running averages of the gradients and of their squares decay. Adam uses these averages to scale each weight update.
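A short sketch of how the two optimizers might be created. The learning rate of 0.0002 and beta_1 of 0.5 are assumptions (values often used for GANs), not settings confirmed by the article.

```python
import tensorflow as tf

# Assumed hyperparameters; the article does not list the exact values.
LEARNING_RATE = 2e-4
BETA_1 = 0.5    # decay rate for the running average of the gradients
BETA_2 = 0.999  # decay rate for the running average of the squared gradients

# Each network gets its own optimizer so their running averages stay separate.
generator_optimizer = tf.keras.optimizers.Adam(
    learning_rate=LEARNING_RATE, beta_1=BETA_1, beta_2=BETA_2
)
discriminator_optimizer = tf.keras.optimizers.Adam(
    learning_rate=LEARNING_RATE, beta_1=BETA_1, beta_2=BETA_2
)
```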
The core of the whole GAN training is the ‘train_step’ function, because it combines all the training pieces defined above.
The ‘@tf.function’ decorator compiles the train_step function into a callable TensorFlow graph, which also reduces training time.
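Here is a sketch of what such a train_step could look like. To keep the block runnable on its own, it defines tiny stand-in networks, a binary cross-entropy loss, and Adam settings, all of which are assumptions; in the full program you would use the generator, discriminator, losses, and optimizers built in the earlier steps.

```python
import tensorflow as tf
from tensorflow.keras import layers

NOISE_DIM = 100

# Tiny stand-in networks so this block runs alone (hypothetical sizes).
generator = tf.keras.Sequential([
    tf.keras.Input(shape=(NOISE_DIM,)),
    layers.Dense(784, activation="tanh"),
    layers.Reshape((28, 28, 1)),
])
discriminator = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),
])

bce = tf.keras.losses.BinaryCrossentropy()
generator_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
discriminator_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)


@tf.function  # traces the Python function into a TensorFlow graph
def train_step(real_images):
    """Run one update of both networks on a single batch of real images."""
    batch_size = tf.shape(real_images)[0]
    noise = tf.random.normal([batch_size, NOISE_DIM])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        fake_images = generator(noise, training=True)
        real_output = discriminator(real_images, training=True)
        fake_output = discriminator(fake_images, training=True)
        # Generator is rewarded when fakes are scored as real.
        gen_loss = bce(tf.ones_like(fake_output), fake_output)
        # Discriminator is rewarded for scoring real as 1 and fake as 0.
        disc_loss = (bce(tf.ones_like(real_output), real_output)
                     + bce(tf.zeros_like(fake_output), fake_output))

    gen_grads = gen_tape.gradient(gen_loss, generator.trainable_variables)
    disc_grads = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    generator_optimizer.apply_gradients(
        zip(gen_grads, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(
        zip(disc_grads, discriminator.trainable_variables))
    return gen_loss, disc_loss
```

Using two separate gradient tapes keeps the updates independent: each network's weights are adjusted only with respect to its own loss.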
STEP 7: Final training
Finally, the time has arrived for us to sit back and watch the magic, but let’s pause for a minute. The training function requires two parameters: the training data and the number of epochs. Once you give it those parameters, you can run the program and watch the GAN do its magic.
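A sketch of a training function with that two-parameter signature. The `train_step` defined here is only a placeholder that returns dummy losses so the block runs by itself; it is an assumption, and in the full program it would be the real GAN update step.

```python
import time
import tensorflow as tf


def train_step(real_images):
    """Placeholder for the real GAN update; returns dummy (generator, discriminator) losses."""
    return tf.constant(0.0), tf.constant(0.0)


def train(dataset, epochs):
    """Call train_step on every batch for the given number of epochs.

    Assumes the dataset yields at least one batch per epoch.
    """
    history = []
    for epoch in range(epochs):
        start = time.time()
        for image_batch in dataset:
            gen_loss, disc_loss = train_step(image_batch)
        # Record the losses from the last batch of the epoch.
        history.append((float(gen_loss), float(disc_loss)))
        print(f"Epoch {epoch + 1}/{epochs}: "
              f"gen_loss={history[-1][0]:.4f}, disc_loss={history[-1][1]:.4f}, "
              f"time={time.time() - start:.1f}s")
    return history
```

A call such as `train(train_dataset, epochs=50)` would then run the full training, where `train_dataset` and the epoch count are hypothetical names and values.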
Algoscale – helping you leverage cutting-edge technology for whatever you need!
In this article, we have seen what GANs are, how they work, and how you can use them to generate images. We hope you found it useful, and if you stay connected with us, we have more in store for you. Whatever stage you are at, we are dedicated to helping you thrive in this big data age and leverage technology like GANs for your business, project, or personal interest.