
I am working to understand Erik Linder-Norén's implementation of the Categorical GAN model, and am confused by the generator in that model:

def build_generator(self):
    model = Sequential()
    # ...some lines removed...    
    model.add(Dense(np.prod(self.img_shape), activation='tanh'))
    model.add(Reshape(self.img_shape))
    model.summary()

    noise = Input(shape=(self.latent_dim,))
    label = Input(shape=(1,), dtype='int32')
    label_embedding = Flatten()(Embedding(self.num_classes, self.latent_dim)(label))
    model_input = multiply([noise, label_embedding])
    img = model(model_input)

    return Model([noise, label], img)

My question is: How does the Embedding() layer work here?

I know that noise is a vector that has length 100, and label is an integer, but I don't understand what the label_embedding object contains or how it functions here.

I tried printing the shape of label_embedding to try and figure out what's going on in that Embedding() line but that returns (?,?).

If anyone could help me understand how the Embedding() lines here work, I'd be very grateful for their assistance!

duhaime

3 Answers


Worth keeping in mind why an embedding is used here at all: the alternative is to concatenate the noise with the conditioned class, which may cause the generator to ignore the noise values entirely, generating data with high similarity within each class (or even just one sample per class). The sketch below contrasts the two approaches.
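
For comparison, here is a minimal, hypothetical sketch (not from the original repository; the dimensions are assumptions matching the question's code) showing both conditioning strategies side by side:

from keras.layers import Input, Embedding, Flatten, multiply, concatenate

latent_dim, num_classes = 100, 10

noise = Input(shape=(latent_dim,))
label = Input(shape=(1,), dtype='int32')
label_vec = Flatten()(Embedding(num_classes, latent_dim)(label))

# Strategy used in the CGAN generator: element-wise multiply,
# so the conditioned input keeps shape (None, 100).
conditioned_mul = multiply([noise, label_vec])

# Alternative: concatenate noise and label information, giving shape
# (None, 200); the generator can then learn to lean on the label part
# and largely ignore the noise part.
conditioned_cat = concatenate([noise, label_vec])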

ian

From the documentation, https://keras.io/layers/embeddings/#embedding,

Turns positive integers (indexes) into dense vectors of fixed size. eg. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]

In the GAN model, the input integer (0-9) is converted to a vector of length 100. With this short code snippet, we can feed some test input to check the output shape of the Embedding layer.

from keras.layers import Input, Embedding
from keras.models import Model
import numpy as np

latent_dim = 100
num_classes = 10

# A label is a single integer in [0, num_classes), fed as a length-1 input.
label = Input(shape=(1,), dtype='int32')
# Embedding maps each integer to a dense vector of length latent_dim.
label_embedding = Embedding(num_classes, latent_dim)(label)
mod = Model(label, label_embedding)

test_input = np.zeros((1))  # a single label with value 0
print(f'output shape is {mod.predict(test_input).shape}')
mod.summary()

output shape is (1, 1, 100)

From the model summary, the output shape of the embedding layer is (None, 1, 100), which matches the shape returned by predict (with a batch size of 1).

embedding_1 (Embedding) (None, 1, 100) 1000

One additional point: in the output shape (1, 1, 100), the leftmost 1 is the batch size and the middle 1 is the input length. In this case, we provided an input of length 1.
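
To connect this back to the generator, here is a small sketch (using the same assumed latent_dim and num_classes) of how Flatten() removes the extra length-1 axis so the embedding can be multiplied element-wise with the noise:

from keras.layers import Input, Embedding, Flatten, multiply
from keras.models import Model
import numpy as np

latent_dim, num_classes = 100, 10

noise = Input(shape=(latent_dim,))               # (None, 100)
label = Input(shape=(1,), dtype='int32')         # (None, 1)
emb = Embedding(num_classes, latent_dim)(label)  # (None, 1, 100)
emb_flat = Flatten()(emb)                        # (None, 100)
mixed = multiply([noise, emb_flat])              # (None, 100), element-wise

mod = Model([noise, label], mixed)
out = mod.predict([np.ones((1, latent_dim)), np.array([[3]])])
print(out.shape)  # (1, 100)

With the noise set to all ones, the output is simply row 3 of the (randomly initialized) embedding matrix.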

Manoj Mohan
  • Thanks for this simple model @ManojMohan, I hadn't thought to do that. Do you understand what `[[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]` is meant to signify? What is the mapping described by that function? Also, how should one understand the value returned by `mod.predict(test_input)` here? The inputs are zeros and the output is a dense, seemingly random vector. If you could help me understand why that vector has the values it does, I'd be grateful. – duhaime Mar 08 '19 at 13:51
  • In the conditional GAN, two inputs are fed to the network: noise and y (the conditioning variable). Noise is represented by a vector of length 100. For y as well, we use an Embedding layer to convert the input to a vector of length 100. We multiply noise by the output of the Embedding layer and feed it to the network. The Embedding layer has weights as well, which are learnt as part of the training process. The code snippet I created was just to show how the dimensions of the output of the Embedding layer can be checked. – Manoj Mohan Mar 08 '19 at 19:47
  • https://stackoverflow.com/questions/47868265/what-is-the-difference-between-an-embedding-layer-and-a-dense-layer – Manoj Mohan Mar 08 '19 at 19:48
  • https://github.com/malzantot/Pytorch-conditional-GANs/blob/master/conditional_dcgan.py In this implementation, instead of using an Embedding layer, a one-hot encoding of possible y values followed by a Dense layer is used to represent y. This is then concatenated with the noise (lines 63-65). – Manoj Mohan Mar 08 '19 at 19:52

The embedding stores the per-label state. If I read the code correctly, each label corresponds to a digit; i.e. there is an embedding that captures how to generate a 0, 1, ..., 9.

This code takes some random noise and multiplies it with this per-label state. The result should be a vector that leads the generator to produce the digit corresponding to the label (i.e. 0..9).
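
A plain NumPy sketch of what that lookup and multiplication amount to (the matrix values here are random stand-ins for the weights the Embedding layer would learn during training):

import numpy as np

num_classes, latent_dim = 10, 100

# The Embedding layer's weights form a matrix of shape
# (num_classes, latent_dim); random values stand in for them here.
embedding_matrix = np.random.randn(num_classes, latent_dim)

label = 7
noise = np.random.randn(latent_dim)

# The lookup slices out the row for this label.
label_state = embedding_matrix[label]   # shape (100,)

# multiply([noise, label_embedding]) is element-wise multiplication.
generator_input = noise * label_state   # shape (100,)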

Pedro Marques
  • thanks for your response. If you have a moment, could I please ask you to describe how that multiplication works? One vector has shape 10, and the other has shape 100 -- how are they multiplied? What is the shape of the output from that operation, and what does each cell in the resulting vector represent? Any pointers you can offer on these questions would be hugely helpful! – duhaime Mar 07 '19 at 22:34
  • 1
    The Embedding layer returns 1 vector that is self.latent_dim wide. It performs a lookup operation. You can think of embedding as a matrix of [num_classes, embedding_dims] and the lookup as a slicing operation where [label] is the index. It outputs a shape that is [1, latent_dim]. And the Flatten() op converts that to a vector. Thus noise and the embedding lookup have the same dimension. multiply just does element-wise multiplication of the vectors. – Pedro Marques Mar 08 '19 at 07:27