
I am following this tutorial in order to understand CNNs in NLP. There are a few things I don't understand despite having the code in front of me, and I hope somebody can clear them up here.


The first, rather minor, thing is the sequence_length parameter of the TextCNN object. In the example on GitHub this is just 56, which I think is the maximum length of all sentences in the training data. This means that self.input_x is a 56-dimensional vector which just contains, for each word of a sentence, its index in the dictionary.

This list goes into tf.nn.embedding_lookup(W, self.input_x), which returns a matrix consisting of the word embeddings of the words given by self.input_x. According to this answer, this operation is similar to indexing with numpy:

import numpy as np

matrix = np.random.random([1024, 64])
ids = np.array([0, 5, 17, 33])
print(matrix[ids])

But the problem here is that self.input_x most of the time looks like [1 3 44 25 64 0 0 0 0 0 0 0 .. 0 0]. So am I correct in assuming that tf.nn.embedding_lookup ignores the value 0?


Another thing I don't get is how tf.nn.embedding_lookup works here:

# Embedding layer
with tf.device('/cpu:0'), tf.name_scope("embedding"):
    W = tf.Variable(
        tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
        name="W")
    self.embedded_chars = tf.nn.embedding_lookup(W, self.input_x)
    self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)

I assume that self.embedded_chars is the matrix which is the actual input to the CNN, where each row represents the word embedding of one word. But how can tf.nn.embedding_lookup know about those indices given by self.input_x?


The last thing which I don't understand is this:

W is our embedding matrix that we learn during training. We initialize it using a random uniform distribution. tf.nn.embedding_lookup creates the actual embedding operation. The result of the embedding operation is a 3-dimensional tensor of shape [None, sequence_length, embedding_size].

Does this mean that we are actually learning the word embeddings here? The tutorial states at the beginning:

We will not use pre-trained word2vec vectors for our word embeddings. Instead, we learn embeddings from scratch.

But I don't see a line of code where this is actually happening. The code of the embedding layer does not look as if anything is being trained or learned - so where is it happening?

Stefan Falk

1 Answer


Answer to question 1 (So am I correct if I assume that tf.nn.embedding_lookup ignores the value 0?):

The 0s in the input vector are the index of the 0th symbol in the vocabulary, which is the PAD symbol. They are not ignored when the lookup is performed; the 0th row of the embedding matrix is returned for each of them.
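A minimal sketch of this (using the TensorFlow 1.x API style of the tutorial, with a made-up toy embedding matrix) shows the PAD index being looked up like any other index:

import numpy as np
import tensorflow as tf

# Toy embedding matrix: vocabulary of 5 symbols, 3-dimensional embeddings.
embedding = tf.constant(np.arange(15, dtype=np.float32).reshape(5, 3))

# A padded sentence: two real word indices followed by PAD (index 0).
ids = tf.constant([1, 3, 0, 0])
looked_up = tf.nn.embedding_lookup(embedding, ids)

with tf.Session() as sess:
    print(sess.run(looked_up))
    # The last two rows equal row 0 of the embedding matrix:
    # index 0 is not skipped, it simply selects the PAD embedding.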

Answer to question 2 (But how can tf.nn.embedding_lookup know about those indices given by self.input_x?):

The size of the embedding matrix is [V x E], where V is the size of the vocabulary and E is the dimension of the embedding vectors. The 0th row of the matrix is the embedding vector for the 0th element of the vocabulary, the 1st row for the 1st element, and so on. The input vector self.input_x holds the indices of the words in the vocabulary, and those indices are used to index into the embedding matrix.
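As a rough sketch (again in TensorFlow 1.x style; the sizes are illustrative placeholders, not the tutorial's exact values), the lookup simply turns a batch of index vectors into a batch of embedding matrices:

import tensorflow as tf

vocab_size, embedding_size, sequence_length = 1000, 128, 56   # illustrative values

input_x = tf.placeholder(tf.int32, [None, sequence_length])   # word indices per sentence
W = tf.Variable(tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0), name="W")

# Each index in input_x selects one row of W.
embedded_chars = tf.nn.embedding_lookup(W, input_x)
print(embedded_chars.get_shape())   # (?, 56, 128) -> [None, sequence_length, embedding_size]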

Answer to question 3 (Does this mean that we are actually learning the word embeddings here?):

Yes, we are actually learning the embedding matrix. In the embedding layer, in the line W = tf.Variable(tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0), name="W"), W is the embedding matrix, and in TensorFlow variables are created with trainable=True by default. So W is a learned parameter and the optimizer updates it along with the rest of the network. To use pre-trained embeddings instead, initialize W with them and set trainable=False.
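A minimal sketch of both cases (the pre-trained matrix below is just a random stand-in for real word2vec vectors):

import numpy as np
import tensorflow as tf

vocab_size, embedding_size = 1000, 128   # illustrative values

# Learned from scratch (the tutorial's setup): trainable defaults to True,
# so the optimizer updates W together with the rest of the network.
W = tf.Variable(
    tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
    name="W")

# Hypothetical pre-trained setup: initialize from an existing matrix and freeze it.
pretrained = np.random.rand(vocab_size, embedding_size).astype(np.float32)  # stand-in for word2vec
W_pretrained = tf.Variable(pretrained, trainable=False, name="W_pretrained")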

For a detailed explanation of the code, you can follow this blog post: https://agarnitin86.github.io/blog/2016/12/23/text-classification-cnn

Nitin