I am following this tutorial to understand CNNs in NLP. There are a few things I don't understand despite having the code in front of me, and I hope somebody can clear them up.
The first, rather minor, thing is the sequence_length parameter of the TextCNN object. In the example on GitHub this is just 56, which I think is the maximum length over all sentences in the training data. This means that self.input_x is a 56-dimensional vector which, for each word of a sentence, contains just that word's index in the dictionary.

This list goes into tf.nn.embedding_lookup(W, self.input_x), which returns a matrix consisting of the word embeddings of the words given by self.input_x. According to this answer, the operation is similar to indexing with numpy:
import numpy as np

matrix = np.random.random([1024, 64])  # e.g. 64-dimensional embeddings for a vocabulary of 1024 words
ids = np.array([0, 5, 17, 33])
print(matrix[ids])  # rows 0, 5, 17 and 33 of the matrix
But the problem here is that self.input_x most of the time looks like [1 3 44 25 64 0 0 0 0 0 0 0 .. 0 0]. So am I correct in assuming that tf.nn.embedding_lookup ignores the value 0?
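For concreteness, this is roughly how I picture such a padded index vector being built (the vocabulary and the indices are made up; I am assuming index 0 is reserved for padding):

import numpy as np

# Made-up toy vocabulary; I assume index 0 is reserved for the padding token
vocab = {"<PAD>": 0, "the": 1, "movie": 3, "was": 44, "really": 25, "good": 64}
sequence_length = 56  # maximum sentence length in the training data

sentence = ["the", "movie", "was", "really", "good"]
input_x = np.zeros(sequence_length, dtype=np.int64)
input_x[:len(sentence)] = [vocab[w] for w in sentence]
print(input_x)  # [ 1  3 44 25 64  0  0 ... 0 ]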
Another thing I don't get is how tf.nn.embedding_lookup works here:
# Embedding layer
with tf.device('/cpu:0'), tf.name_scope("embedding"):
    W = tf.Variable(
        tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
        name="W")
    self.embedded_chars = tf.nn.embedding_lookup(W, self.input_x)
    self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)
I assume that self.embedded_chars is the matrix which is the actual input to the CNN, where each row represents the word embedding of one word. But how can tf.nn.embedding_lookup know about the indices given by self.input_x?
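To check my mental model, here is a minimal numpy sketch of what I think the lookup does for a whole batch (all sizes and indices are made up):

import numpy as np

vocab_size, embedding_size, sequence_length = 100, 8, 10

# Stand-in for the embedding matrix W
W = np.random.uniform(-1.0, 1.0, size=(vocab_size, embedding_size))

# A batch of two padded index vectors, analogous to self.input_x
input_x = np.array([[1, 3, 44, 25, 64, 0, 0, 0, 0, 0],
                    [7, 2,  9,  0,  0, 0, 0, 0, 0, 0]])

# Every index is replaced by the corresponding row of W
embedded_chars = W[input_x]
print(embedded_chars.shape)  # (2, 10, 8) = [batch, sequence_length, embedding_size]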
The last thing which I don't understand here is this passage from the tutorial:

W is our embedding matrix that we learn during training. We initialize it using a random uniform distribution. tf.nn.embedding_lookup creates the actual embedding operation. The result of the embedding operation is a 3-dimensional tensor of shape [None, sequence_length, embedding_size].
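To make that shape statement concrete for myself, I checked the static shapes in a tiny standalone snippet (the sizes are made up; this assumes TensorFlow 1.x, as in the tutorial):

import tensorflow as tf  # assuming TensorFlow 1.x, as in the tutorial

vocab_size, embedding_size, sequence_length = 100, 8, 56

input_x = tf.placeholder(tf.int32, [None, sequence_length], name="input_x")
W = tf.Variable(tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0), name="W")

embedded_chars = tf.nn.embedding_lookup(W, input_x)
embedded_chars_expanded = tf.expand_dims(embedded_chars, -1)

print(embedded_chars.get_shape())           # (?, 56, 8)    -> [None, sequence_length, embedding_size]
print(embedded_chars_expanded.get_shape())  # (?, 56, 8, 1) -> extra channel dimension for the convolution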
Does the quoted passage mean that we are actually learning the word embeddings here? The tutorial states at the beginning:

We will not use pre-trained word2vec vectors for our word embeddings. Instead, we learn embeddings from scratch.

But I don't see a line of code where this is actually happening. The code of the embedding layer does not look as if anything is being trained or learned there - so where is it happening?