I am using TensorFlow to train a custom embedding, similar to the continuous bag of words (CBOW) model. However, unlike CBOW, which has a fixed-length sliding window, my sliding window is variable-length. Here is the problem:
Let's say the embedding is a word embedding. For word t, I have a tensor holding the indices of its context words, e.g. [-1, 1, 2, -1]. The maximum window size is 4, so the length of the vector is 4. But sometimes I do not have 4 context words for a word, so I use -1 to mean "no word in this position", while every other integer is the index of a real word. I also have an 'embedding' tensor, which holds the embeddings for all the words.
What I am trying to do is compute the average embedding of the context words, to represent the context. For example, if the context words are [-1, 1, 2, -1], I want ((embedding for word 1) + (embedding for word 2)) / 2. I just need to ignore all the -1 entries.
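For concreteness, here is the computation I want, sketched in plain Python (the toy `embedding` table and `context` values are made-up examples, not my real data):

```python
# Toy embedding table: row i is the embedding vector for word i.
embedding = [
    [0.1, 0.2],  # word 0
    [1.0, 2.0],  # word 1
    [3.0, 4.0],  # word 2
]

context = [-1, 1, 2, -1]  # -1 marks "no word in this position"

# Keep only the real word indices, then average their embeddings.
valid = [i for i in context if i != -1]
dim = len(embedding[0])
avg = [sum(embedding[i][d] for i in valid) / len(valid) for d in range(dim)]
print(avg)  # [2.0, 3.0]
```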
So in my code, I loop through the context-word tensor, compare each value with -1, and use an `if` condition to decide whether to add that context word's embedding. I have tried several variations, but I always get: `TypeError: Using a tf.Tensor as a Python bool is not allowed.`
Is there a way to solve this problem? Or, even better, is there a representation of the empty positions that would let me compute this more efficiently? (I tried using NaN, but that caused its own problems...)
Thanks a lot for the help, and I hope my description of the problem is clear.