Handling unknown word in word embedding in Tensorflow

Asked Oct 30 '16 at 13:29

I'm currently working on relation classification. I have trained a word embedding matrix and would like to use it in a TensorFlow model. However, some words in my dataset are unknown. I load the embeddings in the same way as proposed in Using a pre-trained word embedding (word2vec or Glove) in TensorFlow.

I would like to know whether TensorFlow can automatically use a null (all-zero) vector to represent unknown words. Currently, I add an extra column to the word embedding matrix for such words (a null vector), but I would like to update the word embedding matrix during training without modifying the column for unknown words.

Moreover, I also use this column to pad my sentences.

Is there a way to do this automatically in TensorFlow?
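To make the current setup concrete, here is a minimal pure-Python sketch of the index mapping described above, where one extra slot is shared by unknown words and padding. The helper names `build_index` and `encode` and the toy vocabulary are hypothetical and only for illustration.

```python
def build_index(known_words):
    """Map each known word to a row/column index, starting at 0."""
    return {word: i for i, word in enumerate(known_words)}


def encode(sentence, index, max_len):
    """Turn a sentence into a fixed-length list of ids.

    Words missing from `index` and padding positions both get the
    extra id `len(index)`, i.e. the single added null-vector slot.
    """
    extra_id = len(index)
    ids = [index.get(token, extra_id) for token in sentence.split()][:max_len]
    return ids + [extra_id] * (max_len - len(ids))


index = build_index(["the", "cat", "sat"])      # toy vocabulary
encoded = encode("the dog sat", index, max_len=5)
print(encoded)
```

Here "dog" is not in the vocabulary, so it and the two padding positions all map to the same extra id, which is why a single frozen null vector would have to serve both purposes.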

asked Oct 30 '16 at 13:29

XogoX

I guess the best approach is to create one null vector for the padding tokens and one for the unknown words, with trainable=False for the padding vector and trainable=True for the unknown-word vector, and finally to use tf.concat(0, [word_embedding, unknown vector, padding vector]) and adjust the indices – XogoX Oct 30 '16 at 15:46
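The idea in this comment could be sketched roughly as follows. This is an illustration under assumptions, not a tested solution to the question: it uses the TensorFlow 2 eager API, in which `tf.concat` takes `(values, axis)` rather than the older `tf.concat(0, values)` order used in the comment. The sizes, the random stand-in for the pre-trained matrix, the `lookup` helper, and the `UNK_ID`/`PAD_ID` names are all hypothetical.

```python
import numpy as np
import tensorflow as tf

vocab_size, dim = 5, 4  # hypothetical sizes for illustration

# Random stand-in for the pre-trained matrix (assumption: one row per word).
pretrained = np.random.rand(vocab_size, dim).astype(np.float32)

word_embedding = tf.Variable(pretrained, trainable=True)
unknown_vector = tf.Variable(tf.zeros([1, dim]), trainable=True)
padding_vector = tf.Variable(tf.zeros([1, dim]), trainable=False)

# Adjusted indices: the two extra rows come after the known words.
UNK_ID = vocab_size
PAD_ID = vocab_size + 1


def lookup(token_ids):
    """Concatenate the three pieces along axis 0 and look up the ids."""
    table = tf.concat([word_embedding, unknown_vector, padding_vector], axis=0)
    return tf.nn.embedding_lookup(table, token_ids)


token_ids = tf.constant([[0, 3, UNK_ID, PAD_ID, PAD_ID]])

with tf.GradientTape() as tape:
    vectors = lookup(token_ids)
    loss = tf.reduce_sum(vectors)  # toy loss, only to produce gradients

# Only the trainable pieces are updated; the padding row is left out.
trainable = [word_embedding, unknown_vector]
grads = tape.gradient(loss, trainable)
for var, grad in zip(trainable, grads):
    # Gradients may arrive as IndexedSlices; convert to dense for a plain SGD step.
    var.assign_sub(0.1 * tf.convert_to_tensor(grad))
```

After the update step the unknown-word vector has moved away from zero while the padding vector is still all zeros, because it never receives an update. Rebuilding the concatenated table inside `lookup` on every call keeps it in sync with the updated variables.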

0 Answers