45

I have trained a word2vec model on a corpus of documents with Gensim. Once the model is trained, I use the following code to get the raw feature vector of a word, say "view".

myModel["view"]

However, I get a KeyError for the word, probably because it doesn't exist as a key in the vocabulary indexed by word2vec. How can I check whether a key exists in the index before trying to get the raw feature vector?

London guy

8 Answers

46

Word2Vec also provides a 'vocab' member, which you can access directly.

Using a Pythonic approach:

if word in w2v_model.vocab:
    # Do something

EDIT Since gensim release 2.0, the API for Word2Vec changed. To access the vocabulary you should now use this:

if word in w2v_model.wv.vocab:
    # Do something

EDIT 2 The attribute 'wv' is being deprecated and will be completely removed in gensim 4.0.0. Now it's back to the original answer by OP:

if word in w2v_model.vocab:
    # Do something
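Because the attribute that holds the vocabulary has moved between releases, a small helper can probe for it instead of hard-coding one layout. The following is only a minimal sketch: `has_word` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins that mimic the attribute layouts mentioned in this thread so the example runs without a trained model.

```python
from types import SimpleNamespace


def has_word(model, word):
    """Return True if `word` is in the model's vocabulary.

    Probes the attribute layouts mentioned in this thread, newest first.
    """
    # Word2Vec models keep their vectors on `.wv`; an object without
    # `.wv` is treated as the vector container itself.
    vectors = getattr(model, "wv", model)
    for attr in ("key_to_index", "vocab"):
        mapping = getattr(vectors, attr, None)
        if mapping is not None:
            return word in mapping
    raise AttributeError("no known vocabulary attribute found")


# Stand-in objects (not real gensim models) that mimic two layouts.
new_style = SimpleNamespace(wv=SimpleNamespace(key_to_index={"view": 0}))
old_style = SimpleNamespace(vocab={"view": object()})

print(has_word(new_style, "view"))    # True
print(has_word(old_style, "window"))  # False
```

With a real model, the call would simply be `has_word(w2v_model, "view")`.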
tomar__
Matt Fortier
35

Get the word vectors from the model with

word_vectors = model.wv

then we can use

if 'word' in word_vectors.vocab:
    # do something
pacholik
rakaT
14

The vocab attribute was removed from KeyedVectors in Gensim 4.0.0. Try using this:

if 'word' in model.wv.key_to_index:
    # do something

https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4#4-vocab-dict-became-key_to_index-for-looking-up-a-keys-integer-index-or-get_vecattr-and-set_vecattr-for-other-per-key-attributes

quemeful
1

Answering my own question here.

Word2Vec provides a method named contains('view') which returns True or False based on whether the corresponding word has been indexed or not.

London guy
    For future reference, this doesn't work anymore: `'Word2Vec' object has no attribute 'contains'` – CentAu Dec 13 '15 at 23:32
1

I generally use a filter:

for doc in labeled_corpus:
    words = filter(lambda x: x in model.vocab, doc.words)

This is one simple method for getting past the KeyError on unseen words.
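On Python 3, `filter()` returns a lazy iterator rather than a list, so a list comprehension is often more convenient when the filtered words are needed more than once. A minimal sketch, where `in_vocabulary` is a hypothetical helper and the dict is a stand-in for whichever vocabulary mapping your gensim version exposes:

```python
def in_vocabulary(tokens, vocabulary):
    """Return the tokens that are present in `vocabulary`, in order."""
    # A list comprehension yields a list on Python 3 as well, whereas
    # filter() returns a lazy iterator there.
    return [token for token in tokens if token in vocabulary]


# Stand-in vocabulary; with a real model this would be the mapping
# discussed in this thread (for example `model.wv.key_to_index`).
vocabulary = {"view": 0, "window": 1}
print(in_vocabulary(["a", "view", "from", "the", "window"], vocabulary))
# ['view', 'window']
```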

Prakhar Agarwal
1

As @quemeful mentioned, you could do something like:

if "view" in model.wv.key_to_index.keys():
    # do something
0

I know I am late to this post, but here is a piece of code that handles this issue well. I use it in my own code and it works like a charm :)

size = 300  # word vector size
word = 'food'  # word token

try:
    wordVector = model[word].reshape((1, size))
except KeyError:
    print("not found!", word)

NOTE: I am using the Python Gensim library for word2vec models
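The same try/except pattern can be wrapped in a function that returns a fallback value instead of printing. This is only a sketch: `vector_or_default` is a hypothetical name, and a plain dict stands in for the model, on the assumption that the lookup object raises KeyError for unknown words, as described in the question.

```python
def vector_or_default(lookup, word, default=None):
    """Return lookup[word], or `default` if the word is missing."""
    try:
        return lookup[word]
    except KeyError:
        return default


# A plain dict stands in for the model here; it also raises KeyError
# when a key is missing.
toy_vectors = {"food": [0.1, 0.2, 0.3]}
print(vector_or_default(toy_vectors, "food"))  # [0.1, 0.2, 0.3]
print(vector_or_default(toy_vectors, "view"))  # None
```

With recent gensim releases you would likely pass `model.wv` rather than the model itself as `lookup`.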

Nomiluks
0

To check whether a word exists in your model, you can build a dictionary that maps each word to its vector:

word2vec_pretrained_dict = dict(zip(w2v_model.key_to_index.keys(), w2v_model.vectors))

where w2v_model.key_to_index gives you a dictionary mapping each word to its sequence number (index)

and w2v_model.vectors returns the vector for each word
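Once such a dictionary exists, membership tests and safe lookups are ordinary dict operations. The sketch below uses a `SimpleNamespace` stand-in with made-up two-dimensional vectors in place of a real model, and it indexes `vectors` through each word's stored index rather than relying on `zip` ordering, which is a slightly more defensive variant of the line above.

```python
from types import SimpleNamespace

# Stand-in for a KeyedVectors-like object (not a real gensim model);
# the words and vector values are invented for illustration.
w2v_model = SimpleNamespace(
    key_to_index={"view": 0, "window": 1},
    vectors=[[0.1, 0.2], [0.3, 0.4]],
)

# Map each word to its vector by looking up the row at its index.
word_to_vector = {
    word: w2v_model.vectors[index]
    for word, index in w2v_model.key_to_index.items()
}

print("view" in word_to_vector)    # True
print(word_to_vector.get("door"))  # None
print(word_to_vector["window"])    # [0.3, 0.4]
```

If only the membership test is needed, `word in w2v_model.key_to_index` avoids building the extra dictionary.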