
I need to remove an invalid word from the vocabulary of a gensim.models.keyedvectors.Word2VecKeyedVectors model.

I tried removing it with del model.vocab[word]. When I print model.vocab the word is gone, but when I run model.most_similar on other words, the deleted word still shows up among the results. How can I delete a word from model.vocab so that model.most_similar no longer returns it?

  • 1
    Possible duplicate of [How to remove a word completely from a Word2Vec model in gensim?](https://stackoverflow.com/questions/48941648/how-to-remove-a-word-completely-from-a-word2vec-model-in-gensim) – Shern Apr 17 '19 at 04:37

2 Answers


There's no existing method supporting the removal of individual words.

A quick-and-dirty workaround: at the same time as you remove the vocab entry, note the index of the word's vector (in the underlying large vector array), and change the string at that index in the kv_model.index2entity list to some plug value (say, '***DELETED***').

Then, after each most_similar() call, discard any results matching '***DELETED***'.
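The workaround above could look roughly like the sketch below. It assumes the gensim 3.x attribute layout mentioned in this thread (vocab entries carrying an .index, plus the index2entity list). The helper names soft_delete and most_similar_filtered are made up for illustration, and ToyKeyedVectors is a hypothetical pure-Python stand-in used only so the example runs without gensim or a trained model.

```python
import math
from types import SimpleNamespace

PLUG = '***DELETED***'  # plug value suggested in the answer


def soft_delete(kv, word):
    """Remove `word` from kv.vocab and plug its slot in kv.index2entity."""
    index = kv.vocab[word].index  # row of the word's vector in the big array
    del kv.vocab[word]
    kv.index2entity[index] = PLUG


def most_similar_filtered(kv, word, topn=10):
    """Run kv.most_similar and discard any plugged entries."""
    n_deleted = kv.index2entity.count(PLUG)
    # Request extra results so that `topn` can survive the filtering.
    raw = kv.most_similar(word, topn=topn + n_deleted)
    return [(w, score) for w, score in raw if w != PLUG][:topn]


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class ToyKeyedVectors:
    """Hypothetical stand-in (NOT gensim) exposing only what the helpers use."""

    def __init__(self, words, vectors):
        self.index2entity = list(words)
        self.vectors = [list(v) for v in vectors]
        self.vocab = {w: SimpleNamespace(index=i) for i, w in enumerate(words)}

    def most_similar(self, word, topn=10):
        own = self.vocab[word].index
        scored = [
            (self.index2entity[i], _cosine(self.vectors[own], vec))
            for i, vec in enumerate(self.vectors)
            if i != own
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:topn]


kv = ToyKeyedVectors(
    ['cat', 'dog', 'xx_bad', 'car'],
    [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0]],
)
soft_delete(kv, 'xx_bad')
result = most_similar_filtered(kv, 'cat', topn=2)
print([w for w, _ in result])
```

With a real gensim 3.x model, the two helpers would be called on the loaded KeyedVectors object instead of the toy class. The deleted word's vector stays in the array, so it still costs memory and compute; it is only hidden from the results. Newer gensim releases renamed these attributes, so the names would need adjusting there.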

gojomo

Refer to:

How to remove a word completely from a Word2Vec model in gensim?

  1. Possible method 1: I solved it by editing the text model file itself.
  2. Possible method 2: Refer to @zsozso's answer (though I didn't get it to work).
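For method 1, a minimal sketch of editing the text model file, assuming the model was saved in the plain-text word2vec format (a header line with the word count and vector size, then one word and its vector per line). The function name remove_words_from_text_model and the sample words are made up for illustration.

```python
import os
import tempfile


def remove_words_from_text_model(src_path, dst_path, words_to_remove):
    """Copy a word2vec text file, dropping some words and fixing the header."""
    words_to_remove = set(words_to_remove)
    with open(src_path, encoding='utf-8') as src:
        _, dim = src.readline().split()  # header: "<word count> <vector size>"
        kept = [
            line for line in src
            if line.strip() and line.split(' ', 1)[0] not in words_to_remove
        ]
    with open(dst_path, 'w', encoding='utf-8') as dst:
        dst.write(f'{len(kept)} {dim}\n')  # header must match the new count
        dst.writelines(kept)
    return len(kept)


# Tiny demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, 'model.txt')
    dst_path = os.path.join(tmp, 'model_clean.txt')
    with open(src_path, 'w', encoding='utf-8') as f:
        f.write('3 2\ncat 1.0 0.0\nxx_bad 0.9 0.1\ndog 0.8 0.2\n')
    n_kept = remove_words_from_text_model(src_path, dst_path, ['xx_bad'])
    with open(dst_path, encoding='utf-8') as f:
        cleaned = f.read()

print(cleaned)
```

The cleaned file can then be reloaded, for example with KeyedVectors.load_word2vec_format(path, binary=False). This sketch keeps all retained lines in memory, so a very large model may need a two-pass, streaming variant.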

Shern