I am training a gensim Doc2Vec model on a text file, 'full_texts.txt', that contains ~1600 documents. Once the model is trained, I want to run similarity queries over both words and sentences.
However, since this is my first time using gensim, I have not been able to find a solution. When I look up similar words as shown below, I get an error saying that the word doesn't exist in the vocabulary.
My other question is: how do I check similarity between entire documents? I have read a lot of questions about this, like this one, and looked through the documentation, but I am still not sure what I am doing wrong.
from gensim.models import Doc2Vec
from gensim.models.doc2vec import TaggedLineDocument
from gensim.models.doc2vec import TaggedDocument
tagdocs = TaggedLineDocument('full_texts.txt')
d2v_mod = Doc2Vec(min_count=3, vector_size=200, workers=2, window=5, epochs=30, dm=0, dbow_words=1, seed=42)
d2v_mod.build_vocab(tagdocs)
d2v_mod.train(tagdocs, total_examples=d2v_mod.corpus_count, epochs=20)
d2v_mod.wv.similar_by_word('overdraft',topn=10)
KeyError: "word 'overdraft' not in vocabulary"
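One possible cause I would like to rule out: as far as I understand, `TaggedLineDocument` only splits each line on whitespace, without lowercasing or stripping punctuation, so `Overdraft` or `overdraft,` would count as separate tokens, and any token seen fewer than `min_count=3` times is dropped from the vocabulary. Below is a minimal sketch for counting such variants. The helper name `count_token_variants` and the sample lines are hypothetical, made up for illustration, and not part of gensim or my data.

```python
from collections import Counter


def count_token_variants(lines, word):
    """Count whitespace-separated tokens that contain `word`, ignoring case."""
    needle = word.lower()
    counts = Counter()
    for line in lines:
        # Plain whitespace split: no lowercasing, no punctuation stripping
        # (assumed to mirror how TaggedLineDocument tokenizes each line).
        for token in line.split():
            if needle in token.lower():
                counts[token] += 1
    return counts


# Hypothetical sample lines standing in for the real corpus file.
sample_lines = [
    "Overdraft fees apply when the balance is negative.",
    "The overdraft, if unpaid, accrues interest.",
    "An overdraft limit of 500 was approved.",
]

variants = count_token_variants(sample_lines, "overdraft")
for token, count in variants.most_common():
    print(token, count)
```

To run this on the real corpus, the open file object can be passed instead of the sample list, for example `count_token_variants(open('full_texts.txt', encoding='utf-8'), 'overdraft')`. If the exact token `overdraft` appears fewer than 3 times, that would be consistent with the KeyError above.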