I trained a doc2vec model with Python gensim
on a corpus of 40,000,000 documents. The model is used to infer docvecs for millions of documents every day. To keep inference stable, I set alpha
to a small value and steps to a large number
instead of fixing a constant random seed:
from gensim.models.doc2vec import Doc2Vec

model = Doc2Vec.load('doc2vec_dm.model')
doc_demo = ['a', 'b']  # a document is passed as a list of tokens
# model.random.seed(0)  # the constant-seed alternative I decided against
vector = model.infer_vector(doc_demo, alpha=0.1, min_alpha=0.0001, steps=100)
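For reference, the constant-seed alternative commented out above would look roughly like this (continuing from the snippet; as I understand it, model.random is the numpy RandomState gensim draws from during inference, so re-seeding it before every call should make the result reproducible for the same tokens):

import numpy as np

model.random.seed(0)  # pin the RNG state before the first call
vec_a = model.infer_vector(doc_demo, alpha=0.1, min_alpha=0.0001, steps=100)

model.random.seed(0)  # same seed, same tokens
vec_b = model.infer_vector(doc_demo, alpha=0.1, min_alpha=0.0001, steps=100)

assert np.allclose(vec_a, vec_b)  # identical vectors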
Doc2Vec.infer_vector()
accepts only one document per call, and each call takes almost 0.1 seconds. Is there any API
that can infer vectors for a whole batch of documents in one step?
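In case it matters, the workaround I am currently experimenting with is to fan the per-document calls out across worker processes. This is only a rough sketch; the worker count, chunk size, and document list are placeholders:

from multiprocessing import Pool

from gensim.models.doc2vec import Doc2Vec

def init_worker():
    # Load one model copy per worker process so the large model is not
    # pickled and shipped along with every task.
    global worker_model
    worker_model = Doc2Vec.load('doc2vec_dm.model')

def infer_one(tokens):
    return worker_model.infer_vector(tokens, alpha=0.1, min_alpha=0.0001,
                                     steps=100)

if __name__ == '__main__':
    docs = [['a', 'b'], ['c', 'd']]  # placeholder token lists
    with Pool(processes=4, initializer=init_worker) as pool:
        vectors = pool.map(infer_one, docs, chunksize=100)

This overlaps the ~0.1 s calls across cores, but it is still one document per call under the hood, not the batched API I am asking about.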