I am trying to find the best hyperparameters for my gensim doc2vec model, which takes a document as input and produces a document embedding. My training data consists of text documents without any labels, i.e. I only have 'X' and no 'y'.
I found some related questions here, but all of the proposed solutions are for supervised models; none of them covers an unsupervised model like mine.
Here is the code where I am training my doc2vec model:
    from typing import List

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
    from gensim.utils import to_unicode

    def train_doc2vec(
            self,
            X: List[List[str]],
            epochs: int = 10,
            learning_rate: float = 0.0002) -> Doc2Vec:
        # wrap each tokenized document in a TaggedDocument, tagged with its index
        tagged_documents = list()
        for idx, w in enumerate(X):
            td = TaggedDocument(to_unicode(str.encode(' '.join(w))).split(), [str(idx)])
            tagged_documents.append(td)
        model = Doc2Vec(**self.params_doc2vec)
        model.build_vocab(tagged_documents)
        for epoch in range(epochs):
            model.train(tagged_documents,
                        total_examples=model.corpus_count,
                        epochs=model.epochs)
            # decrease the learning rate
            model.alpha -= learning_rate
            # fix the learning rate, no decay
            model.min_alpha = model.alpha
        return model
I need suggestions on how to find the best hyperparameters for my model, either with a grid search or with some other technique. Any help is much appreciated.
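To make the question concrete, here is a rough sketch of the kind of label-free grid search I have in mind. It is only an illustration under assumptions, not a known-good method: `grid_search`, `fit`, `score`, `toy_fit` and `toy_score` are hypothetical names, and the toy functions merely stand in for training a doc2vec model and scoring it without labels (for example, by inferring a vector for each training document and checking how highly the document itself ranks among its nearest neighbours).

```python
from itertools import product
from typing import Any, Callable, Dict, Iterable, List, Tuple


def grid_search(
    param_grid: Dict[str, Iterable[Any]],
    fit: Callable[[Dict[str, Any]], Any],
    score: Callable[[Any], float],
) -> Tuple[Dict[str, Any], float, List[Tuple[Dict[str, Any], float]]]:
    """Try every parameter combination and keep the highest-scoring one.

    `fit` builds a model from a parameter dict; `score` rates that model
    without needing labels (higher is better). Both are supplied by the caller.
    """
    names = sorted(param_grid)
    results = []
    # product() enumerates the full Cartesian grid of candidate values
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        model = fit(params)
        results.append((params, score(model)))
    best_params, best_score = max(results, key=lambda item: item[1])
    return best_params, best_score, results


# Toy stand-ins so the sketch runs without gensim (purely illustrative):
# the "model" is just the parameter dict, and the score peaks at
# vector_size=100, window=5.
def toy_fit(params: Dict[str, Any]) -> Dict[str, Any]:
    return params


def toy_score(model: Dict[str, Any]) -> float:
    return -abs(model["vector_size"] - 100) - abs(model["window"] - 5)


best_params, best_score, results = grid_search(
    {"vector_size": [50, 100, 200], "window": [2, 5, 8]},
    toy_fit,
    toy_score,
)
print(best_params, best_score)
```

With gensim, `fit` would presumably wrap something like my `train_doc2vec` with the parameter dict passed on to `Doc2Vec`. The part I am unsure about is what a sensible `score` looks like when there is no 'y'.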