I have trained an LDA model in Python 3 on 2000 URLs (each containing an article) about a particular topic. Can we predict topics for a new corpus using the trained model?
Viewed 2,118 times
1 Answer
2
Assuming your dictionary (the one used to train the model) is named dic_1 and new_corpus is a collection of documents.
We first create a gensim corpus as follows:
corpus_1 = [dic_1.doc2bow(tokenize(doc)) for doc in new_corpus]
Now we can make predictions using the trained model as follows:
new_predictions = LDA[corpus_1]

IliassA
- 95
- 2
- 8

Atendra Gautam
- 465
- 3
- 11
-
I printed "print(model[corpus3])", which gives me output. How do I read the output to make sense of it? – Deepti Mar 16 '18 at 10:19
-
https://stackoverflow.com/questions/19504898/use-s%D1%81ikit-learn-tfidf-with-gensim-lda?answertab=active#tab-top With reference to the link above: I am able to convert a matrix in scipy.sparse format into a streaming gensim corpus. But when training with lda = models.ldamodel.LdaModel(corpus=new_corpus, id2word=dictionary, num_topics=100), which dictionary should be used here? The dictionary for tf-idf would be different; how is it generated? – Deepti Mar 21 '18 at 04:38
-
Do you want to train another model using "new_corpus"? Please elaborate on what you want to achieve, as right now it's confusing. – Atendra Gautam Mar 21 '18 at 11:52
-
I tried to use tf-idf on my training set and want to feed it into my LDA model. I am able to convert the tf-idf matrix into a gensim corpus using **gensim.matutils.Sparse2Corpus(tfidf_matrix, documents_columns=False)**. But when passing it into the LDA model, should I use vectorize.vocabulary_ as my dictionary, or what else can be used as the dictionary? How can I improve my model, because the accuracy is not good at all? – Deepti Mar 22 '18 at 05:30
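One possible way to get a dictionary here, sketched under assumptions: scikit-learn's `vocabulary_` maps each term to its column index, while gensim's `id2word` argument expects the reverse direction (id to word) and accepts a plain dict. The `vocabulary` dict below is a hypothetical stand-in for a fitted vectorizer's `vocabulary_`.

```python
# Hypothetical stand-in for a fitted TfidfVectorizer's `vocabulary_`,
# which maps each term to its column index in the tf-idf matrix.
vocabulary = {"cats": 0, "dogs": 1, "stocks": 2}

# Invert the mapping so that it goes from column index to term,
# which is the direction gensim expects for `id2word`.
id2word = {index: term for term, index in vocabulary.items()}
print(id2word)
```

The inverted dict could then be passed as `id2word=id2word` together with the `Sparse2Corpus` corpus. Note that LDA is usually described in terms of raw word counts, so it may be worth comparing against a count-based matrix instead of tf-idf weights.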
-
Atendra: regarding the code mentioned above, corpus_1 = [dic_1.doc2bow(tokenize(doc)) for doc in new_corpus]. Here corpus_1 is the new corpus formed from the new article. Is dic_1 the dictionary created for the new corpus? Is new_corpus the old corpus of our trained model? What is doc here? Is it our training dataset? – Deepti Mar 23 '18 at 09:24
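To illustrate the roles of these names, here is a simplified pure-Python stand-in for what `doc2bow` does. It is a sketch, not gensim's implementation. `token2id`, `tokenize`, and `doc2bow_sketch` are hypothetical names. The point it shows is that the token-to-id mapping comes from the training documents, each `doc` is one new document, and tokens the dictionary has never seen are dropped.

```python
from collections import Counter


def tokenize(doc):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return doc.lower().split()


# Hypothetical token -> id mapping, as learned from the TRAINING documents.
token2id = {"cats": 0, "dogs": 1, "pets": 2}


def doc2bow_sketch(tokens, token2id):
    """Simplified stand-in for Dictionary.doc2bow: count known tokens only."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


# One NEW document; "love" and "and" are not in the training vocabulary.
new_doc = "Dogs love dogs and cats"
print(doc2bow_sketch(tokenize(new_doc), token2id))  # [(0, 1), (1, 2)]
```

Because the trained model's topics are defined over the training dictionary's ids, building a fresh dictionary from the new documents would assign different ids and the predictions would not be meaningful.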