
I have trained an LDA model in Python 3 on 2000 URLs (containing articles) about a particular topic. Can I use the trained model to make predictions on a new corpus?

Deepti

1 Answer


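Assuming your dictionary is named dic_1 (the same gensim Dictionary you used to train the model), tokenize is the tokenizer you applied to the training documents, and new_corpus is a collection of new documents: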

First, create a gensim corpus as follows:

corpus_1 = [dic_1.doc2bow(tokenize(doc)) for doc in new_corpus]

Now you can make predictions with the trained model (named LDA here):

new_predictions = LDA[corpus_1]
IliassA
Atendra Gautam
  • I printed print(model[corpus3]) and it gives me output. How do I read the output to make sense of it? – Deepti Mar 16 '18 at 10:19
  • https://stackoverflow.com/questions/19504898/use-s%D1%81ikit-learn-tfidf-with-gensim-lda?answertab=active#tab-top With reference to the link above: I am able to convert a matrix in scipy.sparse format into a streaming gensim corpus. But when training with lda = models.ldamodel.LdaModel(corpus=new_corpus, id2word=dictionary, num_topics=100), which dictionary should be used here? The dictionary for tf-idf would be different, so how is it generated? – Deepti Mar 21 '18 at 04:38
  • Do you want to train another model using "new_corpus"? Please elaborate on what you want to achieve, as right now it's confusing. – Atendra Gautam Mar 21 '18 at 11:52
  • I tried to apply tf-idf to my training set and want to feed it into my LDA model. I am able to convert the tf-idf matrix into a gensim corpus using **gensim.matutils.Sparse2Corpus(tfidf_matrix,documents_columns=False)**, but when passing it into the LDA model, should I use vectorize.vocabulary_ as my dictionary, or what else can be used as the dictionary? How can I improve my model, since the accuracy is not good at all? – Deepti Mar 22 '18 at 05:30
  • Atendra: in the code above, corpus_1 = [dic_1.doc2bow(tokenize(doc)) for doc in new_corpus], is corpus_1 the new corpus formed from the new articles? Is dic_1 the dictionary created for the new corpus? Is new_corpus the old corpus of our trained model? What is doc here? Is it our training dataset? – Deepti Mar 23 '18 at 09:24