Questions tagged [lda]

Latent Dirichlet Allocation (LDA) is a generative probabilistic model in which sets of observations are explained by unobserved groups, and those groups account for why some parts of the data are similar.

If the observations are words collected into documents, LDA posits that each document is a mixture of a small number of topics and that each word is attributable to one of the document's topics. Each topic, in turn, emits words with certain probabilities.
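The generative story described above can be sketched in a few lines of Python. This is only an illustrative toy, not how any particular library implements LDA: the vocabulary, the two topic-word distributions, the Dirichlet parameter `[0.5, 0.5]`, and the helper names (`sample_dirichlet`, `sample_categorical`, `generate_document`) are all assumptions made up for the example. It uses only the standard library and draws a Dirichlet sample by normalizing Gamma draws.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Return an index drawn according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(topic_word, alpha, vocab, length, rng):
    """Generate one document following the LDA generative story."""
    theta = sample_dirichlet(alpha, rng)  # this document's topic mixture
    topics, words = [], []
    for _ in range(length):
        z = sample_categorical(theta, rng)          # choose a topic for this word
        w = sample_categorical(topic_word[z], rng)  # choose a word from that topic
        topics.append(z)
        words.append(vocab[w])
    return theta, topics, words


# Hypothetical toy setup: two topics over a six-word vocabulary.
vocab = ["gene", "dna", "cell", "ball", "team", "goal"]
topic_word = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # topic 0 only emits the first three words
    [0.0, 0.0, 0.0, 0.3, 0.4, 0.3],  # topic 1 only emits the last three words
]

rng = random.Random(0)
theta, topics, words = generate_document(topic_word, [0.5, 0.5], vocab, 8, rng)
print("topic mixture:", theta)
print("words:", words)
```

Fitting LDA runs this story in reverse: given only the words, it estimates the per-document mixtures and the per-topic word distributions.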

It should not be confused with Linear Discriminant Analysis, a supervised learning procedure for classifying observations into a set of categories.

1175 questions

6 answers

Remove empty documents from DocumentTermMatrix in R topicmodels?

I am doing topic modelling using the topicmodels package in R. I am creating a Corpus object, doing some basic preprocessing, and then creating a DocumentTermMatrix: corpus <- Corpus(VectorSource(vec), readerControl=list(language="en")) corpus <-…

r lda topic-modeling topicmodels

asked Dec 19 '12 at 01:25

Bill M

2 answers

LDA topic modeling - Training and testing

I have read about LDA and I understand the mathematics of how the topics are generated when one inputs a collection of documents. References say that LDA is an algorithm which, given a collection of documents and nothing more (no supervision needed), can…

lda topic-modeling

asked Jun 22 '12 at 18:52

tan

3 answers

Python Gensim: how to calculate document similarity using the LDA model?

I've got a trained LDA model and I want to calculate the similarity score between two documents from the corpus I trained my model on. After studying all the Gensim tutorials and functions, I still can't get my head around it. Can somebody give me a…

python nlp lda gensim

asked Mar 16 '14 at 06:51

still_st

2 answers

Simple Python implementation of collaborative topic modeling?

I came across these 2 papers which combined collaborative filtering (matrix factorization) and topic modelling (LDA) to recommend users similar articles/posts based on topic terms of posts/articles that users are interested in. The papers (in PDF)…

python machine-learning lda topic-modeling collaborative-filtering

asked Aug 25 '15 at 23:40

jxn

3 answers

Topic distribution: How do we see which document belong to which topic after doing LDA in python

I am able to run the LDA code from gensim and got the top 10 topics with their respective keywords. Now I would like to go a step further and see how accurate the LDA algorithm is by seeing which documents it clusters into each topic. Is this possible in…

python nltk lda gensim

asked Jan 08 '14 at 00:30

jxn

5 answers

Understanding LDA implementation using gensim

I am trying to understand how the gensim package in Python implements Latent Dirichlet Allocation. I am doing the following: Define the dataset documents = ["Apple is releasing a new product", "Amazon sells many things", …

python gensim lda topic-modeling dirichlet

asked Dec 03 '13 at 11:31

visakh

5 answers

how to determine the number of topics for LDA?

I am new to LDA and I want to use it in my work, but I have run into some problems. To get the best performance, I want to estimate the best number of topics. After reading "Finding Scientific Topics", I know that I can calculate logP(w|z)…

nlp data-mining lda

asked Jul 02 '13 at 09:22

Chelsea Wang

2 answers

What's the disadvantage of LDA for short texts?

I am trying to understand why Latent Dirichlet Allocation (LDA) performs poorly in short text environments like Twitter. I've read the paper 'A biterm topic model for short text'; however, I still do not understand "the sparsity of word…

nlp lda topic-modeling

asked Apr 22 '15 at 03:05

Shuguang Zhu

10 answers

How to print the LDA topics models from gensim? Python

Using gensim I was able to extract topics from a set of documents in LSA, but how do I access the topics generated from the LDA models? When printing lda.print_topics(10), the code gave the following error because print_topics() returns a…

python nlp lda topic-modeling gensim

asked Feb 22 '13 at 02:47

alvas

1 answer

Export pyLDAvis graphs as standalone webpage

I am analysing text with topic modelling, using Gensim and pyLDAvis. I would like to share the results with distant colleagues without requiring them to install Python and all the required libraries. Is there a way to export interactive…

python gensim lda topic-modeling

asked Jan 30 '17 at 13:10

Darius

1 answer

Predicting LDA topics for new data

It looks like this question may have been asked a few times before (here and here), but it has yet to be answered. I'm hoping this is due to the previous ambiguity of the question(s) asked, as indicated by comments. I apologize if I am breaking…

r lda topic-modeling

asked Apr 20 '13 at 00:01

David

3 answers

How does the removeSparseTerms in R work?

I am using the removeSparseTerms method in R, and it requires a threshold value as input. I also read that the higher the value, the more terms are retained in the returned matrix. How does this method work and what is the logic…

r tm lda

asked Feb 27 '15 at 10:55

London guy

4 answers

LDA model generates different topics everytime i train on the same corpus

I am using Python gensim to train a Latent Dirichlet Allocation (LDA) model from a small corpus of 231 sentences. However, each time I repeat the process, it generates different topics. Why do the same LDA parameters and corpus generate…

python nlp lda topic-modeling gensim

asked Feb 25 '13 at 13:08

alvas

3 answers

LDA with topicmodels, how can I see which topics different documents belong to?

I am using LDA from the topicmodels package. I have run it on about 30,000 documents, acquired 30 topics, and got the top 10 words for the topics; they look very good. But I would like to see which documents belong to which topic with the…

r lda topic-modeling tm

asked Feb 14 '13 at 12:22

d12n

2 answers

Document topical distribution in Gensim LDA

I've derived an LDA topic model using a toy corpus as follows: documents = ['Human machine interface for lab abc computer applications', 'A survey of user opinion of computer system response time', 'The EPS user interface…

python lda gensim

asked Jun 26 '13 at 03:13

Moses Xu
