Questions tagged [topicmodels]

topicmodels is an R package for topic modeling with Latent Dirichlet Allocation (LDA) and Correlated Topic Models (CTM).

Excerpt from the topicmodels page on CRAN:

Provides an interface to the C code for Latent Dirichlet Allocation (LDA) models and Correlated Topics Models (CTM) by David M. Blei and co-authors and the C++ code for fitting LDA models using Gibbs sampling by Xuan-Hieu Phan and co-authors.

101 questions
44 votes, 6 answers

Remove empty documents from DocumentTermMatrix in R topicmodels?

I am doing topic modelling using the topicmodels package in R. I am creating a Corpus object, doing some basic preprocessing, and then creating a DocumentTermMatrix: corpus <- Corpus(VectorSource(vec), readerControl=list(language="en")) corpus <-…
Bill M
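Questions like this one usually come down to a row-sum filter: drop every document whose term counts sum to zero before fitting. Below is a minimal Python sketch of that filter. It assumes a dense list-of-lists count matrix with a parallel list of document names. tm's DocumentTermMatrix is sparse, but the test is the same. The function also returns the dropped names, because any per-document metadata has to be filtered the same way to stay aligned.

```python
from typing import List, Tuple


def drop_empty_documents(
    dtm: List[List[int]], doc_names: List[str]
) -> Tuple[List[List[int]], List[str], List[str]]:
    """Remove all-zero rows from a dense document-term count matrix.

    Assumption: dtm is a dense list of count rows, one per document.
    Returns the kept rows, the names of the kept documents and the names of
    the dropped documents, so per-document metadata can be realigned.
    """
    if len(dtm) != len(doc_names):
        raise ValueError("dtm and doc_names must have the same length")
    kept_rows: List[List[int]] = []
    kept_names: List[str] = []
    dropped: List[str] = []
    for row, name in zip(dtm, doc_names):
        # A document is empty when its row sum is zero: no term survived preprocessing.
        if sum(row) > 0:
            kept_rows.append(row)
            kept_names.append(name)
        else:
            dropped.append(name)
    return kept_rows, kept_names, dropped


# Usage: the second document lost all of its terms during preprocessing.
rows, names, dropped = drop_empty_documents([[1, 0, 2], [0, 0, 0], [0, 3, 0]], ["d1", "d2", "d3"])
print(names, dropped)  # ['d1', 'd3'] ['d2']
```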
6 votes, 1 answer

LDA TopicModels producing list of numbers rather than terms

Bear with me, as I am extremely new to this and working on a project for a course in a certificate program. I have a .csv dataset that I obtained by retrieving bibliometric records from the PubMed and Embase databases. There are 1034 rows. There are…
SciLibby
4 votes, 4 answers

Is a coherence score (u_mass) of -18 good or bad?

I read this question (Coherence score 0.4 is good or bad?) and found that the coherence score (u_mass) ranges from -14 to 14. But in my experiments I got a score of -18 for u_mass and 0.67 for c_v. I wonder how my u_mass score can be out of range…
Dammio
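For reference, here is a Python sketch of the UMass score as defined by Mimno et al. (2011). It assumes documents are represented as sets of word types. The top words are ordered by probability, and each pair (w_i, w_j), with w_j ranked above w_i, contributes log((D(w_i, w_j) + 1) / D(w_j)), where D counts the documents containing the given words. Each term is at most log 2. A pair that never co-occurs contributes log(1 / D(w_j)), however, so the floor depends on corpus size and on the number of top words, and a fixed range such as -14 to 14 does not follow from the definition. Implementations such as gensim's may average instead of summing and may use a different smoothing constant, so this sketch will not reproduce any particular tool's numbers.

```python
import math
from typing import Iterable, List, Set


def umass_coherence(top_words: List[str], docs: Iterable[Set[str]]) -> float:
    """UMass coherence of one topic, summed over ordered word pairs (Mimno et al., 2011).

    top_words must be ordered from most to least probable within the topic.
    Assumption: each document is given as the set of word types it contains.
    """
    doc_sets = list(docs)

    def doc_freq(*words: str) -> int:
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j == 0:
                raise ValueError(f"top word {top_words[j]!r} occurs in no document")
            # The +1 keeps the log finite when the two words never co-occur.
            score += math.log((doc_freq(top_words[i], top_words[j]) + 1) / d_j)
    return score


# Usage: "y" never appears with "x", which occurs in 5 documents.
print(round(umass_coherence(["x", "y"], [{"x"}] * 5 + [{"y"}]), 4))  # -1.6094, i.e. log(1/5)
```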
4 votes, 3 answers

How to add tokens to a gensim dictionary

I use gensim to build a dictionary from a collection of documents. Each document is a list of tokens. This is my code: def constructModel(self, docTokens): """ Given document tokens, constructs the tf-idf and similarity models""" #construct…
Athari
3 votes, 0 answers

Diagnostics (perplexity, LogLik, etc.) for an LDA topic model with textmodel_seededLDA package in R

I'm using the seededLDA package to fit an LDA topic model. However, none of the packages and functions I've found for computing perplexity, log-likelihood, exclusivity, etc. (and other diagnostic tools) work on these models (they only work on…
3 votes, 1 answer

Structural Topic Modeling in R: Plot statistical significance for Topic Content

My question relates to structural topic modeling in R, specifically to the stm package developed by Roberts et al. (https://cran.r-project.org/web/packages/stm/vignettes/stmVignette.pdf). I implemented a structural topic model in order to…
RAnnR
3 votes, 1 answer

Can text2vec and topicmodels generate similar topics with suitable parameter settings for LDA?

I was wondering how the results of different packages (and hence algorithms) differ, and whether parameters could be set so that they produce similar topics. I looked at the packages text2vec and topicmodels in particular. I used the code below to compare 10…
Manuel Bickel
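Topic order in LDA is arbitrary, so comparing two fits (for example from text2vec and topicmodels) first requires matching their topics. The Python sketch below does this. It assumes both models' topic-word distributions have already been aligned to one shared vocabulary order. It then pairs topics greedily by Hellinger distance. Both Hellinger distance and greedy matching are choices made here for illustration. An optimal assignment (Hungarian algorithm) or another divergence would work just as well.

```python
import math
from typing import List, Set, Tuple


def hellinger(p: List[float], q: List[float]) -> float:
    """Hellinger distance between two discrete distributions: 0 if identical, 1 if disjoint."""
    if len(p) != len(q):
        raise ValueError("distributions must use the same vocabulary order")
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def match_topics(
    phi_a: List[List[float]], phi_b: List[List[float]]
) -> List[Tuple[int, int, float]]:
    """Greedily pair topics of model A with topics of model B, closest pairs first.

    Assumption: each row of phi_a and phi_b is one topic's word distribution
    over the same, already aligned vocabulary. Returns (topic_in_a,
    topic_in_b, distance) triples sorted by topic_in_a; each topic is used once.
    """
    candidates = sorted(
        (hellinger(pa, pb), i, j)
        for i, pa in enumerate(phi_a)
        for j, pb in enumerate(phi_b)
    )
    used_a: Set[int] = set()
    used_b: Set[int] = set()
    pairs: List[Tuple[int, int, float]] = []
    for dist, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j, dist))
    return sorted(pairs)


# Usage: two similar topics recovered in opposite order by two fits.
phi_a = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
phi_b = [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]
print([(i, j) for i, j, _ in match_topics(phi_a, phi_b)])  # [(0, 1), (1, 0)]
```

Distances near 0 indicate nearly identical topics. Distances near 1 indicate topics that put their mass on different words.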
3 votes, 1 answer

What is the probability of a TERM for a specific TOPIC in Latent Dirichlet Allocation (LDA) in R

I'm working in R with the package "topicmodels". I'm trying to work out and better understand the code/package. In most of the tutorials and documentation I'm reading, people define topics by their 5 or 10 most probable terms. Here is an example: …
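The probability of a term for a topic is the entry P(term | topic) in that topic's word distribution. A "top 5 or 10 terms" list is that distribution sorted and truncated. In topicmodels, the fitted object stores these distributions as log-probabilities (the beta slot), and posterior() returns them on the probability scale. The Python sketch below shows only the sorting step. It assumes a hypothetical log_beta matrix with one row per topic and columns in vocabulary order.

```python
import math
from typing import List, Tuple


def top_terms(
    log_beta: List[List[float]], vocab: List[str], n: int = 5
) -> List[List[Tuple[str, float]]]:
    """Return the n most probable terms of each topic with their probabilities.

    Assumption: log_beta[k][w] is log P(term w | topic k), so each row
    exponentiates to a distribution over the vocabulary.
    """
    result = []
    for row in log_beta:
        probs = [math.exp(x) for x in row]
        # Sort the topic's terms by P(term | topic), highest first, and truncate.
        ranked = sorted(zip(vocab, probs), key=lambda tp: tp[1], reverse=True)
        result.append(ranked[:n])
    return result


# Usage with a hypothetical two-topic model over a four-term vocabulary.
vocab = ["data", "model", "topic", "word"]
log_beta = [
    [math.log(p) for p in (0.5, 0.3, 0.15, 0.05)],
    [math.log(p) for p in (0.1, 0.2, 0.3, 0.4)],
]
print([[term for term, _ in topic] for topic in top_terms(log_beta, vocab, n=2)])
# [['data', 'model'], ['word', 'topic']]
```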
2 votes, 1 answer

Structural Topic Model (STM) Error: UNRELIABLE VALUE: Future (‘’) unexpectedly generated random numbers without specifying argument 'seed'

After running my stm several times successfully, I now get this error message every time I try to run it: UNRELIABLE VALUE: Future (‘’) unexpectedly generated random numbers without specifying argument 'seed'. There is a risk that those random…
LouisD4
2 votes, 1 answer

How do I measure perplexity scores on an LDA model made with the textmineR package in R?

I've made an LDA topic model in R using the textmineR package; it looks as follows. ## get textmineR dtm dtm2 <- CreateDtm(doc_vec = dat2$fulltext, # character vector of documents ngram_window = c(1, 2), doc_names…
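Perplexity is conventionally the exponentiated negative per-token log-likelihood, exp(-sum_d log p(w_d) / sum_d N_d) (Blei et al., 2003). Once a package provides a log-likelihood for each document (ideally a held-out one), the rest is arithmetic. The Python sketch below assumes natural-log likelihoods and per-document token counts are already available. How to obtain them from a textmineR model is not covered here.

```python
import math
from typing import Sequence


def perplexity(doc_log_likelihoods: Sequence[float], doc_lengths: Sequence[int]) -> float:
    """exp(-sum_d log p(w_d) / sum_d N_d).

    Assumption: doc_log_likelihoods are natural-log likelihoods of each
    document's tokens, and doc_lengths are the matching token counts N_d.
    """
    if len(doc_log_likelihoods) != len(doc_lengths):
        raise ValueError("need one log-likelihood and one length per document")
    total_tokens = sum(doc_lengths)
    if total_tokens <= 0:
        raise ValueError("total token count must be positive")
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


# Usage: a model that gives every token probability 1/50 has perplexity 50.
print(round(perplexity([10 * math.log(1 / 50), 5 * math.log(1 / 50)], [10, 5]), 6))  # 50.0
```

This gives a useful sanity check: a model that spreads probability uniformly over V terms scores a perplexity of V, and lower values are better.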
2 votes, 1 answer

Why are LDA predictions incorrect?

Step 1: I'm using R and the "topicmodels" package to build an LDA model from a corpus of 4.5k documents. I do the usual pre-processing steps (stopwords, cutting low/high word frequencies, lemmatization) and end up with a 100-topic model that I'm happy…
JFB
2 votes, 0 answers

R topicmodels package: How to identify the parameter of Beta (eta) when we do LDA?

I fitted a topic model (LDA) using the R package topicmodels and successfully got a result. However, I am still not sure how to set a key parameter of LDA, Beta (or eta), in the topicmodels package. I know we can set the Alpha parameter by…
TAK
2 votes, 0 answers

Latent Dirichlet Allocation perplexity increases with number of topics k

I am working on an LDA in R and tried to evaluate the perplexity of my model for different numbers of topics k to get a feel for what a good value for perplexity is. However, I noticed that as k increases, the perplexity seems to go up…
teebs
2 votes, 2 answers

R topicmodels LDA

I am running LDA on a small corpus of 2 docs (sentences) for testing purposes. The following code returns topic-term and document-topic distributions that are not reasonable at all given the input documents. Running exactly the same returns in Python…
schimo
2 votes, 1 answer

R LDA topic model: how to get the posterior for delta

I ran LDA using the R package topicmodels and I have been trying to get the value of delta, which is, in my understanding, the parameter of the Dirichlet for words over topics. However, I was not able to access the value. I only managed to get the…
lorenzbr
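This sketch takes the reading in the question: delta is the symmetric Dirichlet prior on each topic's distribution over words. It shows how such a prior enters the standard collapsed-Gibbs estimate (Griffiths and Steyvers, 2004):

phi[k][w] = (n[k][w] + delta) / (n[k] + V * delta)

Here n[k][w] is the number of tokens of term w assigned to topic k, n[k] is that topic's total, and V is the vocabulary size. The sketch only illustrates the role of delta: larger values pull every topic toward uniform. It is not a way to read a fitted value out of a topicmodels object, and the count matrix is a hypothetical input.

```python
from typing import List


def topic_word_estimate(counts: List[List[int]], delta: float) -> List[List[float]]:
    """Smoothed topic-word probabilities from Gibbs-sampler counts.

    Assumption: counts[k][w] is the number of tokens of term w currently
    assigned to topic k, and delta is a symmetric Dirichlet prior over the
    V terms, giving phi[k][w] = (counts[k][w] + delta) / (n_k + V * delta).
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    phi = []
    for row in counts:
        vocab_size = len(row)
        # The prior adds delta pseudo-counts to every term of the topic.
        denom = sum(row) + vocab_size * delta
        phi.append([(c + delta) / denom for c in row])
    return phi


# Usage: a topic with counts 3, 1, 0 over three terms and delta = 0.5.
print([round(p, 3) for p in topic_word_estimate([[3, 1, 0]], 0.5)[0]])  # [0.636, 0.273, 0.091]
```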