I'm working in R with the "topicmodels" package, and I'm trying to better understand how the package works. In most of the tutorials and documentation I've read, people define topics by their 5 or 10 most probable terms. Here is an example:
library(topicmodels)
data("AssociatedPress", package = "topicmodels")
lda <- LDA(AssociatedPress[1:20,], k = 5)
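topics(lda)    # most likely topic for each document
terms(lda)     # most likely term for each topic
terms(lda, 5)  # 5 most likely terms for each topic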
So the last line of the code returns the 5 most probable terms for each of the 5 topics I've defined.
In the lda object, I can access the gamma element, which contains, per document, the probability of belonging to each topic. Based on this, I can extract the topics with a probability greater than any threshold I prefer, instead of assigning every document the same number of topics.
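To make the thresholding step concrete, here is a minimal sketch of what I mean. The seed and the cut-off value of 0.2 are assumptions chosen only for illustration, and topics_above is just a name I made up for the result:

```r
library(topicmodels)
data("AssociatedPress", package = "topicmodels")

# Same small model as in the example; the seed is only there for reproducibility.
lda <- LDA(AssociatedPress[1:20, ], k = 5, control = list(seed = 1))

# gamma is a documents x topics matrix of per-document topic probabilities.
gamma <- lda@gamma

# Hypothetical cut-off, chosen only for illustration.
threshold <- 0.2

# For each document, the indices of the topics whose probability exceeds the cut-off.
# The result is a list because documents can pass the threshold for different numbers of topics.
topics_above <- lapply(seq_len(nrow(gamma)), function(i) which(gamma[i, ] > threshold))

topics_above[1:3]
```

With this, each document gets as many topics as pass the cut-off, rather than a fixed number.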
My second step would then be to find out which words are most strongly associated with each topic. I can use the terms(lda) function to pull this out, but it only gives me a fixed number N of top terms per topic.
In the output I've also found the
lda@beta
which contains the beta per word per topic, but this is a beta value that I'm having a hard time interpreting. The values are all negative, and although I see some around -6 and others around -200, I can't interpret them as a probability or as a measure of which words are associated with a topic and how much more strongly than others. Is there a way to pull out or calculate anything that can be interpreted as such a measure?
Many thanks, Frederik