It is known that LDA topic modeling learns two matrices of probabilities from data: a k x V matrix of P(w|z) values and a D x k matrix of P(z|d) values, where k is the number of topics, V is the vocabulary size, and D is the number of training documents.
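
For concreteness, here is a minimal sketch of how these two matrices can be obtained (assuming scikit-learn's LatentDirichletAllocation and a toy bag-of-words corpus; the variable names are my own):

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["apples and oranges", "oranges and bananas",
        "dogs chase cats", "cats chase mice"]
X = CountVectorizer().fit_transform(docs)   # D x V document-term counts

k = 2
lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(X)

# k x V matrix of P(w|z): normalize each topic's word weights to sum to 1
p_w_given_z = lda.components_ / lda.components_.sum(axis=1, keepdims=True)

# D x k matrix of P(z|d): transform() returns normalized topic proportions
p_z_given_d = lda.transform(X)

print(p_w_given_z.shape)  # (k, V)
print(p_z_given_d.shape)  # (D, k)
```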

After reading an earlier question, I learned that the methods mentioned in the paper are all quite involved. However, under a Naive Bayes-style independence assumption, a simple method can be derived as follows, using only probabilities that are already known after training. For a new document with words w1, ..., wn:

p(zi | w1, ..., wn) ∝ p(w1, ..., wn | zi) * p(zi) = (Π_{j=1..n} p(wj | zi)) * p(zi), for 1 <= i <= k --- (1)

p(zi) ∝ Σ_{j=1..D} p(zi | dj) (under the assumption that all p(dj) are equal) --- (2)
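
To make the derivation concrete, here is a minimal NumPy sketch of equations (1) and (2), computed in log space for numerical stability (the function name topic_posterior and the input format, a list of vocabulary word indices, are my own assumptions; p_w_given_z and p_z_given_d are the k x V and D x k matrices learned during training):

```python
import numpy as np

def topic_posterior(w, p_w_given_z, p_z_given_d):
    """Posterior over topics for a document given as a list of word indices w."""
    # (2): p(z_i) ∝ sum_j p(z_i | d_j), assuming all training documents are equally likely
    p_z = p_z_given_d.sum(axis=0)
    p_z = p_z / p_z.sum()

    # (1): p(z_i | w_1..w_n) ∝ p(z_i) * prod_j p(w_j | z_i)
    # computed in log space to avoid underflow for long documents;
    # assumes p_w_given_z has no exact zeros (true for smoothed LDA estimates)
    log_post = np.log(p_z) + np.log(p_w_given_z[:, w]).sum(axis=1)
    log_post -= log_post.max()          # shift before exponentiating
    post = np.exp(log_post)
    return post / post.sum()            # normalize over the k topics

# example: a new document consisting of word ids 0, 3, 3, 7
# print(topic_posterior([0, 3, 3, 7], p_w_given_z, p_z_given_d))
```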


(a). Are there any errors or problematic assumptions in this derivation?

(b). Are there any papers that compare the performance of a simple method like this against more rigorous ones based on importance sampling, the left-to-right estimator, etc.?
