It's known that LDA topic modeling learns two matrices of probabilities from the data: a k x V matrix of P(w|z) values and a D x k matrix of P(z|d) values, where k is the number of topics, V is the vocabulary size, and D is the number of training documents.
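To make the notation concrete, here is a minimal NumPy sketch of those two matrices; the names phi and theta and the toy sizes are mine, and any row-normalized arrays of these shapes would do:

```python
import numpy as np

k, V, D = 10, 5000, 2000   # number of topics, vocabulary size, number of training documents
rng = np.random.default_rng(0)

# phi[i, w] = P(w | z_i): each of the k rows is a distribution over the V vocabulary words
phi = rng.dirichlet(np.full(V, 0.1), size=k)    # shape (k, V)

# theta[j, i] = P(z_i | d_j): each of the D rows is a distribution over the k topics
theta = rng.dirichlet(np.full(k, 0.1), size=D)  # shape (D, k)
```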
After reading a previous question, I learned that the methods mentioned in the paper are all fairly involved. However, a simple method based on an independence assumption like that of Naive Bayes can be derived quickly, as follows, and all the probabilities involved are known after training:
p(zi | w1, ..., wn) ∝ p(w1, ..., wn | zi) * p(zi) = (Π_j p(wj | zi)) * p(zi), for 1 <= i <= k ---(1)
p(zi) ∝ Σ_j p(zi | dj), for 1 <= j <= D (under the assumption that all p(dj) are equal) ---(2)
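Put together, the whole procedure is only a few lines of NumPy (using phi and theta from the sketch above; I work in log space only because the product of many small p(wj | zi) values would underflow):

```python
def topic_posterior(word_ids, phi, theta):
    """p(z_i | w_1, ..., w_n) for a new document, following equations (1) and (2)."""
    # Equation (2): p(z_i) ∝ Σ_j p(z_i | d_j), assuming all p(d_j) are equal
    p_z = theta.sum(axis=0)
    p_z /= p_z.sum()

    # Equation (1) in log space: log p(z_i) + Σ_j log p(w_j | z_i)
    log_post = np.log(p_z) + np.log(phi[:, word_ids]).sum(axis=1)

    # Renormalize so the k values sum to 1
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()

# Toy usage: a "document" given as vocabulary indices (repeated words are counted each time)
doc = np.array([12, 507, 4999, 12, 33])
print(topic_posterior(doc, phi, theta))
```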
(a) Are there any errors or problematic assumptions in this derivation?
(b) Are there any papers that discuss the performance of a simple method like this compared to more rigorous ones based on importance sampling, the left-to-right estimator, etc.?