
I am working on an LDA in R ({topicmodels}) and tried to evaluate the perplexity of my model for different numbers of topics k, to get a feel for what a good perplexity value looks like. However, I noticed that perplexity goes up as k increases (which, I believe, it shouldn't). I was able to recreate the issue with the AssociatedPress dataset from {topicmodels}. Here's the code:

library(topicmodels)

data("AssociatedPress")
# Hold out a random 25% of documents as a validation set
splitter_AP <- sample(1:nrow(AssociatedPress), nrow(AssociatedPress) * 0.25)
train_set_AP <- AssociatedPress[-splitter_AP, ]
valid_set_AP <- AssociatedPress[splitter_AP, ]

# Set parameters for Gibbs sampling
burnin <- 1000
iter <- 2000
seed <- list(2003, 5, 63, 100001, 765)
nstart <- 5
best <- TRUE
verbose <- 100

# Run LDA (I repeated the next step using values 10, 20 and 30 for k in this example)
ldaOut_AP10 <- LDA(train_set_AP, k = 10, method = "Gibbs",
                   control = list(nstart = nstart,
                                  seed = seed,
                                  best = best,
                                  burnin = burnin,
                                  iter = iter,
                                  verbose = verbose))

perplexity(ldaOut_AP10, newdata=valid_set_AP, estimate_theta=FALSE) # returned 5544.164
perplexity(ldaOut_AP20, newdata=valid_set_AP, estimate_theta=FALSE) # returned 5755.367
perplexity(ldaOut_AP30, newdata=valid_set_AP, estimate_theta=FALSE) # returned 5808.529
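For reference, the three runs above can be collapsed into a single loop over k with the same control list (a sketch; the exact perplexity values will only be reproduced with the same random split and seeds):

```r
library(topicmodels)

k_values <- c(10, 20, 30)
perp <- sapply(k_values, function(k) {
  fit <- LDA(train_set_AP, k = k, method = "Gibbs",
             control = list(nstart = nstart, seed = seed, best = best,
                            burnin = burnin, iter = iter, verbose = verbose))
  perplexity(fit, newdata = valid_set_AP, estimate_theta = FALSE)
})
# Plot held-out perplexity against the number of topics
plot(k_values, perp, type = "b", xlab = "k", ylab = "Perplexity")
```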

This post shows very nicely that perplexity should go down, not up, as k increases. I just can't see where I'm going wrong. I'd really appreciate any help!
