I've been unable to create reproducible results from topicmodels' LDA function. To take an example from their documentation:
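library(topicmodels)
data("AssociatedPress", package = "topicmodels")  # load the example corpus used in the package docs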
set.seed(0)
lda1 <- LDA(AssociatedPress[1:20, ], control=list(seed=0), k=2)
set.seed(0)
lda2 <- LDA(AssociatedPress[1:20, ], control=list(seed=0), k=2)
identical(lda1, lda2)
# [1] FALSE
How can I get identical results from two separate calls to LDA?
As an aside (in case the package authors are on here), I find the control=list(seed=0) snippet unfortunate and unnecessary. Behind the scenes, there's a line: if (missing(seed)) seed <- as.integer(Sys.time()). This doesn't make the process more reliably random; it only overrides whatever was set with set.seed(). Am I missing something?
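To illustrate the alternative I have in mind, here is a minimal base-R sketch. The helper pick_seed() is hypothetical and not part of topicmodels; it only shows that drawing the default seed from R's own RNG stream, instead of from Sys.time(), would make a missing seed follow set.seed().

```r
# Hypothetical helper, not from topicmodels: when no seed is supplied,
# draw one from R's RNG stream so that set.seed() controls the result.
pick_seed <- function(seed) {
  if (missing(seed)) seed <- sample.int(.Machine$integer.max, 1L)
  as.integer(seed)
}

set.seed(0)
a <- pick_seed()
set.seed(0)
b <- pick_seed()

# Both calls started from the same RNG state, so the drawn seeds agree.
identical(a, b)
```

An explicitly supplied seed is still passed through unchanged, so existing calls with control=list(seed=0) would behave the same under this scheme.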
UPDATE: As @hrbrmstr discovered below, passing a seed through the control argument yields effectively identical objects; the only difference is a temporary local file location. So this question stems from a misunderstanding (though it still seems it would be clearer if the function respected set.seed()).
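For anyone wanting to check this themselves, a rough sketch follows. It compares the fitted distributions via posterior() rather than the whole object. I'm assuming the differing temp file location is the prefix slot of the model's control object, so treat that last line as a guess rather than documented behaviour.

```r
library(topicmodels)
data("AssociatedPress", package = "topicmodels")

# Fit the same model twice with the seed passed through the control list.
fit <- function() LDA(AssociatedPress[1:20, ], k = 2, control = list(seed = 0))
lda1 <- fit()
lda2 <- fit()

# Compare the estimated distributions instead of the full S4 objects.
post1 <- posterior(lda1)
post2 <- posterior(lda2)
same_terms  <- isTRUE(all.equal(post1$terms,  post2$terms))   # topic-by-term probabilities
same_topics <- isTRUE(all.equal(post1$topics, post2$topics))  # document-by-topic probabilities

# Assumption: the temp file path that differs between fits lives here.
c(lda1@control@prefix, lda2@control@prefix)
```

If same_terms and same_topics are both TRUE, the two fits agree on everything that matters, and identical() fails only because of the bookkeeping field.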