I have a dataset of about 17,000 user records scraped from Twitter, and I am working with the latent Dirichlet allocation (LDA) algorithm. I want to split my dataset into training and test sets, but I am not sure what the best way is. What criteria should be used to split a dataset when training an LDA model? I am using gensim to train the model. Thank you.
-
It is tricky with unsupervised learning. There is some good info in this question: https://stackoverflow.com/questions/11162402/lda-topic-modeling-training-and-testing – alex9311 Jan 12 '21 at 16:20
-
Thank you for the recommendation, it's useful – hajar hajar Jan 12 '21 at 17:25
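For what it's worth, one common way to handle the split asked about above is a plain random hold-out: train on most of the documents and check how well the model fits the rest. The sketch below assumes each document is already a list of tokens. The names `split_documents` and `heldout_bound`, the 80/20 ratio, and the values for `num_topics` and `passes` are illustrative choices, not anything prescribed by gensim or by the linked question.

```python
import random


def split_documents(docs, test_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the documents (hypothetical helper)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(docs)                      # copy, leave the input untouched
    random.Random(seed).shuffle(shuffled)      # seeded, so the split is reproducible
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)


def heldout_bound(train_docs, test_docs, num_topics=10):
    """Train LDA on train_docs and score test_docs (hypothetical helper).

    Returns gensim's per-word likelihood bound on the held-out documents;
    values closer to zero indicate a better fit.
    """
    # Imported here so the split helper above works without gensim installed.
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    # Build the vocabulary from the training documents only, so the test
    # set does not leak into the model.
    dictionary = Dictionary(train_docs)
    train_bow = [dictionary.doc2bow(doc) for doc in train_docs]
    test_bow = [dictionary.doc2bow(doc) for doc in test_docs]
    # Drop test documents with no in-vocabulary words; this assumes at
    # least one non-empty test document remains.
    test_bow = [bow for bow in test_bow if bow]

    lda = LdaModel(corpus=train_bow, id2word=dictionary,
                   num_topics=num_topics, passes=5, random_state=0)
    return lda.log_perplexity(test_bow)
```

Calling `train, test = split_documents(tokenized_docs)` and then `heldout_bound(train, test, num_topics=k)` for a few values of `k` gives a rough way to compare models. Since LDA is unsupervised, the held-out bound is only one signal, and topic coherence or manual inspection of the topics may matter more in practice.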