Below is the output I get using the Gensim Mallet wrapper. From this SO link I understood that LL/token means "the model's log-likelihood divided by the total number of tokens". 1) However, for a few topics (1, 8, 11, etc.) I do not see any terms at all. 2) I ran the code for a range of topic counts, (10, 20, 2) (from 10 to 20 in steps of 2), but the output shows 17 as the last topic generated. Am I missing something here?

0       2.77778 watch 
1       2.77778 
2       2.77778 receive tape hope purchase 
3       2.77778 dvds wildlife pass yr interested 
4       2.77778 dvd version walk bored 
5       2.77778 volume courtyard trilogy 
6       2.77778 crazy picture minute 
7       2.77778 neighbor 
8       2.77778 
9       2.77778 buy mice trouble stay versus feeder 
10      2.77778 inside stir tv mine life bird wonderful year fascinated 
11      2.77778 
12      2.77778 
13      2.77778 recommend test real prefer greenery 
14      2.77778 age 
15      2.77778 funny triliogy play friend full minute 
16      2.77778 
17      2.77778 time tree 

<950> LL/token: -22.17456
<960> LL/token: -22.22132
<970> LL/token: -22.24897
<980> LL/token: -22.11585
<990> LL/token: -22.38062
Hackerds

1 Answer

This looks like the output you get when the input collection is much too small, or is divided into too few segments. "Documents" should be about 100-500 words, and there should be at least several hundred of them.

David Mimno
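
As a rough illustration of the two points raised in this thread, here is a minimal sketch (not part of the original answer) that checks a tokenized corpus against the rule of thumb of several hundred documents of about 100-500 words each, and shows why a step-of-2 range from 10 to 20 ends on a model whose last topic id is 17. The thresholds `MIN_DOCS` and `MIN_DOC_LEN`, the helper `corpus_summary`, and the toy data are all hypothetical. The last lines assume that the second column of the topic listing is MALLET's per-topic alpha and that the alpha sum was left at 50; that is a guess based on 50 / 18 matching the printed 2.77778.

```python
from statistics import mean

# Hypothetical cutoffs, loosely based on the rule of thumb in the answer.
MIN_DOCS = 300      # assumed reading of "several hundred" documents
MIN_DOC_LEN = 100   # lower end of the suggested 100-500 word range


def corpus_summary(docs):
    """Summarize size statistics for a corpus given as lists of tokens."""
    lengths = [len(doc) for doc in docs]
    return {
        "num_docs": len(docs),
        "mean_len": mean(lengths) if lengths else 0.0,
        "short_docs": sum(1 for n in lengths if n < MIN_DOC_LEN),
        "enough_docs": len(docs) >= MIN_DOCS,
    }


# Toy stand-in for the real tokenized corpus (hypothetical data).
toy_docs = [["watch"], ["receive", "tape", "hope", "purchase"]]
summary = corpus_summary(toy_docs)
print(summary)

# range(10, 20, 2) excludes the stop value, so the last model has 18 topics.
topic_counts = list(range(10, 20, 2))
print(topic_counts)

# Topic ids start at 0, so an 18-topic model lists topics 0 through 17.
last_topic_ids = list(range(topic_counts[-1]))
print(last_topic_ids[-1])

# Assumption: alpha sum of 50 split evenly over the 18 topics.
alpha_per_topic = round(50 / topic_counts[-1], 5)
print(alpha_per_topic)
```

If `enough_docs` is false or most documents count as short, empty topics like the ones in the question are a plausible outcome; splitting long texts into more segments or adding data would be the first thing to try.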