Below is an example from the gensim tutorial, but every time I run it I get a different result, so I'm not sure I can trust that gensim is working correctly.

from gensim import corpora, models, similarities
from collections import defaultdict

documents = ["Human machine interface for lab abc computer applications",          # 0
             "A survey of user opinion of computer system response time",          # 1
             "The EPS user interface management system",                           # 2
             "System and human system engineering testing of EPS",                 # 3
             "Relation of user perceived response time to error measurement",      # 4
             "The generation of random binary unordered trees",                    # 5
             "The intersection graph of paths in trees",                           # 6
             "Graph minors IV Widths of trees and well quasi ordering",            # 7 
             "Graph minors A survey"]                                              # 8


stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

frequency = defaultdict(int)
for text in texts:
    for token in text:
        frequency[token] += 1
texts = [[token for token in text if frequency[token] > 1]
         for text in texts]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2)
index = similarities.MatrixSimilarity(lda[corpus])


doc = "Human computer interaction"
vec_bow = dictionary.doc2bow(doc.lower().split())
vec_lda = lda[vec_bow]
sims = index[vec_lda]
sims = sorted(enumerate(sims), key=lambda item: -item[1])
print(sims)

print(lda.get_document_topics(vec_bow))

Results from three separate runs:

[(0, 0.9986434), (4, 0.99792993), (2, 0.99722278), (3, 0.99651831), (1, 0.99158639), (5, 0.53059661), (6, 0.4146674), (8, 0.38019019), (7, 0.36143348)]
[(0, 0.18366596), (1, 0.81633401)]

[(1, 0.999605), (4, 0.9991864), (0, 0.998689), (5, 0.62957084), (6, 0.48837978), (8, 0.48152202), (3, 0.4541581), (7, 0.41751832), (2, 0.40637407)]
[(0, 0.80285221), (1, 0.19714773)]

[(7, 0.99957085), (8, 0.99660784), (0, 0.99202132), (5, 0.78449017), (6, 0.77530348), (2, 0.56972337), (3, 0.47117239), (4, 0.47092015), (1, 0.4172135)]
[(0, 0.25292286), (1, 0.74707717)]

Document 7 doesn't look similar to "Human computer interaction" at all, yet it ranks first in the third run. Thanks.

semenbari
Possible duplicate of [LDA model generates different topics everytime i train on the same corpus](https://stackoverflow.com/questions/15067734/lda-model-generates-different-topics-everytime-i-train-on-the-same-corpus) – polm23 Jun 14 '18 at 06:53
