
I'm doing some topic modeling and am looking to store some of the results of my analysis.

import pandas as pd, numpy as np, scipy
import sklearn.feature_extraction.text as text
from sklearn import decomposition

descs = ["You should not go there", "We may go home later", "Why should we do your chores", "What should we do"]

vectorizer = text.CountVectorizer()

dtm = vectorizer.fit_transform(descs).toarray()

vocab = np.array(vectorizer.get_feature_names_out())  # get_feature_names() was removed in scikit-learn 1.2

nmf = decomposition.NMF(3, random_state = 1)

topic = nmf.fit_transform(dtm)

topic_words = []

for t in nmf.components_:
    word_idx = np.argsort(t)[::-1][:20]
    topic_words.append(vocab[i] for i in word_idx)

for t in range(len(topic_words)):
    print("Topic {}: {}\n".format(t, " ".join([word for word in topic_words[t]])))

Prints:

Topic 0: do we should your why chores what you there not may later home go

Topic 1: should you there not go what do we your why may later home chores

Topic 2: we may later home go what do should your you why there not chores

I'm trying to write those topics to a file, so I thought storing them in a list might work, like this:

l = []
for t in range(len(topic_words)):
    l.append([word for word in topic_words[t]])
    print("Topic {}: {}\n".format(t, " ".join([word for word in topic_words[t]])))

But l just ends up as an empty array. How can I store these words in a list?

blacksite

1 Answer


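You're appending generator expressions, not lists, to topic_words. A generator can be consumed only once, so after your first loop printed the topics, the generators were already exhausted and the second loop had nothing left to collect. You can instead append lists: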

topic_words = []

for t in nmf.components_:
    word_idx = np.argsort(t)[::-1][:20]
    topic_words.append([vocab[i] for i in word_idx])
#                      ^                          ^
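
To see the difference in isolation, here is a minimal sketch. The three-word vocab is a stand-in I made up for illustration, not the vocabulary from the question:

```python
# Hypothetical stand-in for the real vocabulary.
vocab = ["do", "we", "should"]

# A generator expression yields its items only once.
words_gen = (w for w in vocab)
first_pass = list(words_gen)   # consumes the generator
second_pass = list(words_gen)  # nothing left to yield

# A list comprehension builds a real list that can be read repeatedly.
words_list = [w for w in vocab]
first_copy = list(words_list)
second_copy = list(words_list)

print(first_pass)                 # the three words
print(second_pass)                # an empty list
print(first_copy == second_copy)  # both reads see the same words
```

The second read of the generator comes back empty, which is the same thing that happened to the entries of topic_words after the first print loop.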

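With this change, you won't need a new list, because topic_words already holds lists of words. You can print them with the loop below; note that enumerate(topic_words, 1) numbers the topics from 1 rather than 0: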

for t, words in enumerate(topic_words, 1):
    print("Topic {}: {}\n".format(t, " ".join(words)))
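
Since the goal was to write the topics to a file, the same loop can write instead of print. This is only a sketch: the write_topics helper, the topics.txt file name, and the sample topic_words below are hypothetical and chosen for illustration.

```python
import os
import tempfile

# Hypothetical sample data; in practice use the topic_words built above.
topic_words = [["do", "we", "should"], ["should", "you", "there"]]


def write_topics(topics, path):
    """Write one 'Topic N: word word ...' line per topic to path."""
    with open(path, "w", encoding="utf-8") as fh:
        for t, words in enumerate(topics, 1):
            fh.write("Topic {}: {}\n".format(t, " ".join(words)))


# Assumed output location: a file in the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), "topics.txt")
write_topics(topic_words, out_path)
```

Because each entry of topic_words is now a real list, it can be printed first and written afterwards without coming up empty.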
Moses Koledoye