I have a structured dataset with columns 'text' and 'topic'. Someone has already run word embedding/topic modeling, so each row in 'text' is assigned a topic number (1-200). I would like to create a new data frame with the topic number and the top 5-10 keywords that represent each topic.
I've done this before, but I usually start from scratch and run an LDA model, then use the objects the LDA creates to find keywords per topic. Here, though, I'm starting from a mid-point my supervisor gave me, and it's throwing me off.
The data structure looks like this:
import pandas as pd

df = pd.DataFrame({'text': ['foo bar baz', 'blah bling', 'foo'],
                   'topic': [1, 2, 1]})
So would the plan be to create a bag of words, group by 'topic', and count the words? Or is there a keywords function with a group-by-column option that I don't know about in gensim or NLTK?
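Here is a rough sketch of the groupby-and-count plan I have in mind, assuming plain whitespace tokenization (top_keywords is just a hypothetical helper name, and on real data I'd lowercase and drop stopwords first, e.g. with nltk.corpus.stopwords):

from collections import Counter
import pandas as pd

df = pd.DataFrame({'text': ['foo bar baz', 'blah bling', 'foo'],
                   'topic': [1, 2, 1]})

def top_keywords(texts, n=10):
    # Pool every document in the topic, tokenize on whitespace,
    # and keep the n most frequent tokens.
    counts = Counter(word for text in texts for word in text.lower().split())
    return [word for word, _ in counts.most_common(n)]

keywords = (df.groupby('topic')['text']
              .apply(lambda texts: top_keywords(texts, n=10))
              .reset_index(name='keywords'))
print(keywords)
#    topic         keywords
# 0      1  [foo, bar, baz]
# 1      2    [blah, bling]

That works on the toy frame, but raw frequency would surface filler words like 'the' in every topic on real text, which is why I'm wondering whether there's a smarter built-in in gensim or NLTK.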