I have 13 different lists of words. As I am doing topic modelling, I want to clean each list, create a corpus from it, get the per-document topic weights, and concatenate the results of all the lists. The code that does this for one list, `eastern_data_words`, is shown further down. I want to apply the same steps to the remaining 12 lists, and I believe I should put the lists in a dictionary and then somehow loop over it.
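Something along these lines, where every name except `eastern_data_words` is a placeholder for one of my real lists:

```python
# Hypothetical dictionary of all 13 lists; only eastern_data_words
# appears in the code below, the other names stand in for my real lists
data_words_by_region = {
    'eastern': eastern_data_words,
    'western': western_data_words,  # placeholder
    # ... the remaining 11 lists
}
```

The single-list code follows.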
```python
import spacy
from gensim import corpora

# remove_stopwords, make_bigrams and lemmatization are my own helper
# functions, defined elsewhere

# Remove stop words
eastern_data_words_nostops = remove_stopwords(eastern_data_words)

# Form bigrams
eastern_data_words_bigrams = make_bigrams(eastern_data_words_nostops)

nlp = spacy.load("en_core_web_md", disable=['parser', 'ner'])

# Do lemmatization, keeping only nouns, adjectives, verbs and adverbs
eastern_data_lemmatized = lemmatization(eastern_data_words_bigrams,
                                        allowed_postags=['NOUN', 'ADJ', 'VERB', 'ADV'])

# Create dictionary
id2word_reg = corpora.Dictionary(eastern_data_lemmatized)

# Create corpus: term-document frequency
texts_reg = eastern_data_lemmatized
corpus_reg = [id2word_reg.doc2bow(text) for text in texts_reg]

# Get the weight of each topic in every document;
# each entry is a list of (topic_id, weight) pairs
topics = [lda_model_tuned[doc] for doc in corpus_reg]
```
The per-document weights are then collected into a dataframe and averaged:

```python
import numpy as np
import pandas as pd

def topics_document_to_dataframe(topics_document, num_topics):
    # One-row frame with a column per topic, holding that topic's weight
    res = pd.DataFrame(columns=range(num_topics))
    for topic_weight in topics_document:
        res.loc[0, topic_weight[0]] = topic_weight[1]
    return res

document_topic = (pd.concat([topics_document_to_dataframe(topics_document, num_topics=8)
                             for topics_document in topics])
                    .reset_index(drop=True)
                    .fillna(0))

# Mean weight of each topic over all the documents in this list
eastern_weights = document_topic.apply(np.mean, axis=0)
```
At the end I want a dataframe with the weights of the different topics as columns and the list names as rows; an example of one column is shown in the attached "output" image.
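Here is roughly how I imagine the loop, wrapping the steps above in a function. `list_topic_weights` and `data_words_by_region` are my own placeholder names, and I am assuming every list should reuse the same tuned model `lda_model_tuned` and build its own dictionary, exactly as in the single-list code:

```python
import numpy as np
import pandas as pd
from gensim import corpora

def list_topic_weights(data_words, num_topics=8):
    # Same pipeline as for eastern_data_words, applied to one list
    words_nostops = remove_stopwords(data_words)
    words_bigrams = make_bigrams(words_nostops)
    lemmatized = lemmatization(words_bigrams,
                               allowed_postags=['NOUN', 'ADJ', 'VERB', 'ADV'])
    id2word = corpora.Dictionary(lemmatized)
    corpus = [id2word.doc2bow(text) for text in lemmatized]
    topics = [lda_model_tuned[doc] for doc in corpus]
    document_topic = (pd.concat([topics_document_to_dataframe(t, num_topics=num_topics)
                                 for t in topics])
                        .reset_index(drop=True)
                        .fillna(0))
    # Mean weight of each topic across the documents of this list
    return document_topic.apply(np.mean, axis=0)

# One row per list name, one column per topic
all_weights = pd.DataFrame({name: list_topic_weights(words)
                            for name, words in data_words_by_region.items()}).T
```

Is this the right approach, or is there a cleaner pattern for applying the same pipeline to all 13 lists?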