I have a DataFrame with multiple text columns. I want to build a bag of words for each text column and store each result in its own DataFrame. This is what I have:
import nltk
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

text_df = [['text response', 'another response'], ['written responses', 'more text'], ['lots more text', 'text text']]
text_df = pd.DataFrame(text_df, columns=['answer1', 'answer2'])
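def bow(tokens, data):
    # data is currently unused inside the function
    # split each response into a list of word tokens
    tokens = tokens.apply(nltk.word_tokenize)
    # the input is already tokenized, so the tokenizer passes it through unchanged
    cvec = CountVectorizer(min_df=.01, ngram_range=(1,3), tokenizer=lambda doc: doc, lowercase=False)
    cvec.fit(tokens)
    cvec_counts = cvec.transform(tokens)
    cvec_counts_bow = cvec_counts.toarray()
    # get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
    vocab = cvec.get_feature_names_out()
    bow_model = pd.DataFrame(cvec_counts_bow, columns=vocab)
    return bow_model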
answers = ['answer1', 'answer2']
for a in answers:
    a = bow(text_df[a], a)
I want two DataFrames, one called answer1 and one called answer2, each with its own bag of words. Instead, I end up with a single DataFrame called "a" that only contains the bag of words for answer2.
Any ideas how to fix this?
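One possible direction, sketched under the assumption that a dictionary keyed by column name is an acceptable substitute for two separate variables: assigning to the loop variable only rebinds the name a on each pass, so nothing called answer1 or answer2 is ever created, and only the last result survives. The sketch below uses only the standard library so it runs on its own. The names columns, toy_bow, and bow_by_answer are hypothetical stand-ins (toy_bow is not the bow function above, and columns is a plain dict standing in for text_df).

```python
from collections import Counter

# Same sample data as in the question, held as a plain dict of columns
# (a stand-in for text_df so the sketch has no third-party dependencies).
columns = {
    "answer1": ["text response", "written responses", "lots more text"],
    "answer2": ["another response", "more text", "text text"],
}


def toy_bow(docs):
    """Hypothetical stand-in for bow(): one Counter of word counts per document."""
    return [Counter(doc.split()) for doc in docs]


# Store each result under its column name instead of assigning it
# back to the loop variable, so earlier results are not lost.
bow_by_answer = {}
for name in columns:
    bow_by_answer[name] = toy_bow(columns[name])

print(sorted(bow_by_answer))        # the keys that were stored
print(bow_by_answer["answer2"][2])  # word counts for "text text"
```

With the real function, the loop body would presumably become bow_by_answer[a] = bow(text_df[a], a), after which bow_by_answer['answer1'] and bow_by_answer['answer2'] each hold their own DataFrame.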