I was given this tip in this SO question I asked:
Now that you have your matrix representation (rows are the products, columns are the counts for each unique word), you can filter the matrix down to the most common words. I would encourage you to take a look at how the distribution of word counts looks. We will use seaborn for that and import it like so:
import seaborn as sns
Given that your pd.DataFrame holding the word-count matrix is called df,
sns.distplot(df.sum())
should do the trick. Choose some cutoff that seems to preserve a good chunk of the counts but doesn't include many words with low counts. It can be arbitrary and it doesn't really matter for now. Your word-count matrix is your input data, also called the predictor variable. In machine learning this is often called the input matrix or vector X.
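For context, this is how I understand the suggested workflow. It is only a sketch of my interpretation: in the tip, df is the word-count matrix itself, and the use of sklearn's CountVectorizer plus the variable names below are my own assumptions, not part of the original advice.

import pandas as pd
import seaborn as sns
from sklearn.feature_extraction.text import CountVectorizer

# Word-count matrix: one row per review, one column per unique word (my assumption)
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(df['Review2'])
word_counts = pd.DataFrame(counts.toarray(), columns=vectorizer.get_feature_names_out(), index=df.index)

# Distribution of total counts per word, as suggested in the tip
sns.distplot(word_counts.sum())

(In newer seaborn versions distplot is deprecated, so sns.histplot(word_counts.sum()) should be the equivalent call.)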
I managed to build the bag of words (BOW) for every review in the Review2 column. The code is as follows:
from collections import Counter

df['BOW'] = df.Review2.str.split().apply(Counter)  # one Counter of word frequencies per review
But when I try to visualize it as suggested, with sns.distplot(df['BOW'].sum()), I get the following error:
unsupported operand type(s) for /: 'Counter' and 'int'
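For completeness, here is a minimal example that reproduces the error for me; the toy Review2 values are made up:

import pandas as pd
import seaborn as sns
from collections import Counter

df = pd.DataFrame({'Review2': ['good phone good battery', 'bad screen']})
df['BOW'] = df.Review2.str.split().apply(Counter)

# The BOW column holds Counter objects, so .sum() merges them into one combined Counter instead of returning numbers
print(df['BOW'].sum())

sns.distplot(df['BOW'].sum())  # fails with: unsupported operand type(s) for /: 'Counter' and 'int'

My guess is that the problem is that df['BOW'] holds Counter objects rather than a numeric matrix, but I'm not sure how to get from here to the word-count matrix the tip describes.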
Thx for reading the post and have a good day :)