I have a pandas dataframe that contains a column 'reviews' and a column 'year'. I would like to view the top 100 most frequently occurring words in the reviews column, but filtered by year. So, I want to know the top 100 from 2002, 2003, 2004, and so on, all the way up to 2017.
import pandas as pd
from nltk.corpus import stopwords
df=pd.read_csv('./reviews.csv')
stop = stopwords.words('english')
commonwords = pd.Series(' '.join(df['reviews']).lower().split()).value_counts()[:100]
print(commonwords)
df.to_csv('commonwords.csv', index=False)
The above code works but it only gives me the top 100 most frequently occurring words across all years.