I've been stuck on this problem for days and would appreciate some help. I have a dataframe with the following columns:
     #   Column      Non-Null Count  Dtype
    ---  ------      --------------  -----
     0   PhraseId    93636 non-null  int64
     1   SentenceId  93636 non-null  int64
     2   Phrase      93636 non-null  object
     3   Sentiment   93636 non-null  int64
The sentiment ranges from 0 to 4 and rates the Phrase from good to bad. I added two columns that might help: the number of words in each phrase, and each phrase split into a list of its words.
What I want to do is create one bar graph per sentiment (five in total, since the sentiment ranges from 0 to 4), each showing the 15 most repeated words for that sentiment. The x axis would hold those 15 words, and the height of each bar would be the number of times the word is repeated.
Below is the code I wrote to count how many times each word is repeated for each sentiment. It will probably be needed for the bar graphs.
Sample data:
           PhraseId  SentenceId  Phrase                Sentiment  SplitPhrase               NumOfWords
    44723  75358     3866        Build some robots...  0          [Build, some, robots...]  52
To count how many times a word is repeated for each sentiment:
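    from collections import Counter

    # Build one Counter of word frequencies per sentiment value
    counters = {}
    for sentiment in train_data['Sentiment'].unique():
        counters[sentiment] = Counter()
        indices = (train_data['Sentiment'] == sentiment)
        # Each SplitPhrase entry is a list of words
        for phrase in train_data['SplitPhrase'][indices]:
            counters[sentiment].update(phrase)
    print(counters)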
Sample output:
{2: Counter({'the': 28041, ',': 25046, 'a': 19962, 'of': 19376, 'and': 19052, 'to': 13470, '.': 10505, "'s": 10290, 'in': 8108, 'is': 8012, 'that': 7276, 'it': 6176, 'as': 5027, 'with': 4474, 'for': 4362, 'its': 4159, 'film': 3933......}),
3: Counter({'the': 28041, ',': 25046, 'a': 19962,.....
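One possible way to get from a `counters` dict like this to the bar graphs is sketched below. It is only a sketch under some assumptions: matplotlib is installed, the helper names `top_words` and `plot_top_words` are made up for illustration, and the small `demo` dict stands in for the real data. `Counter.most_common(n)` returns the `n` most frequent `(word, count)` pairs, which map directly onto the x values and bar heights.

```python
from collections import Counter


def top_words(counters, n=15):
    """Return {sentiment: [(word, count), ...]} with the n most common words."""
    return {sentiment: counter.most_common(n)
            for sentiment, counter in sorted(counters.items())}


def plot_top_words(counters, n=15):
    """Draw one bar chart per sentiment (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt  # imported here so top_words works without it

    top = top_words(counters, n)
    # One row of axes per sentiment; squeeze=False keeps `axes` two-dimensional
    fig, axes = plt.subplots(len(top), 1, figsize=(10, 4 * len(top)), squeeze=False)
    for ax, (sentiment, pairs) in zip(axes[:, 0], top.items()):
        words = [word for word, _ in pairs]
        counts = [count for _, count in pairs]
        ax.bar(words, counts)
        ax.set_title(f"Top {n} words for sentiment {sentiment}")
        ax.set_ylabel("Count")
        ax.tick_params(axis="x", rotation=45)
    fig.tight_layout()
    return fig


# Tiny made-up counters, standing in for the real `counters` dict
demo = {0: Counter({"bad": 3, "the": 5}), 1: Counter({"good": 2, "the": 4})}
print(top_words(demo, n=1))  # {0: [('the', 5)], 1: [('the', 4)]}
```

With the real data, calling `plot_top_words(counters)` followed by `plt.show()` (after importing `matplotlib.pyplot as plt`) should display one chart per sentiment. Judging by the sample output, tokens such as 'the' and ',' will dominate every chart unless they are filtered out of the counters first.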