I have a list of words. I would like to count them and find the most common ones.
['project',
'gutenberg',
'ebook',
'oliver',
'twist',
'may',......]
I removed the stopwords from my list like this:
from nltk.corpus import stopwords
data2 = data.split()
for x in data2:
    if x == "":
        data2.remove("")
    elif x in stopwords.words('english'):
        data2.remove(x)
When I look at the counts, the result is fine, but I would like to sort the words by frequency.
from collections import Counter
Counter(data2)
Counter({'project': 88,
'gutenberg': 98,
'ebook': 13,
'oliver': 881,
'twist': 68,
Why do I still get stopwords here? How can I solve that?
Counter(data2).most_common(10)
[('the', 4746),
('a', 1943),
('said', 1232),
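
A likely cause is that the loop calls remove() on data2 while iterating over it, so the iterator skips the element that follows each removal and some stopwords survive. Below is a minimal sketch of that effect and of one way around it, which is to build a new list instead of mutating the one being looped over. STOPWORDS, remove_stopwords and the sample words are hypothetical stand-ins so the snippet runs without NLTK data; in the real code the set would presumably be built from stopwords.words('english').

```python
from collections import Counter

# Hypothetical stand-in for set(stopwords.words('english')).
STOPWORDS = {"the", "a", "of"}


def remove_stopwords(words, stopwords=STOPWORDS):
    """Return a new list without empty strings or stopwords."""
    return [w for w in words if w and w not in stopwords]


# Made-up sample data, not taken from the book.
words = ["the", "the", "oliver", "a", "a", "twist", "the", "oliver"]

# Same pattern as the loop in the question: removing while iterating
# shifts the remaining items left, so the next item is never visited.
buggy = list(words)
for w in buggy:
    if w in STOPWORDS:
        buggy.remove(w)

# Filtering into a new list visits every item exactly once.
filtered = remove_stopwords(words)
top = Counter(filtered).most_common(2)

print(buggy)     # still contains some stopwords
print(filtered)
print(top)
```

Turning the stopword list into a set once, outside the loop, also avoids calling stopwords.words('english') for every word, which should make the filtering noticeably faster on a whole book.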