I want to extract the most common words from a DataFrame column into a new
DataFrame with [Word, Count] as columns.
The result should be similar to what WordCloud displays: only relevant words are considered, so stopwords are excluded, items can span multiple words, and capitalization is preserved.
I've tried using Counter, but the result only considers single words, and stopwords are still present.
from collections import Counter
import pandas as pd
x = Counter(' '.join(df['name']).split()).most_common(20)
pd.DataFrame(x, columns=['word', 'count'])
word count
0 in 8875
1 Private 3224
2 Room 2925
3 to 2645
4 room 2512
5 Bedroom 2404
6 Cozy 2324
7 2 2255
8 Brooklyn 2099
9 Apartment 2075
10 & 1966
11 1 1885
12 Manhattan 1824
13 with 1815
14 and 1714
15 of 1703
16 the 1700
17 Studio 1638
18 bedroom 1615
19 - 1567
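
To make the target concrete, here is a minimal sketch of the kind of filtering described above. It is not WordCloud's actual method. The `common_terms` helper and the tiny `STOPWORDS` set are hypothetical names made up for illustration, two-word pairs stand in for "multiple words per item", and dropping numbers and symbols such as "2", "&" and "-" is an assumption about what counts as relevant.

```python
import re
from collections import Counter

# Assumption: a tiny hand-written stopword set, for illustration only.
# WordCloud ships its own, much longer list (wordcloud.STOPWORDS).
STOPWORDS = {"in", "to", "with", "and", "of", "the", "a", "for"}


def common_terms(names, top=20, stopwords=STOPWORDS):
    """Count single words and adjacent word pairs, skipping stopwords."""
    counts = Counter()
    for name in names:
        # Keep alphabetic tokens only, which drops "&", "-", "2", etc.
        tokens = re.findall(r"[A-Za-z]+", name)
        # Single words, compared in lowercase but stored with original case.
        counts.update(t for t in tokens if t.lower() not in stopwords)
        # Two-word items: pairs of neighbouring tokens, neither a stopword.
        counts.update(
            f"{a} {b}"
            for a, b in zip(tokens, tokens[1:])
            if a.lower() not in stopwords and b.lower() not in stopwords
        )
    return counts.most_common(top)


rows = common_terms(["Cozy Private Room in Brooklyn", "Private Room with a view"])
```

The returned list of (term, count) tuples can be passed to pd.DataFrame(rows, columns=['word', 'count']) in the same way as the Counter result above, with df['name'] as the input instead of the two sample strings.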