
I want to extract the most common words from a DataFrame column into a new DataFrame with [Word, Count] as columns.

The result should be similar to what WordCloud shows, where, as you can see, only relevant words are considered: no stopwords, multi-word items kept together, and capitalization preserved.

WordCloud Example image

I've tried using Counter, but the result only considers single words, and stopwords are still present.

from collections import Counter
import pandas as pd
x = Counter(' '.join(df['name']).split()).most_common(20)
pd.DataFrame(x, columns=['word', 'count'])

    word       count
0   in         8875
1   Private    3224
2   Room       2925
3   to         2645
4   room       2512
5   Bedroom    2404
6   Cozy       2324
7   2          2255
8   Brooklyn   2099
9   Apartment  2075
10  &          1966
11  1          1885
12  Manhattan  1824
13  with       1815
14  and        1714
15  of         1703
16  the        1700
17  Studio     1638
18  bedroom    1615
19  -          1567
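
For reference, here is a minimal sketch of the kind of counting described above, using only the standard library. It is not WordCloud's own algorithm, just one possible approach under stated assumptions: `STOPWORDS` is a tiny hypothetical list (a real one would be far longer), `tokenize` and `count_terms` are made-up helper names, the sample `names` stand in for `df['name']`, and digit-only tokens are treated like stopwords. Original capitalization is kept, so "Room" and "room" would still be counted separately.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption, not WordCloud's list).
STOPWORDS = {"in", "to", "with", "and", "of", "the", "a", "an", "for", "at", "on"}


def tokenize(text):
    """Return word tokens in original case; symbols such as '&' and '-' are dropped."""
    return re.findall(r"[A-Za-z0-9']+", text)


def is_relevant(token):
    """A token is kept if it is neither a stopword nor made of digits only."""
    return token.lower() not in STOPWORDS and not token.isdigit()


def count_terms(names, top_n=20):
    """Count single words and adjacent two-word items across all names."""
    counts = Counter()
    for name in names:
        tokens = tokenize(name)
        # Single words, stopwords and bare numbers removed.
        counts.update(t for t in tokens if is_relevant(t))
        # Two-word items: adjacent pairs where both tokens are relevant.
        for first, second in zip(tokens, tokens[1:]):
            if is_relevant(first) and is_relevant(second):
                counts[f"{first} {second}"] += 1
    return counts.most_common(top_n)


# Sample data standing in for df['name'] (assumption).
names = [
    "Cozy Private Room in Brooklyn",
    "Private Room with a view",
    "Studio in Manhattan",
]
rows = count_terms(names)
for word, count in rows:
    print(word, count)
```

The returned list of (word, count) pairs has the same shape as the output of `most_common` above, so it could be passed to `pd.DataFrame(rows, columns=['word', 'count'])` in the same way. Note that this sketch counts a two-word item in addition to its individual words and does not merge them.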
karamon14