Extract most common words cleaned as WordCloud?

Asked Dec 20 '19 at 18:15

Active Dec 20 '19 at 18:15

Viewed 367 times

I want to extract the most common words from a DataFrame column (`df['name']`) into a new DataFrame with [word, count] as columns.

The result should be similar to what WordCloud displays, where only relevant words are considered: no stopwords, multi-word phrases kept together, and capitalization preserved.

WordCloud Example image

I've tried using Counter, but the result only considers single words, and stopwords are still present.

from collections import Counter
import pandas as pd
x = Counter(' '.join(df['name']).split()).most_common(20)
pd.DataFrame(x, columns=['word', 'count'])

    word    count
0   in      8875
1   Private 3224
2   Room    2925
3   to      2645
4   room    2512
5   Bedroom 2404
6   Cozy    2324
7   2       2255
8   Brooklyn    2099
9   Apartment   2075
10  &       1966
11  1       1885
12  Manhattan   1824
13  with    1815
14  and     1714
15  of      1703
16  the     1700
17  Studio  1638
18  bedroom 1615
19  -       1567

asked Dec 20 '19 at 18:15

karamon14

Like a Top 10?.. – GiovaniSalazar Dec 20 '19 at 18:23
You can use `nltk` to remove stopwords, for example with something like this: https://stackoverflow.com/a/46977475/10035985 – Andrej Kesely Dec 20 '19 at 18:31
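
A minimal sketch of the stopword-removal idea from the comment above, extended to cover the other two requirements in the question (two-word phrases and preserved capitalization). It uses only the standard library. The `STOPWORDS` set, the `TOKEN_RE` pattern, and the `top_terms` helper are all hypothetical names introduced for illustration, and the stopword list is deliberately tiny; this is not how WordCloud itself counts words.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption); for real data, swap in a
# fuller list such as nltk.corpus.stopwords.words("english").
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to", "with", "near", "for"}

# Alphabetic tokens only, so digits and symbols such as "&" or "-" are skipped.
TOKEN_RE = re.compile(r"[A-Za-z][A-Za-z']*")


def top_terms(names, n=20, stopwords=STOPWORDS):
    """Return the n most common words and two-word phrases as (term, count) pairs."""
    counts = Counter()            # lowercase term -> total count
    forms = defaultdict(Counter)  # lowercase term -> counts of each original spelling
    for name in names:
        tokens = TOKEN_RE.findall(name)
        # Single words, with stopwords dropped (case-insensitive check)
        terms = [t for t in tokens if t.lower() not in stopwords]
        # Two-word phrases: adjacent tokens where neither one is a stopword
        terms += [
            f"{a} {b}"
            for a, b in zip(tokens, tokens[1:])
            if a.lower() not in stopwords and b.lower() not in stopwords
        ]
        for term in terms:
            counts[term.lower()] += 1
            forms[term.lower()][term] += 1
    # Report each term under its most frequent original capitalization
    return [(forms[key].most_common(1)[0][0], c) for key, c in counts.most_common(n)]
```

With the DataFrame from the question, this could be used as `pd.DataFrame(top_terms(df['name']), columns=['word', 'count'])`. Note that a word is counted both on its own and inside any phrase it belongs to, so "Private" and "Private Room" both appear in the output. The wordcloud package also exposes `WordCloud().process_text(text)`, which returns the word frequencies it would draw; that may be closer to the picture in the question, though its exact filtering rules depend on the WordCloud settings.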


0 Answers