My DataFrame looks like this:

category       text_list
--------       ---------
soccer         [soccer, game, is, good, soccer, game]
basketball     [game, basketball, game]
volleyball     [sport, volleyball, sport]

I want to group by category and then list each word together with its frequency within that category:

category       text_list          frequency
--------       ---------          ---------
soccer         soccer             2
               game               2 
               is                 1
               good               1
basketball     game               2
               basketball         1  
volleyball     sport              2
               volleyball         1

What have I tried?

  • I can compute the word frequency per row, but I cannot arrange the result in a DataFrame the way I want.

Could someone please help me? If possible, using NLTK.

floss

1 Answer

Try `explode` (available in pandas 0.25 and later), then `groupby`:

(df.explode('text_list')                        # one row per word
   .groupby(['category', 'text_list']).size()   # count each (category, word) pair
   .to_frame(name='frequency')
)

Output:

                       frequency
category   text_list            
basketball basketball          1
           game                2
soccer     game                2
           good                1
           is                  1
           soccer              2
volleyball sport               2
           volleyball          1
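
The output above is ordered alphabetically by word, whereas the question asks for the words listed by frequency. A minimal, self-contained sketch of one way to get that ordering is shown below. The sample DataFrame is rebuilt from the question, and the variable name `freq` and the alphabetical tie-break on `text_list` are assumptions made for illustration. It still relies on `explode`, so it needs pandas 0.25 or later.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "category": ["soccer", "basketball", "volleyball"],
    "text_list": [
        ["soccer", "game", "is", "good", "soccer", "game"],
        ["game", "basketball", "game"],
        ["sport", "volleyball", "sport"],
    ],
})

freq = (
    df.explode("text_list")                      # one row per word
      .groupby(["category", "text_list"])
      .size()                                    # count each (category, word) pair
      .reset_index(name="frequency")             # flat columns instead of a MultiIndex
      # Highest frequency first within each category; ties broken alphabetically
      .sort_values(["category", "frequency", "text_list"],
                   ascending=[True, False, True])
      .reset_index(drop=True)
)

print(freq)
```

Categories come out in alphabetical order here. If the original row order of the categories matters, it would have to be restored separately.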
Quang Hoang
  • Thank you for the answer. However, my pandas installation is limited to `0.22.0`, where `explode` is not supported – floss Jan 06 '21 at 20:56
  • @floss Have a look at [this `unnest` function](https://stackoverflow.com/a/53218939/4238408) as a substitute. – Quang Hoang Jan 06 '21 at 20:58
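
For pandas versions without `explode`, one possible workaround is to do the counting in plain Python with `collections.Counter` and build the result DataFrame from the counted rows. This is only a sketch and is not the linked `unnest` function. The names `counts`, `rows`, and `freq` are made up for illustration, and the sample data is rebuilt from the question. Since NLTK's `FreqDist` is a `Counter` subclass, it could presumably be swapped in for `Counter` if NLTK is preferred.

```python
from collections import Counter

import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "category": ["soccer", "basketball", "volleyball"],
    "text_list": [
        ["soccer", "game", "is", "good", "soccer", "game"],
        ["game", "basketball", "game"],
        ["sport", "volleyball", "sport"],
    ],
})

# Accumulate one Counter per category, so a category that spans
# several rows is still counted as a whole
counts = {}
for category, words in zip(df["category"], df["text_list"]):
    counts.setdefault(category, Counter()).update(words)

# most_common() yields (word, count) pairs, highest count first
rows = [
    (category, word, count)
    for category, counter in counts.items()
    for word, count in counter.most_common()
]

freq = pd.DataFrame(rows, columns=["category", "text_list", "frequency"])
print(freq)
```

This avoids `explode` entirely, at the cost of a Python-level loop, which should be fine for modest amounts of text but may be slow on very large frames.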