Count the values in Pandas dataframe for 2 categories into Pivot table

Question

Here is a reference I've already found for a similar operation, but it is not an exact match.

What I have is:
The DataFrame in the following format:

    Tweets    Classified    FreqWord
    calm director day science meetings nasal talk cutting edge remote sensing research drought veg fluorescence calm love    Positive    drought
    love thought drought    Positive    drought
    reign mother kerr funny none tried make come back drought    Positive    drought
    wonder could help thai market b post reuters drought devastates south europe crops    Negative    drought
    wonder could help thai market b post reuters drought devastates south europe crops    Negative    crops
    wonder could help thai market b post reuters drought devastates south europe crops    Negative    crops
    wonder could help thai market b post reuters drought devastates south europe crops    Negative    business
    every child safe drinking water thank uk aid providing suppo ensure children rights drought    Positive    drought
    every child safe drinking water thank uk aid providing suppo ensure children rights drought    Positive    water

What I need is:
The DataFrame as a pivot table where the index is Classified, the columns are FreqWord, and the values are the number of tweets with that classification for that frequent word. In short, something like the following:

Classified  drought crops   business    water
Positive        4       0          0        1
Negative        1       2          1        0

Also note:
The actual dataset has many more 'Frequent Words' and 'Classified' values.

Scott Boston · Accepted Answer · 2018-04-16T12:53:35.793


You can do it this way:

pd.crosstab(df.Classified, df.FreqWord)

Output:

FreqWord    business  crops  drought  water
Classified                                 
Negative           1      2        1      0
Positive           0      0        4      1
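
To reproduce this end to end, here is a minimal sketch that rebuilds only the two columns needed for counting (the Tweets text is left out, since crosstab never looks at it). The variable name counts and the final reindex step are my own additions for illustration; the reindex is only there if you want the columns in the order shown in the question.

```python
import pandas as pd

# Classified / FreqWord pairs copied from the question's sample data.
df = pd.DataFrame({
    "Classified": ["Positive", "Positive", "Positive", "Negative", "Negative",
                   "Negative", "Negative", "Positive", "Positive"],
    "FreqWord": ["drought", "drought", "drought", "drought", "crops",
                 "crops", "business", "drought", "water"],
})

# Count how often each (Classified, FreqWord) pair occurs.
counts = pd.crosstab(df["Classified"], df["FreqWord"])

# Optional: reorder the columns to match the layout asked for in the question.
counts = counts.reindex(columns=["drought", "crops", "business", "water"],
                        fill_value=0)
print(counts)
```

Because crosstab fills missing combinations with 0, this should scale to more words and more classes without extra handling.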

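Or use get_dummies:

# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent.
# sort=False keeps the groups in order of first appearance (Positive, then Negative).
df_out = pd.get_dummies(df[['Classified','FreqWord']], columns=['FreqWord'])\
           .set_index('Classified').groupby(level=0, sort=False).sum()
df_out.columns = df_out.columns.str.split('_').str[1]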

Output:

            business  crops  drought  water
Classified                                 
Positive           0      0        4      1
Negative           1      2        1      0

And, if you wish, you can reset_index:

df_out.reset_index()

  Classified  business  crops  drought  water
0   Positive         0      0        4      1
1   Negative         1      2        1      0
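
Since the question asks for a pivot table specifically, the same counts can also be obtained with pivot_table. This is a sketch, not part of the original answer: the Tweets values below are short placeholders standing in for the real tweet text, and the name pivot is just illustrative. It counts the non-null Tweets in each (Classified, FreqWord) cell.

```python
import pandas as pd

# Placeholder tweet text ("t1", "t2", ...) stands in for the real tweets.
df = pd.DataFrame({
    "Tweets": ["t1", "t2", "t3", "t4", "t4", "t4", "t4", "t5", "t5"],
    "Classified": ["Positive", "Positive", "Positive", "Negative", "Negative",
                   "Negative", "Negative", "Positive", "Positive"],
    "FreqWord": ["drought", "drought", "drought", "drought", "crops",
                 "crops", "business", "drought", "water"],
})

# Count tweets per (Classified, FreqWord) pair; empty cells become 0.
pivot = df.pivot_table(index="Classified", columns="FreqWord",
                       values="Tweets", aggfunc="count", fill_value=0)
print(pivot)
```

Note that aggfunc="count" skips rows where Tweets is missing, whereas crosstab counts every row, so the two only agree when Tweets has no nulls.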


Awesome job @Scott ! That was pretty simple. I almost pulled my hair for this! – T3J45 Apr 16 '18 at 13:04
What a comprehensive answer! – MaxU - stand with Ukraine Apr 16 '18 at 13:48
@MaxU Thank you! :) – Scott Boston Apr 16 '18 at 14:18
