I've already found a reference for a similar, but not identical, operation.

What I have is a DataFrame in the following format:

    Tweets    Classified    FreqWord
    calm director day science meetings nasal talk cutting edge remote sensing research drought veg fluorescence calm love    Positive    drought
    love thought drought    Positive    drought
    reign mother kerr funny none tried make come back drought    Positive    drought
    wonder could help thai market b post reuters drought devastates south europe crops    Negative    drought
    wonder could help thai market b post reuters drought devastates south europe crops    Negative    crops
    wonder could help thai market b post reuters drought devastates south europe crops    Negative    crops
    wonder could help thai market b post reuters drought devastates south europe crops    Negative    business
    every child safe drinking water thank uk aid providing suppo ensure children rights drought    Positive    drought
    every child safe drinking water thank uk aid providing suppo ensure children rights drought    Positive    water

What I need is a pivot table of this DataFrame where the index is Classified, the columns are FreqWord, and the values are the number of tweets for each Classified/FreqWord combination. In short, something like the following:

Classified  drought crops   business    water
Positive        5       0          0        1
Negative        1       2          1        0

Also note that the real dataset has more 'FreqWord' and 'Classified' values than shown here.

T3J45

1 Answer

You can do it this way:

pd.crosstab(df.Classified, df.FreqWord)

Output:

FreqWord    business  crops  drought  water
Classified                                 
Negative           1      2        1      0
Positive           0      0        4      1
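
To make the crosstab call reproducible, here is a minimal sketch that rebuilds only the two relevant columns from the question's sample rows (the Tweets text is omitted, since it does not affect the counts). Since the question asks for a pivot table, it also shows what should be the equivalent `pivot_table` call using `aggfunc="size"`; the variable names `counts` and `pivot` are illustrative.

```python
import pandas as pd

# Classified/FreqWord pairs copied from the nine sample rows in the question.
df = pd.DataFrame({
    "Classified": ["Positive", "Positive", "Positive", "Negative", "Negative",
                   "Negative", "Negative", "Positive", "Positive"],
    "FreqWord": ["drought", "drought", "drought", "drought", "crops",
                 "crops", "business", "drought", "water"],
})

# Count how many rows fall into each (Classified, FreqWord) combination.
counts = pd.crosstab(df["Classified"], df["FreqWord"])
print(counts)

# Same counts via pivot_table: "size" counts rows per group,
# and fill_value=0 fills combinations that never occur.
pivot = df.pivot_table(index="Classified", columns="FreqWord",
                       aggfunc="size", fill_value=0)
print(pivot)
```

Both calls sort the index and the columns alphabetically, so Negative comes before Positive in the result.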

Or get_dummies:

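# Note: .sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() replaces it.
# sort=False keeps the groups in order of first appearance (Positive, then Negative).
df_out = pd.get_dummies(df[['Classified','FreqWord']], columns=['FreqWord'])\
           .set_index('Classified').groupby(level=0, sort=False).sum()
df_out.columns = df_out.columns.str.split('_').str[1]  # drop the 'FreqWord_' prefix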

Output:

            business  crops  drought  water
Classified                                 
Positive           0      0        4      1
Negative           1      2        1      0
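
As a variation on the same idea, the sketch below calls `get_dummies` on the FreqWord column alone, so no prefix is added and no column renaming is needed, and then groups the indicator columns by Classified. The sample frame is rebuilt from the question's rows; this is one possible arrangement rather than the only way to write it.

```python
import pandas as pd

# Classified/FreqWord pairs copied from the nine sample rows in the question.
df = pd.DataFrame({
    "Classified": ["Positive", "Positive", "Positive", "Negative", "Negative",
                   "Negative", "Negative", "Positive", "Positive"],
    "FreqWord": ["drought", "drought", "drought", "drought", "crops",
                 "crops", "business", "drought", "water"],
})

# One 0/1 indicator column per distinct word; astype(int) keeps the sums
# as plain integers regardless of the dummy dtype used by the pandas version.
dummies = pd.get_dummies(df["FreqWord"]).astype(int)

# Sum the indicators within each class; sort=False preserves first-seen order.
df_out = dummies.groupby(df["Classified"], sort=False).sum()
print(df_out)
```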

And, if you wish, you can reset the index:

df_out.reset_index()

  Classified  business  crops  drought  water
0   Positive         0      0        4      1
1   Negative         1      2        1      0
Scott Boston