0

I have dataframe which looks as this:

FIRST      SECOND
 1           a
 1           b
 1           c
 1           b
 2           a
 2           k
 3           r
 3           r
 3           r

And I need to get matrix as this, which represent count of repetition of each word for every number:

FIRST    a   b   c  k    r
  1      1   2   1  0    0
  2      1   0   0  1    0
  3      0   0   0  0    3

Can anyone help me with this? :)

rafaelc
  • 57,686
  • 15
  • 58
  • 82
Neven
  • 453
  • 2
  • 7
  • 14

2 Answers2

1

This works:

pd.concat([df.FIRST, pd.get_dummies(df.SECOND)],1).groupby('FIRST').sum()

hacker315
  • 1,996
  • 2
  • 13
  • 23
1

Use pivot_table with aggfunc='count'

pd.pivot_table(df, values     = 'SECOND', 
                   columns    = df['SECOND'], 
                   index      = df['FIRST'], 
                   aggfunc    ='count', 
                   fill_value = 0)

Outputs

SECOND  a   b   c   k   r
FIRST                   
1       1   2   1   0   0
2       1   0   0   1   0
3       0   0   0   0   3
rafaelc
  • 57,686
  • 15
  • 58
  • 82