4

I would like to count number of yes and no values by Column and groupby index.

I have this dataframe :

col0  col1 col2
A     yes  no
A     no   no
B     yes  yes
B     yes  no

I want this:

   col1     col2
   yes  no  yes  no
A  1    1   0    2
B  2    0   1    1

I tried with df.pivot_table(index='my_index', aggfunc='count') but i only got

   col1     col2

A  2        2
B  2        2
cs95
  • 379,657
  • 97
  • 704
  • 746
ambigus9
  • 1,417
  • 3
  • 19
  • 37
  • @Wen, I've tried a few pivot solutions, but they don't seem to work. Any ideas? – cs95 Mar 30 '18 at 17:58
  • @Wen Hmm, check that output, it seems different? – cs95 Mar 30 '18 at 18:00
  • @cᴏʟᴅsᴘᴇᴇᴅ yep `pd.concat([pd.crosstab(df.col0,[df.col1.astype('category')]),pd.crosstab(df.col0,[df.col2.astype('category')])],axis=1,keys=['col1','col2'])` – BENY Mar 30 '18 at 18:01
  • @Wen Ah, it's definitely more complicated than a simple pivot problem... I'll reopen this question, so please post that as the answer ;) – cs95 Mar 30 '18 at 18:02
  • @cᴏʟᴅsᴘᴇᴇᴅ Yep , I think so , we should reopen it – BENY Mar 30 '18 at 18:03

1 Answers1

3

Option 1
pd.get_dummies + groupby + sum

v = pd.get_dummies(df.set_index('col0'))

v.columns = pd.MultiIndex.from_tuples(
    list(map(tuple, v.columns.str.split('_')))
)
v.sum(level=0)

     col1     col2    
       no yes   no yes
col0                  
A       1   1    2   0
B       0   2    1   1

Option 2
stack + get_dummies + unstack

(df.set_index('col0')
   .stack()
   .str.get_dummies()
   .sum(level=[0,1])
   .unstack(-1)
   .swaplevel(0, 1, axis=1)
   .sort_index(level=0, axis=1)
)

     col1     col2    
       no yes   no yes
col0                  
A       1   1    2   0
B       0   2    1   1

Option 3
crosstab + concat by @Wen

i = pd.crosstab(df.col0, df.col1.astype('category'))
j = pd.crosstab(df.col0, df.col2.astype('category'))

pd.concat([i, j], axis=1, keys=['col1','col2'])

     col1     col2    
col1   no yes   no yes
col0                  
A       1   1    2   0
B       0   2    1   1
cs95
  • 379,657
  • 97
  • 704
  • 746
  • Man just add it to your answer, without your reminder I almost killing a good question ..`pd.concat([pd.crosstab(df.col0,[df.col1.astype('category')]),pd.crosstab(df.col0,[df.col2.astype('category')])],axis=1,keys=['col1','col2'])` – BENY Mar 30 '18 at 18:08
  • @Wen I didn't see your answer, please post again so I can upvote ;) – cs95 Mar 30 '18 at 18:09
  • If you do not mind man, could you please adding to your answer ? feel embarrassed to post as an answer...:-( – BENY Mar 30 '18 at 18:12
  • @Wen It is a good answer, shame you did not post it yourself. I've added it, cheers – cs95 Mar 30 '18 at 18:14
  • @cᴏʟᴅsᴘᴇᴇᴅ thanks! It works perfectly! Should I change the title of Question?, Wich is a good tittle for this problem? – ambigus9 Mar 30 '18 at 18:15
  • @Ambigus9 You're welcome. As for the title, I fixed it for you. – cs95 Mar 30 '18 at 18:17
  • @cᴏʟᴅsᴘᴇᴇᴅ How can I get the percentage of yes column value for each column? – ambigus9 Mar 30 '18 at 18:39
  • @Ambigus9 Hmm, I think it would be a simple extension of this code, but can you open a new post with expected output? It'll be easier to visualise. – cs95 Mar 30 '18 at 18:57