1

I have a dataframe which looks like this -

         group     outcome_30   outcome_365
307     control     no event    no event
141     treatment   no event    no event
291     control     no event    no event
138     treatment   no event    no event
320     control     no event    no event
189     treatment   no event    no event
16      treatment   stroke       stroke
227     control     stroke       stroke
10      treatment   stroke       stroke
257     control     no event    no event

I tried to create a pivot table using this code -

stent_pt = stent.pivot_table(index='group', values=['outcome_30','outcome_365'], aggfunc='count')
stent_pt

       outcome_30 outcome_365
group       
control     227     227
treatment   224     224

But this is not what I am looking for. I want to do the count by (no event, stroke) for the outcome30 and also for the outcome365.

This is how I want my result to look like -

enter image description here

bhola prasad
  • 675
  • 7
  • 22

1 Answers1

1

Use DataFrame.melt with pivoting by aggregate function GroupBy.size and columns parameter:

df = (stent.melt('group')
           .pivot_table(index='group', columns=['variable','value'], aggfunc='size'))
df.loc['Total'] = df.sum()
print (df)
variable  outcome_30        outcome_365       
value       no event stroke    no event stroke
group                                         
control            4      1           4      1
treatment          3      2           3      2
Total              7      3           7      3

Altenative solution with crosstab:

df = stent.melt('group')
df = (pd.crosstab(df['group'], 
                  [df['variable'], df['value']], 
                   margins=True, 
                   margins_name='Total')
        .drop('Total', axis=1, level=0))
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • just want to ask one more question. What is the difference between count and size ? I tried to change the size to count in aggfunc but getting an empty result. – bhola prasad Apr 03 '20 at 13:24
  • @bholaprasad - `count` is used for number of values exclude misisng values, `size` for all values count, so not necessary specify values parameter, [link](https://stackoverflow.com/questions/33346591/what-is-the-difference-between-size-and-count-in-pandas) – jezrael Apr 03 '20 at 13:27