In pandas you can apply some groupby functions to every column in a DataFrame at once, as in:
pt = df.groupby(['Group']).sum().reset_index()
Let's say I want to apply the lambda function lambda x: (0 < x).sum() to count the cells in each column that hold a positive value, and also include a count of the total items in each group. Is there a more efficient way to apply this to all columns than repeating this code:
import pandas as pd
df = pd.DataFrame({'Group': ['W', 'W', 'W', 'E', 'E', 'E', 'N'],
                   'A': [0, 1, 5, 0, 1, 5, 7],
                   'B': [1, 0, 5, 0, 0, 2, 0],
                   'C': [1, 1, 5, 0, 0, 5, 0],
                   'Total': [2, 2, 15, 0, 1, 12, 7]})
# Count positive cells per column within each group
grp=df.groupby(['Group'])
pt1 = grp['A'].apply(lambda x: (0 < x).sum()).reset_index()
pt2 = grp['B'].apply(lambda x: (0 < x).sum()).reset_index()
pt3 = grp['C'].apply(lambda x: (0 < x).sum()).reset_index()
pct = pd.merge(pt1, pt2, on=['Group'])
pct = pd.merge(pt3, pct, on=['Group'])
# Get total items per group and merge with the counts
pt = df.groupby(['Group'])['Total'].count().reset_index()
pct = pd.merge(pt, pct, on=['Group'])
Output:
Group Total C A B
0 E 3 1 2 1
1 N 1 0 1 0
2 W 3 3 2 2
What is an efficient way to write this for n columns?