In pandas you can apply some groupby functions to every column in a DataFrame at once, as in:
pt = df.groupby(['Group']).sum().reset_index()
Let's say I want to apply the lambda function lambda x: (0 < x).sum() to count the cells in each column that hold a positive value, and also include a count of the total items in each group. Is there a more efficient way to apply this to all columns than repeating this code:
import pandas as pd
df = pd.DataFrame({'Group': ['W', 'W', 'W', 'E', 'E', 'E', 'N'],
                   'A': [0, 1, 5, 0, 1, 5, 7],
                   'B': [1, 0, 5, 0, 0, 2, 0],
                   'C': [1, 1, 5, 0, 0, 5, 0],
                   'Total': [2, 2, 15, 0, 1, 12, 7]})
# Count positive cells per column within each group
grp=df.groupby(['Group'])
pt1 = grp['A'].apply(lambda x: (0 < x).sum()).reset_index()
pt2 = grp['B'].apply(lambda x: (0 < x).sum()).reset_index()
pt3 = grp['C'].apply(lambda x: (0 < x).sum()).reset_index()
pct = pd.merge(pt1, pt2, on=['Group'])
pct = pd.merge(pt3, pct, on=['Group'])
# Get total items per group and merge with the counts
pt = df.groupby(['Group'])['Total'].count().reset_index()
pct = pd.merge(pt, pct, on=['Group'])
Output:
Group Total C A B
0 E 3 1 2 1
1 N 1 0 1 0
2 W 3 3 2 2
What is an efficient way to write this for n columns?