I'd like to find an efficient way to use the df.groupby() function in pandas to return both the means and the standard errors of the mean (SEM) for each group of a data frame, preferably in one shot!
import pandas as pd
df = pd.DataFrame({'case': [1, 1, 2, 2, 3, 3],
                   'condition': [1, 2, 1, 2, 1, 2],
                   'var_a': [0.92, 0.88, 0.90, 0.79, 0.94, 0.85],
                   'var_b': [0.21, 0.15, 0.1, 0.16, 0.17, 0.23]})
With that data, I'd like an easier way (if there is one!) to perform the following:
grp_means = df.groupby('case', as_index=False).mean()
grp_sems = df.groupby('case', as_index=False).sem()
grp_means.rename(columns={'var_a': 'var_a_mean', 'var_b': 'var_b_mean'},
                 inplace=True)
grp_sems.rename(columns={'var_a': 'var_a_SEM', 'var_b': 'var_b_SEM'},
                inplace=True)
grouped = pd.concat([grp_means, grp_sems[['var_a_SEM', 'var_b_SEM']]], axis=1)
grouped
Out[1]:
   case  condition  var_a_mean  var_b_mean  var_a_SEM  var_b_SEM
0     1        1.5       0.900        0.18      0.020       0.03
1     2        1.5       0.845        0.13      0.055       0.03
2     3        1.5       0.895        0.20      0.045       0.03
I also recently learned of the .agg() function, and tried df.groupby('grouper column').agg('var':'mean', 'var':'sem'), but this just raises a SyntaxError.
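For reference, here is a minimal sketch of how a valid .agg() call could look for this data. The attempt above fails because key:value pairs are only valid inside a dict literal ({...}), and a dict cannot hold the same key twice anyway; passing a list of function names is one way around both problems. The column selection and the flattened names (var_a_mean, var_a_sem, ...) are illustrative choices, not the only option.

```python
import pandas as pd

df = pd.DataFrame({'case': [1, 1, 2, 2, 3, 3],
                   'condition': [1, 2, 1, 2, 1, 2],
                   'var_a': [0.92, 0.88, 0.90, 0.79, 0.94, 0.85],
                   'var_b': [0.21, 0.15, 0.1, 0.16, 0.17, 0.23]})

# A list of function names applies each function to every selected column.
grouped = df.groupby('case')[['var_a', 'var_b']].agg(['mean', 'sem'])

# The result has two-level column labels such as ('var_a', 'mean');
# join the two levels to get flat names like 'var_a_mean'.
grouped.columns = ['_'.join(col) for col in grouped.columns]

# Turn the group key back into an ordinary column.
grouped = grouped.reset_index()
print(grouped)
```

A dict of lists, such as .agg({'var_a': ['mean', 'sem']}), should also work when different columns need different statistics.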