-1

I'd like to find an efficient way to use the df.groupby() function in pandas to return both the means and standard deviations of a data frame - preferably in one shot!

import pandas as PD

df = pd.DataFrame({'case':[1, 1, 2, 2, 3, 3],
                   'condition':[1,2,1,2,1,2],
                   'var_a':[0.92, 0.88, 0.90, 0.79, 0.94, 0.85],
                   'var_b':[0.21, 0.15, 0.1, 0.16, 0.17, 0.23]})

with that data, I'd like an easier way (if there is one!) to perform the following:

grp_means = df.groupby('case', as_index=False).mean()
grp_sems = df.groupby('case', as_index=False).sem()

grp_means.rename(columns={'var_a':'var_a_mean', 'var_b':'var_b_mean'},
                         inplace=True)
grp_sems.rename(columns={'var_a':'var_a_SEM', 'var_b':'var_b_SEM'},
                         inplace=True)

grouped = pd.concat([grp_means, grp_sems[['var_a_SEM', 'var_b_SEM']]], axis=1)

grouped
Out[1]: 
   case  condition  var_a_mean  var_b_mean  var_a_SEM  var_b_SEM
0     1        1.5       0.900        0.18      0.900       0.18
1     2        1.5       0.845        0.13      0.845       0.13
2     3        1.5       0.895        0.20      0.895       0.20

I also recently learned of the .agg() function, and tried df.groupby('grouper column') agg('var':'mean', 'var':sem') but this just returns a SyntaxError.

ayhan
  • 70,170
  • 20
  • 182
  • 203
fffrost
  • 1,659
  • 1
  • 21
  • 36
  • Combination of 2 dups: [Apply multiple functions to multiple groupby columns](https://stackoverflow.com/questions/14529838/apply-multiple-functions-to-multiple-groupby-columns); [Python Pandas - How to flatten a hierarchical index in columns](https://stackoverflow.com/questions/14507794/python-pandas-how-to-flatten-a-hierarchical-index-in-columns) – jpp Mar 27 '18 at 11:57
  • I think it is not combination only, because there is also remove column of MultiIndex. – jezrael Mar 27 '18 at 12:05
  • 2
    Then it's too broad.. If there are 3 clearly defined steps (if you call `reset_index` a step!), each with canonical answers with hundreds of upvotes. Then I think it's best to indicate this to OP. The reason is there are excellent alternatives for each step which OP should be aware of. I'm willing to ask other opinions on this one if you are happy? – jpp Mar 27 '18 at 12:10
  • 2
    Not really sure how this can be called a duplicate - the individual steps might be duplicates (I wasn't aware of what steps were involved btw), but then by that logic I suppose every question on SO should be marked as a duplicate of some sort? Either way, thanks for linking to the related questions. – fffrost Mar 27 '18 at 12:16
  • 4
    @fffrost, it's a balancing act. By another logic, every combination of every function ([mean, sum], [mean, mode], [mode, mean], etc.) is a separate question. What I usually expect from questions such as yours: "I have to do these 3 things, but I get stuck in step 2. How can I fix this?" What you miss with a single "correct" answer are all the excellent alternatives + edge cases in the dups, amassed over the years. – jpp Mar 27 '18 at 12:17
  • 1
    Possible duplicate of [Apply multiple functions to multiple groupby columns](https://stackoverflow.com/questions/14529838/apply-multiple-functions-to-multiple-groupby-columns) –  Apr 23 '18 at 02:31
  • To me, your main question seems like applying multiple functions to groupby objects. Your title also backs that up. In cases like this I think the best thing is to provide an answer/comment and close it as a duplicate. It doesn't mean you haven't searched for an answer. In fact, your question is very well posed and shows your effort. It is just more useful when you direct people to better/detailed answers. – ayhan Apr 23 '18 at 08:16

1 Answers1

1

I think need DataFrameGroupBy.agg, but then remove column ('condition','sem') and map for convert MultiIndex to columns:

df = df.groupby('case').agg(['mean','sem']).drop(('condition','sem'), axis=1)
df.columns = df.columns.map('_'.join)
df = df.reset_index()
print (df)
   case  condition_mean  var_a_mean  var_a_sem  var_b_mean  var_b_sem
0     1             1.5       0.900      0.020        0.18       0.03
1     2             1.5       0.845      0.055        0.13       0.03
2     3             1.5       0.895      0.045        0.20       0.03
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252