I am using the df.groupby() method:

g1 = df[['md', 'agd', 'hgd']].groupby(['md']).agg(['mean', 'count', 'std'])

It produces exactly what I want!

         agd                       hgd                
        mean count       std      mean count       std
md                                                    
-4  1.398350     2  0.456494 -0.418442     2  0.774611
-3 -0.281814    10  1.314223 -0.317675    10  1.161368
-2 -0.341940    38  0.882749  0.136395    38  1.240308
-1 -0.137268   125  1.162081 -0.103710   125  1.208362
 0 -0.018731   603  1.108109 -0.059108   603  1.252989
 1 -0.034113   178  1.128363 -0.042781   178  1.197477
 2  0.118068    43  1.107974  0.383795    43  1.225388
 3  0.452802    18  0.805491 -0.335087    18  1.120520
 4  0.304824     1       NaN -1.052011     1       NaN

However, I now want to access the columns of the groupby result as I would in a "normal" DataFrame.

I will then be able to: 1) calculate the errors on the agd and hgd means, and 2) make scatter plots of md (x axis) vs. the agd mean (and the hgd mean) with appropriate error bars added.

Is this possible? Perhaps by playing with the indexing?

smci
Sam Gregson
  • Related: [Select rows in a pandas DataFrame which has a MultiIndex](https://stackoverflow.com/questions/53927460/select-rows-in-a-pandas-dataframe-which-has-a-multiindex) – smci Nov 12 '19 at 21:38

2 Answers

1) You can rename the columns and proceed as normal (will get rid of the multi-indexing)

g1.columns = ['agd_mean', 'agd_count', 'agd_std', 'hgd_mean', 'hgd_count', 'hgd_std']

2) You can keep multi-indexing and use both levels in turn (docs)

g1['agd'][['mean', 'count']]
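
Both options can be sketched end to end as follows. This is only an illustration: the data frame is made-up stand-in data (the original df is not shown in the question), the names agd_mean, agd_sem and flat are hypothetical, and the "error on the mean" is assumed to be the standard error, std / sqrt(count).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the real df is not shown in the question.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "md": rng.integers(-2, 3, size=200),
    "agd": rng.normal(size=200),
    "hgd": rng.normal(size=200),
})

g1 = df[["md", "agd", "hgd"]].groupby("md").agg(["mean", "count", "std"])

# Option 2: keep the MultiIndex and select one column with a
# (top level, second level) tuple.
agd_mean = g1[("agd", "mean")]

# Assumed definition of the error on the mean: std / sqrt(count).
agd_sem = g1[("agd", "std")] / np.sqrt(g1[("agd", "count")])

# Option 1: flatten the two column levels into single strings,
# e.g. ("agd", "mean") becomes "agd_mean".
flat = g1.copy()
flat.columns = ["_".join(col) for col in flat.columns]

# Turn the md index into an ordinary column as well.
flat = flat.reset_index()
```

After this, flat behaves like an ordinary single-level DataFrame, so flat["md"], flat["agd_mean"] and an error column computed as above could be passed to something like matplotlib's errorbar(x, y, yerr=...) for the plot described in the question.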
benten

What you are looking for is possible, and it is called transform. The pandas documentation has an example that does exactly this.
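
The documentation example is not reproduced here. As a minimal sketch of what transform does, using made-up data and hypothetical column names (agd_mean, agd_count):

```python
import pandas as pd

# Hypothetical toy data, not the data from the question.
df = pd.DataFrame({
    "md": [0, 0, 1, 1, 1],
    "agd": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# transform returns a result aligned with the rows of df, so each
# group statistic can be assigned back as an ordinary column.
df["agd_mean"] = df.groupby("md")["agd"].transform("mean")
df["agd_count"] = df.groupby("md")["agd"].transform("count")
```

Note that this gives one value per original row rather than one row per md group, so the columns are directly accessible but the frame keeps its original length.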

Zeugma