Accessing columns with MultiIndex after using pandas groupby and aggregate

Question

I am using the df.groupby() method:

g1 = df[['md', 'agd', 'hgd']].groupby(['md']).agg(['mean', 'count', 'std'])

It produces exactly what I want!

         agd                       hgd                
        mean count       std      mean count       std
md                                                    
-4  1.398350     2  0.456494 -0.418442     2  0.774611
-3 -0.281814    10  1.314223 -0.317675    10  1.161368
-2 -0.341940    38  0.882749  0.136395    38  1.240308
-1 -0.137268   125  1.162081 -0.103710   125  1.208362
 0 -0.018731   603  1.108109 -0.059108   603  1.252989
 1 -0.034113   178  1.128363 -0.042781   178  1.197477
 2  0.118068    43  1.107974  0.383795    43  1.225388
 3  0.452802    18  0.805491 -0.335087    18  1.120520
 4  0.304824     1       NaN -1.052011     1       NaN

However, I now want to access the groupby object columns like a "normal" dataframe.

I will then be able to: 1) calculate the errors on the agd and hgd means 2) make scatter plots on md (x axis) vs agd mean (hgd mean) with appropriate error bars added.

Is this possible? Perhaps by playing with the indexing?

Related: [Select rows in a pandas DataFrame which has a MultiIndex](https://stackoverflow.com/questions/53927460/select-rows-in-a-pandas-dataframe-which-has-a-multiindex) — smci, Nov 12 '19 at 21:38

benten · Accepted Answer · 2016-09-02T20:23:41.797

1) You can rename the columns and proceed as normal (this gets rid of the multi-indexing). Supply one name per column, six in total here:

g1.columns = ['agd_mean', 'agd_count', 'agd_std', 'hgd_mean', 'hgd_count', 'hgd_std']

2) You can keep the multi-indexing and use both levels in turn (docs)

g1['agd']['mean']
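
To make both options concrete, here is a minimal sketch. The asker's `df` is not shown, so the data below is synthetic and only the column names `md`, `agd` and `hgd` come from the question. Joining the two column levels with an underscore is one possible naming scheme, not the only one.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the asker's data (assumption: the real df is not shown).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "md": rng.integers(-2, 3, size=200),
    "agd": rng.normal(size=200),
    "hgd": rng.normal(size=200),
})

g1 = df[["md", "agd", "hgd"]].groupby("md").agg(["mean", "count", "std"])

# Option 2: keep the MultiIndex and select with a (level 0, level 1) tuple.
agd_mean = g1[("agd", "mean")]             # Series indexed by md
agd_stats = g1["agd"][["mean", "count"]]   # DataFrame with two columns

# Option 1: flatten both levels into single names such as "agd_mean".
flat = g1.copy()
flat.columns = ["_".join(col) for col in g1.columns]
print(flat.columns.tolist())
```

Building the names from `g1.columns` avoids a length mismatch if the list of aggregations changes later.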

edited Sep 02 '16 at 20:23

answered Sep 02 '16 at 20:13

That helped! Thanks! But how can I plot md on the x axis? It won't let me rename the -4 - +4 md values :/ .... – Sam Gregson Sep 02 '16 at 20:18
If you rename the columns, md is your index and you can access its values by `g1.index.values`. Better still, you can use `g1.plot(...)` and it will use the index as your x values by default (depending on the kind of plot you're making). – benten Sep 02 '16 at 20:22
You can change the index values by g1.index = (list or array or whatever). – benten Sep 02 '16 at 20:25
Perfect! Thank you very much for taking the time to help me :) – Sam Gregson Sep 02 '16 at 20:29
Hmmm... g1.plot(x=g1.index.values, y=g1.hgd_mean, kind = "scatter") doesn't work after the renaming...thoughts? – Sam Gregson Sep 02 '16 at 20:52
Try this instead: `plt.scatter(g1.index,g1.agd_mean)` – benten Sep 03 '16 at 03:01
That got it! Thanks! – Sam Gregson Sep 04 '16 at 14:49
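
For the original goal (errors on the means plus a plot with error bars), a rough sketch follows. It assumes the error wanted is the standard error of the mean, std / sqrt(count), which the question does not state explicitly, and it again uses synthetic data in place of the asker's `df`. The `g1.plot(x=g1.index.values, ...)` attempt above fails because `DataFrame.plot` expects column labels for `x` and `y`, not arrays; `g1.reset_index().plot(x="md", y="hgd_mean", kind="scatter")` is one way around that.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the asker's data (assumption: the real df is not shown).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "md": rng.integers(-2, 3, size=200),
    "agd": rng.normal(size=200),
    "hgd": rng.normal(size=200),
})

g1 = df.groupby("md").agg(["mean", "count", "std"])
g1.columns = ["_".join(col) for col in g1.columns]  # e.g. "agd_mean"

# Standard error of the mean. A group with a single row has NaN std,
# so its error is NaN as well.
g1["agd_sem"] = g1["agd_std"] / np.sqrt(g1["agd_count"])

# md is the index after the groupby, so it serves directly as the x values.
fig, ax = plt.subplots()
ax.errorbar(g1.index, g1["agd_mean"], yerr=g1["agd_sem"], fmt="o")
ax.set_xlabel("md")
ax.set_ylabel("agd mean")
```

Call `plt.show()` or `fig.savefig(...)` afterwards to see the figure; the same pattern applies to the `hgd` columns.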

score 0 · Answer 2 · answered Sep 02 '16 at 22:15

What you are looking for is possible, and it is called transform. The pandas documentation has an example that does exactly this.

Zeugma