How to do pandas groupby([multiple columns]) so its result can be looked up

Question

I have two dataframes: tr is a training set and ts is a test set. Both contain the columns uid (a user ID), categ (a categorical), and response. response is the dependent variable I'm trying to predict in ts.

I am trying to compute the mean of response in tr, broken out by columns uid and categ:

avg_response_uid_categ = tr.groupby(['uid','categ']).response.mean()

This gives the right values, but the index of the result is (unwantedly) a MultiIndex. This is the default groupby(..., as_index=True) behavior:

MultiIndex[--5hzxWLz5ozIg6OMo6tpQ  SomeValueOfCateg, --65q1FpAL_UQtVZ2PTGew  AnotherValueofCateg, ...

Instead, I want the result to keep 'uid' and 'categ' as two separate columns.

Should I use aggregate() instead of groupby()? I tried groupby(..., as_index=False), but it did not help.

smci · Accepted Answer · 2019-07-20T23:26:34.380

The result seems to differ depending on whether you do:

tr.groupby(['uid','categ']).response.mean()

or:

tr.groupby(['uid','categ'])['response'].mean()  # RIGHT

i.e. on whether you select a single Series or a DataFrame containing a single Series. Related: "Pandas selecting by label sometimes return Series, sometimes returns DataFrame"
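
For anyone who wants to check this on their own data, here is a minimal sketch on a small made-up tr and ts (the toy values and the names flat, flat_direct, ts_with_avg and avg_response are assumptions for illustration, not from the question). In recent pandas versions, as far as I can tell, the attribute and string-key spellings give the same Series, and it is selecting with a list, [['response']], that gives a DataFrame. Either way, reset_index() or as_index=False turns the group keys back into ordinary columns, which can then be merged onto ts.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's tr and ts.
tr = pd.DataFrame({
    "uid": ["u1", "u1", "u1", "u2"],
    "categ": ["a", "a", "b", "a"],
    "response": [1.0, 3.0, 5.0, 7.0],
})
ts = pd.DataFrame({"uid": ["u1", "u2", "u2"], "categ": ["b", "a", "b"]})

grouped = tr.groupby(["uid", "categ"])

# Attribute access and string-key access both select a single column,
# so each yields a Series indexed by a (uid, categ) MultiIndex.
by_attr = grouped.response.mean()
by_key = grouped["response"].mean()

# Selecting with a list of column names yields a DataFrame instead.
as_frame = grouped[["response"]].mean()

# reset_index() moves the index levels back into ordinary columns.
flat = by_key.reset_index()

# as_index=False asks groupby to keep the keys as columns from the start.
flat_direct = tr.groupby(["uid", "categ"], as_index=False)["response"].mean()

# The MultiIndex result can also be looked up directly with a tuple key.
u1_a = by_key.loc[("u1", "a")]

# A left merge attaches each group's mean to the matching test rows;
# (uid, categ) pairs that never occur in tr come out as NaN.
ts_with_avg = ts.merge(
    flat.rename(columns={"response": "avg_response"}),
    on=["uid", "categ"],
    how="left",
)
```

If you only need occasional lookups, keeping the MultiIndex and using .loc with a (uid, categ) tuple may be enough. The flat form is mainly useful when joining the averages onto another dataframe.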

edited Jul 20 '19 at 23:26

answered Aug 04 '13 at 07:55

smci
