I have two dataframes: tr is a training set and ts is a test set. They contain the columns uid (a user ID), categ (a categorical), and response. response is the dependent variable I'm trying to predict in ts.
I am trying to compute the mean of response in tr, broken out by the columns uid and categ:
avg_response_uid_categ = tr.groupby(['uid','categ']).response.mean()
This gives the right result, but (unwantedly) the result is indexed by a MultiIndex. This is the groupby(..., as_index=True) behavior:
MultiIndex[--5hzxWLz5ozIg6OMo6tpQ SomeValueOfCateg, --65q1FpAL_UQtVZ2PTGew AnotherValueofCateg, ...
Instead, I want the result to keep 'uid' and 'categ' as two separate columns.
Should I use aggregate() instead of groupby()? Trying groupby(as_index=False) did not help.
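For reference, here is a minimal sketch of the shape I'm after. The toy values in tr are made up for illustration. It reproduces the MultiIndex result and then calls reset_index() on it, which is one way (not necessarily the best) to turn the index levels back into ordinary columns:

```python
import pandas as pd

# Hypothetical toy data standing in for the real tr
tr = pd.DataFrame({
    "uid": ["u1", "u1", "u1", "u2"],
    "categ": ["a", "a", "b", "a"],
    "response": [1.0, 3.0, 5.0, 4.0],
})

# Default as_index=True: the group keys end up in a MultiIndex on a Series
avg_response_uid_categ = tr.groupby(["uid", "categ"]).response.mean()

# reset_index() moves both index levels back into regular columns,
# giving a DataFrame with columns uid, categ, response
flat = avg_response_uid_categ.reset_index()
print(flat)
```

With uid and categ as plain columns, the result could then be joined onto ts on those two keys, for example with a left merge after renaming the averaged column so it does not collide with the response column in ts.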