2

I have a pandas DataFrame with 2 grouping columns and 3 numeric columns. I am grouping the data like this:

df = df.groupby(['date_week', 'uniqeid']).agg({
    'completes':['sum', 'median', 'var', 'min', 'max']
    ,'dcount_visitors': ['sum', 'median', 'var', 'min', 'max']
     ,'dcount_visitor_groups': ['sum', 'median', 'var', 'min', 'max']
     })

The result is the expected multi-level index:

MultiIndex(levels=[['completes', 'dcount_visitors', 'dcount_subscriptions', 'dcount_visitor_groups', 'date_week'], ['sum', 'median', 'var', 'min', 'max', '']],
           labels=[[4, 3, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2], [5, 5, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]])

Usually I flatten a multi-index like this:

df2 = df2.reset_index(drop=True)

However, when I inspect the columns I still get a multi-index. I have tried including the as_index=False in my groupby function but that doesn't work either.

Interestingly, this process works as expected if I only use 1 numeric column with one aggregation.

u = nunits.groupby(['account', 'week_date', 'accountid', 'full_account_name','SegmentName'], as_index=False).agg({'ConsumptionUnit': 'sum'})

Index(['account', 'week_date', 'accountid', 'full_account_name', 'SegmentName',
       'ConsumptionUnit'],
      dtype='object')

Any tips or recommendations would be appreciated.

R_Queery
  • 497
  • 1
  • 9
  • 19
  • did you try `df2.columns = df2.columns.get_level_values(0)` ? – Erfan Apr 23 '19 at 22:20
  • Even with one value if you add the `[]` `agg({'ConsumptionUnit': ['sum']})`, it still multiple index in columns , that is how it works – BENY Apr 23 '19 at 22:24

2 Answers2

2

(realize it's a bit against the norm to "accept" your own question, but wanted to save folks time in responding to a question that was resolved)

@Efran: I did, and it was a 2 level multi-index. @Bugbeeb: Good call on identifying the level. The 5 on the labels was throwing me off.

I was able to hunt down an answer: as of Pandas 0.24.0 you can use .to_flat_index. I had been using 0.23.0 so wasn't finding this as option in the that documentation.

An example of how to use this can be found here

after: df.columns = df.columns.to_flat_index() The resulting index looks like this

Index([                                   'date_week',
                                               'TPID',
                              ('completes', 'sum'),
                           ('completes', 'median'),
                              ('completes', 'var'),
                              ('completes', 'min'),
                              ('completes', 'max'),
          ('dcount_visitors_with_events', 'sum'),
       ('dcount_visitors_with_events', 'median'),
          ('dcount_visitors_with_events', 'var'),
          ('dcount_visitors_with_events', 'min'),
          ('dcount_visitors_with_events', 'max'),
                      ('dcount_id_groups', 'sum'),
                   ('dcount_id_groups', 'median'),
                      ('dcount_id_groups', 'var'),
                      ('dcount_id_groups', 'min'),
                      ('dcount_id_groups', 'max')],
      dtype='object')

Hope this helps other folks and thanks for quick replies. This community is great!

R_Queery
  • 497
  • 1
  • 9
  • 19
1

you need to identify the level in a multi_index (0,1,2,3...). And use inplace=True instead of reassigning the df

Bugbeeb
  • 2,021
  • 1
  • 9
  • 26