pandas: how to run a pivot with a multi-index?

Question

I would like to run a pivot on a pandas DataFrame, with the index being two columns, not one. For example, one field for the year, one for the month, an 'item' field which shows 'item 1' and 'item 2' and a 'value' field with numerical values. I want the index to be year + month.

The only way I managed to get this to work was to combine the two fields into one, then separate them again. is there a better way?

Minimal code copied below. Thanks a lot!

PS Yes, I am aware there are other questions with the keywords 'pivot' and 'multi-index', but I did not understand if/how they can help me with this question.

import pandas as pd
import numpy as np

df= pd.DataFrame()
month = np.arange(1, 13)
values1 = np.random.randint(0, 100, 12)
values2 = np.random.randint(200, 300, 12)


df['month'] = np.hstack((month, month))
df['year'] = 2004
df['value'] = np.hstack((values1, values2))
df['item'] = np.hstack((np.repeat('item 1', 12), np.repeat('item 2', 12)))

# This doesn't work: 
# ValueError: Wrong number of items passed 24, placement implies 2
# mypiv = df.pivot(['year', 'month'], 'item', 'value')

# This doesn't work, either:
# df.set_index(['year', 'month'], inplace=True)
# ValueError: cannot label index with a null key
# mypiv = df.pivot(columns='item', values='value')

# This below works but is not ideal: 
# I have to first concatenate then separate the fields I need
df['new field'] = df['year'] * 100 + df['month']

mypiv = df.pivot('new field', 'item', 'value').reset_index()
mypiv['year'] = mypiv['new field'].apply( lambda x: int(x) / 100)  
mypiv['month'] = mypiv['new field'] % 100

I've provided several detailed examples and alternative approaches in this [**Q&A**](https://stackoverflow.com/q/47152691/2336654) — piRSquared, Nov 11 '17 at 22:19

Alexander · Accepted Answer · 2018-09-07T03:44:16.480

You can group and then unstack.

>>> df.groupby(['year', 'month', 'item'])['value'].sum().unstack('item')
item        item 1  item 2
year month                
2004 1          33     250
     2          44     224
     3          41     268
     4          29     232
     5          57     252
     6          61     255
     7          28     254
     8          15     229
     9          29     258
     10         49     207
     11         36     254
     12         23     209

Or use pivot_table:

>>> df.pivot_table(
        values='value', 
        index=['year', 'month'], 
        columns='item', 
        aggfunc=np.sum)
item        item 1  item 2
year month                
2004 1          33     250
     2          44     224
     3          41     268
     4          29     232
     5          57     252
     6          61     255
     7          28     254
     8          15     229
     9          29     258
     10         49     207
     11         36     254
     12         23     209

@Alexander, `pivot_table()` requires _aggfunc_ parameter and if no such parameter is provided then `mean()` function is used by default. If `sum()` capability is required then `pivot_table()` function should have `aggfunc=sum` added to the call. **Source:** [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot_table.html#pandas.DataFrame.pivot_table) _aggfunc : function, list of functions, dict, default numpy.mean_ — rp1, Sep 07 '18 at 02:23
@rp1 Good point. It was identical because the original sample dataframe only had one value per item, year and month. Amending response to reflect correction. — Alexander, Sep 07 '18 at 03:41

score 34 · Answer 2 · answered Feb 15 '16 at 18:00

34

I believe if you include item in your MultiIndex, then you can just unstack:

df.set_index(['year', 'month', 'item']).unstack(level=-1)

This yields:

                value      
item       item 1 item 2
year month              
2004 1         21    277
     2         43    244
     3         12    262
     4         80    201
     5         22    287
     6         52    284
     7         90    249
     8         14    229
     9         52    205
     10        76    207
     11        88    259
     12        90    200

It's a bit faster than using pivot_table, and about the same speed or slightly slower than using groupby.

answered Feb 15 '16 at 18:00

Ajean

5,528
14
46
69

1

You can also reference the index level directly, e.g. df.set_index(['year', 'month', 'item']).unstack('fcode'). Alexander's first method in his answer would also fail with non-numerical data (in a more generalized problem than the one here). – Carl Aug 23 '17 at 14:18
this is an amazing answer. – Jason Wolosonovich Dec 21 '17 at 21:08
I like this answer, but I'm having some trouble with its implementation. I have multiple columns of data, that I want to flatten to a single row. This answer gets me most of where I need to go, but I'm getting a ton of NaN's in my data. Any ideas? – Adrian Keister Aug 13 '19 at 22:25
Never mind, I figured it out. I went with code like this: `df = pd.DataFrame(df.set_index(['PRS 3']).unstack(-1))` and then `df = df.transpose()`. – Adrian Keister Aug 13 '19 at 22:35

score 4 · Answer 3 · edited May 09 '21 at 17:07

4

The following worked for me:

mypiv = df.pivot(index=['year','month'],columns='item')[['values1','values2']]

edited May 09 '21 at 17:07

buddemat

4,552
14
29
49

answered May 09 '21 at 00:18

Sammeer S Raawat

41
4

score 0 · Answer 4 · answered Jan 26 '20 at 12:45

thanks to gmoutso comment you can use this:

def multiindex_pivot(df, index=None, columns=None, values=None):
    if index is None:
        names = list(df.index.names)
        df = df.reset_index()
    else:
        names = index
    list_index = df[names].values
    tuples_index = [tuple(i) for i in list_index] # hashable
    df = df.assign(tuples_index=tuples_index)
    df = df.pivot(index="tuples_index", columns=columns, values=values)
    tuples_index = df.index  # reduced
    index = pd.MultiIndex.from_tuples(tuples_index, names=names)
    df.index = index
    return df

usage:

df.pipe(multiindex_pivot, index=['idx_column1', 'idx_column2'], columns='foo', values='bar')

You might want to have a simple flat column structure and have columns to be of their intended type, simply add this:

(df
   .infer_objects()  # coerce to the intended column type
   .rename_axis(None, axis=1))  # flatten column headers

pandas: how to run a pivot with a multi-index?

4 Answers4

Linked