Pandas pivoted dataframe with the agg() function

Question

Say I have a pivoted dataframe of the form

           Value             Qty            Code           
Color       Blue Green  Red Blue Green  Red Blue Green  Red
Date                                                       
2017-07-01   0.0   1.1  0.0  0.0  12.0  0.0    0   abc    0
2017-07-03   2.3   1.3  0.0  3.0   1.0  0.0  cde   abc    0
2017-07-06   0.0   0.0  1.4  0.0   0.0  1.0    0     0  cde

I am interested in resampling the Date index to a weekly frequency. I would like to apply a different aggregation to the sub-columns of each major column: Value: max, Qty: sum, Code: last. In a normal non-MultiIndex dataframe, df, one would do the following via the agg() function.

df.resample('W').agg({"Value":"max", "Qty":"sum", "Code":"last"})

But when I try it with the pivoted dataframe, it doesn't like the keys. How would I do it in the case of multi-index dataframe without explicitly specifying all the sub-columns?

The expected output is

           Value             Qty             Code           
Color       Blue Green  Red Blue Green  Red  Blue Green  Red
Date                                       
2017-07-02   0.0   1.1  0.0  0.0  12.0  0.0     0   abc    0
2017-07-09   2.3   1.3  1.4  3.0   1.0  1.0     0     0  cde

To generate the above starting dataframe, use the following code

from collections import OrderedDict
import pandas as pd

table = OrderedDict((
    ("Date", ["2017-07-01", "2017-07-03", "2017-07-03", "2017-07-06"]),
    ('Color',['Green', 'Blue', 'Green', 'Red']),
    ('Value',  [1.1, 2.3, 1.3, 1.4]),
    ('Qty', [12, 3, 1, 1]),
    ('Code',   ['abc', 'cde', 'abc', 'cde'])
))
d = pd.DataFrame(table)
p = d.pivot(index='Date', columns='Color')
p.index = pd.to_datetime(p.index)
p.fillna(0, inplace=True)

EDIT: Added desired result.

EDIT 2: I have also tried to create a dictionary to feed into the agg() function but it's coming out with 4 levels of column headers.

dc = dict(zip(p.columns, map({'Value': 'max', 'Qty': 'sum', 'Code': 'last'}.get, [x[0] for x in p.columns])))

newp = p.resample('W').agg(dc)

What's your expected output? — Andrew L, Jul 11 '17 at 09:46

Andrew L · Answer 1 · 2017-07-11T18:56:23.603

I believe you'll need to stack() to avoid the MultiIndex. There doesn't seem to be a way to specify level=0 in the agg method of a groupby or resample object so this was the only way I could figure it out (let me know if this isn't accurate):

p.stack().reset_index(level=1).groupby(pd.Grouper(freq='W')).agg({'Value': 'max', 'Qty': 'sum', 'Code': 'last'})

            Qty  Value Code
Date                        
2017-07-02  12.0    1.1    0
2017-07-09   5.0    2.3  cde

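stack() moves the Color level from the columns into the index (axis 0), and reset_index(level=1) turns that level back into an ordinary column, leaving a plain DatetimeIndex. The rest is a straightforward weekly groupby. Note that the result is aggregated across all colors, so the per-color columns are not preserved.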

EDIT

Does this work?

dic = {'Value': 'max', 'Qty': 'sum', 'Code': 'last'}
df = pd.DataFrame()
for i in p.columns.get_level_values(0).unique():
    temp = p.xs(i, axis=1, level=0, drop_level=False).resample('W').agg(dic[i])
    df = pd.concat([df, temp], axis=1)
df.columns=p.columns

df
           Value             Qty            Code           
Color       Blue Green  Red Blue Green  Red Blue Green  Red
Date                                                       
2017-07-02   0.0   1.1  0.0  0.0  12.0  0.0    0   abc    0
2017-07-09   2.3   1.3  1.4  3.0   1.0  1.0    0     0  cde

I don't know how "fail proof" this method is so use caution. Setting df.columns=p.columns seems sketchy but keeping the multiindex has been the major challenge. If I set levels=p.columns.levels in pd.concat() (which seems safer) it flattens the index to tuples which could also be unpacked into a multiindex. I've tested this a few different ways and it seems to be fine.
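
A variation on the loop above, offered as a minimal sketch rather than a tested general solution: because xs(..., drop_level=False) keeps both column levels on every slice, concatenating the slices should already give back the two-level columns, so the df.columns=p.columns reassignment may not be needed. The names how, pieces and out are made up for illustration, and the frame is rebuilt by hand so the snippet runs on its own.

```python
import pandas as pd

# Small frame shaped like p from the question (two-level columns).
cols = pd.MultiIndex.from_product(
    [["Value", "Qty", "Code"], ["Blue", "Green", "Red"]], names=[None, "Color"]
)
rows = [
    [0.0, 1.1, 0.0, 0.0, 12.0, 0.0, 0, "abc", 0],
    [2.3, 1.3, 0.0, 3.0, 1.0, 0.0, "cde", "abc", 0],
    [0.0, 0.0, 1.4, 0.0, 0.0, 1.0, 0, 0, "cde"],
]
dates = pd.DatetimeIndex(["2017-07-01", "2017-07-03", "2017-07-06"], name="Date")
p = pd.DataFrame(rows, index=dates, columns=cols)

# One aggregation per top-level column label (hypothetical helper dict).
how = {"Value": "max", "Qty": "sum", "Code": "last"}

# Slice each top-level group with both column levels kept, resample it with
# its own function, then glue the groups back together side by side.
pieces = [
    p.xs(top, axis=1, level=0, drop_level=False).resample("W").agg(how[top])
    for top in p.columns.get_level_values(0).unique()
]
out = pd.concat(pieces, axis=1)

print(out)
```

Iterating over the unique level-0 labels in their original order is what keeps the output columns in the same order as p.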

Thanks for your attempt. I am looking to preserve the same column structure but resample the index to a weekly frequency. Essentially, p.resample('W').max() gives the right answer for the Value columns. p.resample('W').sum() gives the right answer for the Qty columns. And p.resample('W').last() for the Code columns. I guess I could do all these separately and merge the correct columns back together but I was hoping for a more general method. — Spinor8, Jul 11 '17 at 10:07
Another option would be to flatten the column MultiIndex and perform your calculations that way. I've tried it a couple different ways but it doesn't seem clean. — Andrew L, Jul 11 '17 at 12:24
Do you mind sharing how you flattened the MultiIndex columns? I am new to MultiIndex dataframes and I am always on the lookout for interesting techniques. :-) — Spinor8, Jul 11 '17 at 12:49
Sure, try this- `df.columns = [' '.join(col).strip() for col in df.columns.values]`. Credit goes to Andy Hayden- https://stackoverflow.com/questions/14507794/python-pandas-how-to-flatten-a-hierarchical-index-in-columns — Andrew L, Jul 11 '17 at 18:36
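
For readers new to MultiIndex columns, here is a small sketch of the flattening one-liner from the comment above, plus one possible way to undo it. The tiny frame and the names original and restored are invented for illustration. The round trip assumes that none of the labels contains a space, since the names are split on the space again.

```python
import pandas as pd

# Tiny two-level example frame (made-up values).
cols = pd.MultiIndex.from_product(
    [["Value", "Qty"], ["Blue", "Green"]], names=[None, "Color"]
)
df = pd.DataFrame([[0.0, 1.1, 0.0, 12.0]], columns=cols)

# Keep a reference to the two-level columns before flattening.
original = df.columns

# Flatten: each (top, sub) tuple becomes one "top sub" string.
df.columns = [" ".join(col).strip() for col in original.values]
print(list(df.columns))

# Undo: split each flat name back into a tuple and rebuild the MultiIndex.
# This only works if no label contains a space itself.
restored = pd.MultiIndex.from_tuples(
    [tuple(name.split(" ")) for name in df.columns], names=original.names
)
print(restored.equals(original))
```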

Parfait · Accepted Answer · 2017-07-11T20:46:45.573

Consider first combining the hierarchical columns and running weekly aggregates by the different column types: Value, Qty, and Code.

# COMBINE THE LIST OF MULTI-LEVEL COLUMN (LIST OF TUPLES)
p.columns = [i[0]+i[1] for i in p.columns]
p.columns = p.columns.get_level_values(0)

# HORIZONTAL MERGE
out = pd.concat([p.resample('W').max()[[c for c in p.columns if 'Value' in c]],
                 p.resample('W').sum()[[c for c in p.columns if 'Qty' in c]],
                 p.resample('W').last()[[c for c in p.columns if 'Code' in c]]], axis=1)
print(out)
#             ValueBlue  ValueGreen  ValueRed  QtyBlue  QtyGreen  QtyRed  CodeBlue CodeGreen CodeRed
# Date                                                                                              
# 2017-07-02        0.0         1.1       0.0      0.0      12.0     0.0         0       abc       0
# 2017-07-09        2.3         1.3       1.4      3.0       1.0     1.0         0         0     cde

To retain original hierarchical columns, save the column object before flattening the column levels and then re-assign back to columns after the resampling process:

pvtcolumns = p.columns

# ...same code as above

out.columns = pvtcolumns
print(out)

#             Value           Qty             Code           
# Color       Blue Green  Red Blue Green  Red Blue Green  Red
# Date                                                       
# 2017-07-02   0.0   1.1  0.0  0.0  12.0  0.0    0   abc    0
# 2017-07-09   2.3   1.3  1.4  3.0   1.0  1.0    0     0  cde
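
Once the columns are flat, the dictionary form of agg() from the question may also work directly. The following is a hedged sketch, not part of the answer above: it starts from an already pivoted frame, builds one aggregation entry per flattened column from its top-level label, and puts the saved MultiIndex back afterwards. The names how, spec, flat and saved are hypothetical, the "_" separator is an arbitrary choice, and the frame is rebuilt by hand so the snippet is self-contained.

```python
import pandas as pd

# Small frame shaped like p from the question (two-level columns).
cols = pd.MultiIndex.from_product(
    [["Value", "Qty", "Code"], ["Blue", "Green", "Red"]], names=[None, "Color"]
)
rows = [
    [0.0, 1.1, 0.0, 0.0, 12.0, 0.0, 0, "abc", 0],
    [2.3, 1.3, 0.0, 3.0, 1.0, 0.0, "cde", "abc", 0],
    [0.0, 0.0, 1.4, 0.0, 0.0, 1.0, 0, 0, "cde"],
]
dates = pd.DatetimeIndex(["2017-07-01", "2017-07-03", "2017-07-06"], name="Date")
p = pd.DataFrame(rows, index=dates, columns=cols)

# Save the hierarchical columns, then work on a flattened copy.
saved = p.columns
flat = p.copy()
flat.columns = ["_".join(col) for col in saved]

# Map every flattened column to the function chosen for its top-level label.
how = {"Value": "max", "Qty": "sum", "Code": "last"}
spec = {"_".join(col): how[col[0]] for col in saved}

# A single agg() call; reorder explicitly so the columns line up with `saved`
# before the hierarchical labels are assigned back.
out = flat.resample("W").agg(spec)[list(spec)]
out.columns = saved

print(out)
```

Reordering with list(spec) before assigning saved guards against the positional mismatch that makes a bare column reassignment risky.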

Thanks for your answer. The example starting dataframe (p) I gave is representative of what I have after multiple operations such as cumsum, algebraic manipulations across the major x-axes. I do not see how I can unpivot it easily. Would it be possible to start from p above and connect up to your above solution? — Spinor8, Jul 11 '17 at 16:05
