3

In the case where you do not have MultiIndexed columns, you can do df.resample(freq).agg(some_dict) where some_dict is in the form {column_name: function} in order to apply a different function to each column (See demonstration below or see this question or the docs).

I'd like to do the same when I have MultiIndexed columns, but Pandas is doing the product between my columns and the dict.

Here's some dummy data to play with:

In [1]:
import pandas as pd
import numpy as np

cols = pd.MultiIndex.from_tuples([('A', 'one'), ('A', 'two'),
                                 ('B', 'one'), ('B', 'two')])
ind = pd.DatetimeIndex(start='2017-01-01', freq='15Min', periods=20)
df = pd.DataFrame(np.random.randn(20,4), index=ind, columns=cols)
print(df.head())

Out[1]:

                            A                   B          
                          one       two       one       two
2017-01-01 00:00:00 -0.627329  0.756533  2.149236 -1.204808
2017-01-01 00:15:00  1.493381  1.320806 -1.692557  1.225271
2017-01-01 00:30:00 -0.572762  1.365679 -1.993464  1.118474
2017-01-01 00:45:00 -1.785283 -1.625370 -0.437199  0.776267
2017-01-01 01:00:00 -0.220307  1.308388  2.981333 -0.569586

Now, let's create an aggregate dictionary, that maps columns to a specific function:

In [2]:
agg_dict = { col:(np.sum if col[1] == 'one' else np.mean) for col in df.columns }

agg_dict

Out[2]:
{('A', 'one'): <function numpy.core.fromnumeric.sum>,
 ('A', 'two'): <function numpy.core.fromnumeric.mean>,
 ('B', 'one'): <function numpy.core.fromnumeric.sum>,
 ('B', 'two'): <function numpy.core.fromnumeric.mean>}

Here it doesn't work, it actually does the product between my actual columns and the agg_dict. I expected a shape of (5,4), but I'm getting (5,16) (4 entries in dict, 4 columns in df):

In [3]: df.resample('H').agg(agg_dict).shape
Out[3]: (5,16)

In [4]: print(df.resample('H').agg(agg_dict).columns.tolist())
Out[4]: [('A', 'one', 'A', 'one'), ('A', 'one', 'A', 'two'), ('A', 'one', 'B', 'one'), ('A', 'one', 'B', 'two'), ('A', 'two', 'A', 'one'), ('A', 'two', 'A', 'two'), ('A', 'two', 'B', 'one'), ('A', 'two', 'B', 'two'), ('B', 'one', 'A', 'one'), ('B', 'one', 'A', 'two'), ('B', 'one', 'B', 'one'), ('B', 'one', 'B', 'two'), ('B', 'two', 'A', 'one'), ('B', 'two', 'A', 'two'), ('B', 'two', 'B', 'one'), ('B', 'two', 'B', 'two')]

How can I obtain similar behavior to the non-MultiIndexed case, that is end up with a (5,4)DataFrame here?


I can verify that it works using a non-MultiIndexed DataFrame.

In [5]:
df2 = df.copy()
# Flatten columns
df2.columns = ['_'.join(x) for x in df.columns]
# Create similar agg_dict
agg_dict2 = { col:(np.sum if 'one' in col else np.mean) for col in df2.columns }
print(df2.resample('H').agg(agg_dict2))

Out[5]:

                        A_one     A_two     B_one     B_two
2017-01-01 00:00:00 -1.491994  0.454412 -1.973983  0.478801
2017-01-01 01:00:00 -0.931024  0.465611  4.837972 -0.118674
2017-01-01 02:00:00  2.015399  0.203814  1.539722 -0.296053
2017-01-01 03:00:00 -0.569376 -0.382343 -2.244470 -0.038828
2017-01-01 04:00:00 -0.747308 -0.212246  2.025314  0.713344
Community
  • 1
  • 1
Julien Marrec
  • 11,605
  • 4
  • 46
  • 63

1 Answers1

3

I just came up with an idea that works using apply with a lambda

In [1]:
df.resample('H').apply(lambda x: agg_dict[x.name](x))

Out[1]:
                            A                   B          
                          one       two       one       two
2017-01-01 00:00:00 -2.211489  0.538068  1.379451 -0.619921
2017-01-01 01:00:00  1.524752 -0.195767  1.157592  0.137513
2017-01-01 02:00:00 -1.225071  0.020599 -1.372751 -0.245233
2017-01-01 03:00:00  2.922656  0.032864  3.118994  0.315109
2017-01-01 04:00:00 -1.438694  1.025585  1.915400 -0.536389

x.name returns for eg ('A', 'one'), so I use that to select the function in the dict, and pass x to it.

Julien Marrec
  • 11,605
  • 4
  • 46
  • 63