In the case where you do not have MultiIndexed columns, you can do df.resample(freq).agg(some_dict)
where some_dict is in the form {column_name: function}
in order to apply a different function to each column (See demonstration below or see this question or the docs).
I'd like to do the same when I have MultiIndexed columns, but Pandas is doing the product between my columns and the dict.
Here's some dummy data to play with:
In [1]:
import pandas as pd
import numpy as np
cols = pd.MultiIndex.from_tuples([('A', 'one'), ('A', 'two'),
('B', 'one'), ('B', 'two')])
ind = pd.DatetimeIndex(start='2017-01-01', freq='15Min', periods=20)
df = pd.DataFrame(np.random.randn(20,4), index=ind, columns=cols)
print(df.head())
Out[1]:
A B
one two one two
2017-01-01 00:00:00 -0.627329 0.756533 2.149236 -1.204808
2017-01-01 00:15:00 1.493381 1.320806 -1.692557 1.225271
2017-01-01 00:30:00 -0.572762 1.365679 -1.993464 1.118474
2017-01-01 00:45:00 -1.785283 -1.625370 -0.437199 0.776267
2017-01-01 01:00:00 -0.220307 1.308388 2.981333 -0.569586
Now, let's create an aggregate dictionary, that maps columns to a specific function:
In [2]:
agg_dict = { col:(np.sum if col[1] == 'one' else np.mean) for col in df.columns }
agg_dict
Out[2]:
{('A', 'one'): <function numpy.core.fromnumeric.sum>,
('A', 'two'): <function numpy.core.fromnumeric.mean>,
('B', 'one'): <function numpy.core.fromnumeric.sum>,
('B', 'two'): <function numpy.core.fromnumeric.mean>}
Here it doesn't work, it actually does the product between my actual columns and the agg_dict. I expected a shape of (5,4)
, but I'm getting (5,16)
(4 entries in dict, 4 columns in df):
In [3]: df.resample('H').agg(agg_dict).shape
Out[3]: (5,16)
In [4]: print(df.resample('H').agg(agg_dict).columns.tolist())
Out[4]: [('A', 'one', 'A', 'one'), ('A', 'one', 'A', 'two'), ('A', 'one', 'B', 'one'), ('A', 'one', 'B', 'two'), ('A', 'two', 'A', 'one'), ('A', 'two', 'A', 'two'), ('A', 'two', 'B', 'one'), ('A', 'two', 'B', 'two'), ('B', 'one', 'A', 'one'), ('B', 'one', 'A', 'two'), ('B', 'one', 'B', 'one'), ('B', 'one', 'B', 'two'), ('B', 'two', 'A', 'one'), ('B', 'two', 'A', 'two'), ('B', 'two', 'B', 'one'), ('B', 'two', 'B', 'two')]
How can I obtain similar behavior to the non-MultiIndexed case, that is end up with a (5,4)
DataFrame here?
I can verify that it works using a non-MultiIndexed DataFrame.
In [5]:
df2 = df.copy()
# Flatten columns
df2.columns = ['_'.join(x) for x in df.columns]
# Create similar agg_dict
agg_dict2 = { col:(np.sum if 'one' in col else np.mean) for col in df2.columns }
print(df2.resample('H').agg(agg_dict2))
Out[5]:
A_one A_two B_one B_two
2017-01-01 00:00:00 -1.491994 0.454412 -1.973983 0.478801
2017-01-01 01:00:00 -0.931024 0.465611 4.837972 -0.118674
2017-01-01 02:00:00 2.015399 0.203814 1.539722 -0.296053
2017-01-01 03:00:00 -0.569376 -0.382343 -2.244470 -0.038828
2017-01-01 04:00:00 -0.747308 -0.212246 2.025314 0.713344