I have a pandas DataFrame with a MultiIndex that looks like this:
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
# multi-indexed dataframe
df = pd.DataFrame(np.random.randn(8760 * 3, 3))
df['concept'] = "some_value"
df['datetime'] = pd.date_range(start='2016', periods=len(df), freq='60Min')
df.set_index(['concept', 'datetime'], inplace=True)
df.sort_index(inplace=True)
Console output:
df.head()
Out[23]:
0 1 2
datetime
2016 0.458802 0.413004 0.091056
2016 -0.051840 -1.780310 -0.304122
2016 -1.119973 0.954591 0.279049
2016 -0.691850 -0.489335 0.554272
2016 -1.278834 -1.292012 -0.637931
df.tail()
Out[24]:
0 1 2
datetime
2018 -1.872155 0.434520 -0.526520
2018 0.345213 0.989475 -0.892028
2018 -0.162491 0.908121 -0.993499
2018 -1.094727 0.307312 0.515041
2018 -0.880608 -1.065203 -1.438645
Now I want to compute annual sums along the 'datetime' level.
My first attempt was the following, but it doesn't work:
# sum along years
years = df.index.get_level_values('datetime').year.tolist()
df.index.set_levels([years], level=['datetime'], inplace=True)
df = df.groupby(level=['datetime']).sum()
It also seems quite heavy-handed to me, since this task is probably easy to accomplish.
So here's my question: How can I get annual sums for the 'datetime' level? Is there a simple way to do this by applying a function to the 'datetime' level values?
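For reference, here is a minimal sketch of one way this might be done, without modifying the index at all: pull the year out of the 'datetime' level values and pass it to groupby together with the 'concept' level values. The names `concept`, `years`, and `annual` are only illustrative, and the frame is rebuilt with the same setup as above so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild a frame with the same structure as in the question.
df = pd.DataFrame(np.random.randn(8760 * 3, 3))
df['concept'] = "some_value"
df['datetime'] = pd.date_range(start='2016', periods=len(df), freq='60min')
df = df.set_index(['concept', 'datetime']).sort_index()

# Level values are row-aligned with the frame, so they can be used
# directly as grouping keys; the index itself stays untouched.
concept = df.index.get_level_values('concept')
years = df.index.get_level_values('datetime').year.rename('year')  # illustrative name

# One row per (concept, year) pair, holding the column sums for that year.
annual = df.groupby([concept, years]).sum()
print(annual)
```

Grouping on derived keys like this keeps the original hourly index intact, so the same frame can still be used for other aggregations afterwards.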