Question
What is an efficient way to carry out numerical operations on rows with a hierarchical index?
Problem
I have a large dataframe (over 1 GB) that is indexed by year and then by country code. A small subset is shown below. Each country has multiple observations per year. I'd like to take the average of each country's observations in a year and return an overall average. The desired end result is a dataframe indexed by year and then by country code, holding each country's yearly average.
Conceptually, I'd like to do something like:
df.ix[:,['x3yv_E', 'x3yv_D', 'x1yv_E', 'x1yv_D']].groupby(df.year).groupby(level=1).apply(lambda x: np.mean(x))
Here's the dataset:
           x3yv_E    x3yv_D    x1yv_E    x1yv_D
year
2003 12  0.000000  0.000000  0.000000  0.000000
     34  0.009953  0.001400  0.007823  0.000950
     12  0.010210  0.001136  0.008333  0.000722
     34  0.011143  0.006319  0.007520  0.006732
     72  0.018791  0.016717  0.018808  0.015206
2004 0   0.009115  0.000000  0.010243  0.000000
     38  0.009059  0.000932  0.010042  0.000573
     53  0.009618  0.001152  0.010724  0.000729
     70  0.000000  0.000000  0.000000  0.000000
     70  0.020655  0.018411  0.012965  0.011640
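To make the goal concrete, here is a minimal sketch of the computation on the sample rows, assuming the two index levels are named `year` and `country` (the second level is unnamed in the output above, so `country` is my own label). It groups on both index levels in one call instead of chaining two `groupby` calls. I have not measured how it behaves on the full 1 GB frame.

```python
import pandas as pd

# Sample rows copied from the subset shown above.
# "country" is an assumed name for the unnamed second index level.
cols = ["x3yv_E", "x3yv_D", "x1yv_E", "x1yv_D"]
rows = [
    (2003, 12, 0.000000, 0.000000, 0.000000, 0.000000),
    (2003, 34, 0.009953, 0.001400, 0.007823, 0.000950),
    (2003, 12, 0.010210, 0.001136, 0.008333, 0.000722),
    (2003, 34, 0.011143, 0.006319, 0.007520, 0.006732),
    (2003, 72, 0.018791, 0.016717, 0.018808, 0.015206),
    (2004, 0, 0.009115, 0.000000, 0.010243, 0.000000),
    (2004, 38, 0.009059, 0.000932, 0.010042, 0.000573),
    (2004, 53, 0.009618, 0.001152, 0.010724, 0.000729),
    (2004, 70, 0.000000, 0.000000, 0.000000, 0.000000),
    (2004, 70, 0.020655, 0.018411, 0.012965, 0.011640),
]
df = pd.DataFrame(rows, columns=["year", "country"] + cols)
df = df.set_index(["year", "country"])

# Group on both index levels at once and average every selected column.
# The result has one row per (year, country) pair.
yearly_country_mean = df.groupby(level=["year", "country"])[cols].mean()

# Optional second step: average the per-country means within each year
# to get one overall row per year.
yearly_mean = yearly_country_mean.groupby(level="year").mean()
```

If the index levels have no names, `level=[0, 1]` and `level=0` should work in place of the names used here.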
What I've tried
Benefits of panda's multiindex?
How to apply condition on level of pandas.multiindex?
Because the dataframe is so large, I'd like to avoid loops and repeated copies of the dataframe, which the solutions to the two questions above rely on.
Any ideas on an efficient solution? Thanks for taking a look!