I have a pandas dataframe that I'm trying to get a fairly complicated statistic for and I can't seem to figure it out. I have an initial dataframe for a years worth of data at hourly sampling, for a dozen nodes, that looks like this (datetimes are dtype=datetime):
df =
Price_Node_Name Price_Type Local_Datetime_HourEnding Price
1592204 FRANKLIN_FALL_HYD GENERATOR 2016-08-18 01:00:00 16.91
1592205 FRANKLIN_FALL_HYD GENERATOR 2016-08-18 02:00:00 16.32
1592206 FRANKLIN_FALL_HYD GENERATOR 2016-08-18 03:00:00 15.11
1592207 FRANKLIN_FALL_HYD GENERATOR 2016-08-18 04:00:00 13.53
1592208 FRANKLIN_FALL_HYD GENERATOR 2016-08-18 05:00:00 13.22
I then created a multi index dataframe with node name and datetime, followed by a grouping by node name:
df = df.set_index(['Price_Node_Name', 'Local_Datetime_HourEnding'])
df = df.groupby(level=0)
df =
Price_Type Price
Price_Node_Name Local_Datetime_HourEnding
FRANKLIN_FALL_HYD 2016-08-18 01:00:00 GENERATOR 16.91
2016-08-18 02:00:00 GENERATOR 17.05
2016-08-18 03:00:00 GENERATOR 18.09
2016-08-18 04:00:00 GENERATOR 12.75
What I am trying to do is, for each node, do a rolling 4-hour average of the prices and then average those rolling averages by month, so in the end I'll have an output that looks like this for each node:
1 2 3 4 5 6 7 ...
1 42.219417 30.506111 43.796000 33.946379 26.519583 24.253793 25.108167
2 43.239417 29.524074 40.186250 33.953017 30.221167 21.995431 23.777917
3 43.019167 30.988611 37.308879 34.901207 31.284333 20.896897 22.124083
4 40.920323 30.757679 35.237984 35.696500 31.725887 19.924583 20.755323
5 41.639839 30.606875 34.285000 36.307333 31.071935 18.862333 20.663790
6 38.041048 28.899911 32.746290 34.443333 24.454032 17.848750 19.732097
7 36.654516 24.987321 29.178065 31.912167 21.423145 16.796083 19.360081
8 37.500000 25.608750 29.166774 31.022083 20.673306 17.120167 19.837016
...
With months running across the top and hours running down the side, each value being that months hourly average price.
I'm having trouble looping through the dataframe and building a master data frame of the rolling-average monthly averages of each node. Help! I think I'm in over my head on this one.
My loop currently looks like this, which I'm not even sure what each step is accomplishing.
df_month_master = []
for node in df:
for i in range(1, 13):
df_month = df.loc[df['Local_Datetime_HourEnding'].dt.month == i]
df_month['Price'] = df_month['Price'].rolling(4).mean()
df_month = df.groupby([df_month['Local_Datetime_HourEnding'].dt.month,
df_month['Local_Datetime_HourEnding'].dt.hour]).mean()
df_month_master.append(df_month)
df_month_master_df = pd.DataFrame(df_month_master)
print(df_month_master_df)
It works when I don't multi index or groupby nodes, and manually go node by node, but naturally I want to be able to loop through and have a nice output. AND I would like to understand the manipulations that are happening.