1

I have a pandas dataframe that I'm trying to get a fairly complicated statistic for and I can't seem to figure it out. I have an initial dataframe for a years worth of data at hourly sampling, for a dozen nodes, that looks like this (datetimes are dtype=datetime):

df =
          Price_Node_Name   Price_Type      Local_Datetime_HourEnding  Price 
1592204    FRANKLIN_FALL_HYD   GENERATOR     2016-08-18 01:00:00       16.91 
1592205    FRANKLIN_FALL_HYD   GENERATOR     2016-08-18 02:00:00       16.32 
1592206    FRANKLIN_FALL_HYD   GENERATOR     2016-08-18 03:00:00       15.11 
1592207    FRANKLIN_FALL_HYD   GENERATOR     2016-08-18 04:00:00       13.53  
1592208    FRANKLIN_FALL_HYD   GENERATOR     2016-08-18 05:00:00       13.22

I then created a multi index dataframe with node name and datetime, followed by a grouping by node name:

df = df.set_index(['Price_Node_Name', 'Local_Datetime_HourEnding'])
df = df.groupby(level=0)
df =
                                                   Price_Type    Price
Price_Node_Name     Local_Datetime_HourEnding                  
FRANKLIN_FALL_HYD   2016-08-18 01:00:00             GENERATOR     16.91
                    2016-08-18 02:00:00             GENERATOR     17.05
                    2016-08-18 03:00:00             GENERATOR     18.09
                    2016-08-18 04:00:00             GENERATOR     12.75

What I am trying to do is, for each node, do a rolling 4-hour average of the prices and then average those rolling averages by month, so in the end I'll have an output that looks like this for each node:

1          2            3           4           5           6           7   ...
1 42.219417 30.506111   43.796000   33.946379   26.519583   24.253793   25.108167
2 43.239417 29.524074   40.186250   33.953017   30.221167   21.995431   23.777917
3 43.019167 30.988611   37.308879   34.901207   31.284333   20.896897   22.124083
4 40.920323 30.757679   35.237984   35.696500   31.725887   19.924583   20.755323
5 41.639839 30.606875   34.285000   36.307333   31.071935   18.862333   20.663790
6 38.041048 28.899911   32.746290   34.443333   24.454032   17.848750   19.732097
7 36.654516 24.987321   29.178065   31.912167   21.423145   16.796083   19.360081
8 37.500000 25.608750   29.166774   31.022083   20.673306   17.120167   19.837016
...

With months running across the top and hours running down the side, each value being that months hourly average price.

I'm having trouble looping through the dataframe and building a master data frame of the rolling-average monthly averages of each node. Help! I think I'm in over my head on this one.

My loop currently looks like this, which I'm not even sure what each step is accomplishing.

df_month_master = []
for node in df:
    for i in range(1, 13):
        df_month = df.loc[df['Local_Datetime_HourEnding'].dt.month == i]
        df_month['Price'] = df_month['Price'].rolling(4).mean()
        df_month = df.groupby([df_month['Local_Datetime_HourEnding'].dt.month,
                                                     df_month['Local_Datetime_HourEnding'].dt.hour]).mean()
        df_month_master.append(df_month)
df_month_master_df = pd.DataFrame(df_month_master)
print(df_month_master_df)

It works when I don't multi index or groupby nodes, and manually go node by node, but naturally I want to be able to loop through and have a nice output. AND I would like to understand the manipulations that are happening.

  • Is [pandas.DataFrame.rolling](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html) what you're looking for? – RagingRoosevelt Oct 04 '17 at 18:41
  • Possible duplicate of [Moving Average- Pandas](https://stackoverflow.com/questions/40060842/moving-average-pandas) – RagingRoosevelt Oct 04 '17 at 18:45
  • It's part of it. I also want to then average those rolling averages for each hour in the month. I want to average all the 1am's for each day in January, etc. Edit: And understand each part of the dataframe manipulation. – Ryan Kladar Oct 04 '17 at 21:46
  • I think you could do that with the above functions just by how you slice your dataframe and how you specify your window. – RagingRoosevelt Oct 05 '17 at 18:04
  • Right, I'm just trying to figure out the best way to go about that. I don't have a lot of intuition on what happens in indexing and creating groupby dataframes. – Ryan Kladar Oct 05 '17 at 19:20
  • If `df` is your DataFrame, could you run `df.head(10).to_dict(orient='list')` and paste that as part of your question? It would make it a lot easier to check a potential answer. – RagingRoosevelt Oct 05 '17 at 21:56

0 Answers0