1

Considering I have the following tmax_period dataframe:

                     ID Element  Data_Value
Date                                       
2005-01-01  USW00014853    TMAX          56
2005-01-01  USC00200228    TMAX         150
2005-01-01  USC00207320    TMAX         150
2005-01-01  USC00207308    TMAX         150
2005-01-01  USC00200230    TMAX         122
DatetimeIndex(['2005-01-01', '2005-01-02', '2005-01-03', '2005-01-04',
               '2005-01-05', '2005-01-06', '2005-01-07', '2005-01-08',
               '2005-01-09', '2005-01-10',
               ...
               '2014-12-22', '2014-12-23', '2014-12-24', '2014-12-25',
               '2014-12-26', '2014-12-27', '2014-12-28', '2014-12-29',
               '2014-12-30', '2014-12-31'],
              dtype='datetime64[ns]', name='Date', length=3650, freq=None)

How can I group rows by month and day and apply max function to Data_Value column, so I can later build a plot with 365 datapoints? I tried to do the following:

tmax_period.groupby(by=[period.index.month, period.index.day])['Data_Value'].max()

but I got an "AssertionError: Grouper and axis must be same length" error.

EDIT: adding the result of tmax_period.head().to_dict() as asked in the comments:

{'Data_Value': {Timestamp('2005-01-01 00:00:00'): 122},
 'Element': {Timestamp('2005-01-01 00:00:00'): 'TMAX'},
 'ID': {Timestamp('2005-01-01 00:00:00'): 'USC00200230'}}

3 Answers3

0

The date range includes leap years, so aggregating by month and day would evidently yield 366 datapoints. Here is an example:

s = pd.Series(data=1, 
              index=pd.date_range(start='2005-01-01', 
                                  end='2014-12-31', 
                                  freq='d'), 
              name='x')
s.groupby([s.index.month, s.index.day]).count(). # has 366 elements, inc Feb 29

The groupby command 366 elements.

How was the DatetimeIndex created?

jsmart
  • 2,921
  • 1
  • 6
  • 13
0

I find pd.Grouper to be the easiest thing to use when grouping on index levels or grouping on datetime frequencies (or in this case, both).

df.groupby([
    pd.Grouper(level=0, freq='MS'),
    pd.Grouper(level=0, freq='D')
])['Data_Value'].max()

This creates two Grouper objects that aggregate by month (start) and day, respectively, on the 0th index level.

BallpointBen
  • 9,406
  • 1
  • 32
  • 62
0

I don't know why I haven't thought of this solution before. It finds the max value for each day of the year, without getting messed by the leap one, as I wanted:

tmax_period = period.groupby(lambda x: (x.month, x.day))['Data_Value'].max()