8

I am trying to find the hour of max demand every day in my demand time series.

I have created a dataframe that looks like..

                       power
2011-01-01 00:00:00  1015.70
2011-01-01 01:00:00  1015.70
2011-01-01 02:00:00  1010.30
2011-01-01 03:00:00  1010.90
2011-01-01 04:00:00  1021.10
2011-01-01 05:00:00  1046.00
2011-01-01 06:00:00  1054.60
...

and a grouped series to find the max value from each day using .max()

grouped = df.groupby(pd.TimeGrouper('D'))
grouped['power'].max()

OUTPUT

2011-01-01    1367.30
2011-01-02    1381.90
2011-01-03    1289.00
2011-01-04    1323.50
2011-01-05    1372.70
2011-01-06    1314.40
2011-01-07    1310.60
...

However I need the hour of the max value also. So something like:

2011-01-01  18  1367.30
2011-01-02  5   1381.90
2011-01-03  22  1289.00
2011-01-04  10  1323.50
...

I have tried using idxmax() but I keep getting a ValueError

piRSquared
  • 285,575
  • 57
  • 475
  • 624
lewis2ba
  • 105
  • 1
  • 5

2 Answers2

9

UPDATE from 2018-09-19:

FutureWarning: pd.TimeGrouper is deprecated and will be removed; Please use pd.Grouper(freq=...)

solution:

In [295]: df.loc[df.groupby(pd.Grouper(freq='D')).idxmax().iloc[:, 0]]
Out[295]:
                                         power
2011-01-01 06:00:00                     1054.6
2011-01-02 06:00:00                     2054.6

Old answer:

try this:

In [376]: df.loc[df.groupby(pd.TimeGrouper('D')).idxmax().iloc[:, 0]]
Out[376]:
                                           power
2011-01-01 06:00:00                       1054.6
2011-01-02 06:00:00                       2054.6

data:

In [377]: df
Out[377]:
                                           power
2011-01-01 00:00:00                       1015.7
2011-01-01 01:00:00                       1015.7
2011-01-01 02:00:00                       1010.3
2011-01-01 03:00:00                       1010.9
2011-01-01 04:00:00                       1021.1
2011-01-01 05:00:00                       1046.0
2011-01-01 06:00:00                       1054.6
2011-01-02 00:00:00                       2015.7
2011-01-02 01:00:00                       2015.7
2011-01-02 02:00:00                       2010.3
2011-01-02 03:00:00                       2010.9
2011-01-02 04:00:00                       2021.1
2011-01-02 05:00:00                       2046.0
2011-01-02 06:00:00                       2054.6
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
  • This works great, thanks! It is working well for a years worth of data, when I try to run two years i get: `raise IndexError("single positional indexer is out-of-bounds") IndexError: single positional indexer is out-of-bounds`. I am guessing something Is wrong with my data. – lewis2ba May 25 '16 at 17:12
  • Do you have missing days in your data? `pd.TimeGrouper('D')` will generate groups for each date between the min and max date, so you'll get empty groups for missing days, which will result in the error you're seeing. – root May 25 '16 at 17:25
  • There doesn't seem to be any missing days in the data. I can run each year separately(2011 or 2012) and get the results. When I run them together(2011 and 2012) I get the value error. Going to make sure the 2 year csv file is formatting correctly – lewis2ba May 25 '16 at 17:41
  • @MaxU, Hi I am trying to perform same task above. But, `TimeGrouper` is depreciated now. I am trying this code : `df['time'].loc[df.groupby(pd.Grouper(key='time',freq='D')).max()]`. But, this gives error? – Msquare Sep 19 '18 at 11:41
  • @MaxU, Thanks a ton for responding quickly and updating your answer. I tried this code `df.loc[df.groupby(pd.Grouper(key='time',freq='D')).idxmax().iloc[:,0]]`. The output is an error with message: `IndexError: single positional indexer is out-of-bounds`. Any guess? – Msquare Sep 19 '18 at 11:50
  • @Msquare, i afraid one would need a small __reproducible__ data set in order to be able to help ... Can you reproduce that error using a data set from my answer? – MaxU - stand with Ukraine Sep 19 '18 at 11:52
  • @MaxU, I just tried your code `df.loc[df.groupby(pd.Grouper(freq='D')).idxmax().iloc[:,0]]`. The output is: `TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'`. It is actually a 150 MB data file. I can share a part of it. How to do it? – Msquare Sep 19 '18 at 11:54
  • @Msquare, try to create a new question including [a minimal reproducible sample data set](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples), which would allow to reproduce that error message and your desired data set. – MaxU - stand with Ukraine Sep 19 '18 at 11:56
  • @MaxU, I have formed a new question here https://stackoverflow.com/questions/52405615/finding-daily-maximum-and-its-time-stamp-in-python-pandas . could you please check it? – Msquare Sep 19 '18 at 12:13
  • Also make sure to check the index of your dataframe returns a DateTimeIndex. you can check and set this using `df.index` – sh87 Feb 01 '19 at 04:13
0

You can also group by your index date with df.groupby(df.index.date) and then use idxmax() to find the index of the max value in the power column:

df.groupby(df.index.date)["power"].idxmax()

If you want the power values too, use .loc:

df.loc[df.groupby(df.index.date)["power"].idxmax()]