
I have downloaded daily data from Yahoo Finance:

                    Open          High           Low         Close     Volume  \
Date                                                                            
2016-01-04  10485.809570  10485.910156  10248.580078  10283.440430  116249000   
2016-01-05  10373.269531  10384.259766  10173.519531  10310.099609   82348000   
2016-01-06  10288.679688  10288.679688  10094.179688  10214.019531   87751700   
2016-01-07  10144.169922  10145.469727   9810.469727   9979.849609  124188100   
2016-01-08  10010.469727  10122.459961   9849.339844   9849.339844   95672200   
...
2016-02-23   9503.120117   9535.120117   9405.219727   9416.769531   87240700   
2016-02-24   9396.480469   9415.330078   9125.190430   9167.799805   99216000   
2016-02-25   9277.019531   9391.309570   9199.089844   9331.480469          0   
2016-02-26   9454.519531   9576.879883   9436.330078   9513.299805   95662100   
2016-02-29   9424.929688   9498.570312   9332.419922   9495.400391   90978700   

I would like to find the maximum closing price each month and also the date of this closing price.

With a groupby, dfM = df['Close'].groupby(df.index.month).max(), I get the monthly maximums, but I lose the daily index (the date on which each maximum occurred):

   grouped by month 
1      10310.099609
2       9757.879883
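A minimal, self-contained reproduction, assuming a DataFrame built only from the rows visible in the printout above. The elided rows are missing, so February's maximum here is 9513.299805 rather than 9757.879883. The grouping key is the month number, so the dates disappear from the result:

```python
import pandas as pd

# Closing prices copied from the rows shown in the question; the elided rows ("...") are not included
dates = pd.to_datetime([
    '2016-01-04', '2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08',
    '2016-02-23', '2016-02-24', '2016-02-25', '2016-02-26', '2016-02-29',
])
close = [10283.440430, 10310.099609, 10214.019531, 9979.849609, 9849.339844,
         9416.769531, 9167.799805, 9331.480469, 9513.299805, 9495.400391]
df = pd.DataFrame({'Close': close}, index=pd.DatetimeIndex(dates, name='Date'))

# df.index.month is 1 or 2 for every row, so the result is keyed by month number
# and the day on which each maximum occurred is lost
dfM = df['Close'].groupby(df.index.month).max()
print(dfM)
```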

Is there a good way to keep the index?

I would be looking for a result like this:

            grouped by month 
2016-01-05      10310.099609
2016-02-01       9757.879883
Igor Raush
Markus W

2 Answers


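You can get the max value per month using pd.Grouper together with groupby. Older pandas versions spelled this pd.TimeGrouper('M'), which was deprecated in 0.21 and later removed; from pandas 2.2 the month-end alias is 'ME', and 'M' emits a FutureWarning. The pandas.io.data module has also moved to the separate pandas-datareader package:

import numpy as np
import pandas as pd
from pandas_datareader.data import DataReader

aapl = DataReader('AAPL', data_source='yahoo', start='2015-6-1')
>>> aapl.groupby(pd.Grouper(freq='M')).Close.max()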
Date
2015-06-30    130.539993
2015-07-31    132.070007
2015-08-31    119.720001
2015-09-30    116.410004
2015-10-31    120.529999
2015-11-30    122.570000
2015-12-31    119.029999
2016-01-31    105.349998
2016-02-29     98.120003
2016-03-31    100.529999
Freq: M, Name: Close, dtype: float64
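A minimal sketch of the same monthly maximum on a current pandas, assuming a sample frame built from the rows shown in the question instead of downloaded data. It swaps pd.Grouper for a monthly PeriodIndex from DatetimeIndex.to_period('M'), which keeps year and month together and sidesteps the 'M' versus 'ME' alias change:

```python
import pandas as pd

# Sample data: the rows shown in the question (assumed stand-in for the downloaded prices)
dates = pd.to_datetime([
    '2016-01-04', '2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08',
    '2016-02-23', '2016-02-24', '2016-02-25', '2016-02-26', '2016-02-29',
])
close = [10283.440430, 10310.099609, 10214.019531, 9979.849609, 9849.339844,
         9416.769531, 9167.799805, 9331.480469, 9513.299805, 9495.400391]
df = pd.DataFrame({'Close': close}, index=pd.DatetimeIndex(dates, name='Date'))

# Each timestamp maps to its calendar month (e.g. Period('2016-01', 'M')),
# so different years never collapse into the same group
monthly_max = df.groupby(df.index.to_period('M'))['Close'].max()
print(monthly_max)
```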

idxmax returns the date on which each monthly maximum occurred:

>>> aapl.groupby(pd.Grouper(freq='M')).Close.idxmax()
Date
2015-06-30   2015-06-01
2015-07-31   2015-07-20
2015-08-31   2015-08-10
2015-09-30   2015-09-16
2015-10-31   2015-10-29
2015-11-30   2015-11-03
2015-12-31   2015-12-04
2016-01-31   2016-01-04
2016-02-29   2016-02-17
2016-03-31   2016-03-01
Name: Close, dtype: datetime64[ns]
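The dates returned by idxmax can be fed back into .loc to pull out the original daily rows, which gives the layout asked for in the question. A minimal sketch, again assuming a sample frame built from the rows shown in the question (so February's maximum lands on 2016-02-26 here). If a month has tied maxima, idxmax keeps the first date:

```python
import pandas as pd

# Sample data: the rows shown in the question (assumed stand-in for the downloaded prices)
dates = pd.to_datetime([
    '2016-01-04', '2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08',
    '2016-02-23', '2016-02-24', '2016-02-25', '2016-02-26', '2016-02-29',
])
close = [10283.440430, 10310.099609, 10214.019531, 9979.849609, 9849.339844,
         9416.769531, 9167.799805, 9331.480469, 9513.299805, 9495.400391]
df = pd.DataFrame({'Close': close}, index=pd.DatetimeIndex(dates, name='Date'))

# For each calendar month, the timestamp of the highest close
max_dates = df.groupby(df.index.to_period('M'))['Close'].idxmax()

# Select those rows by their original daily timestamps, so the daily index is kept
monthly_peaks = df.loc[max_dates.to_numpy(), 'Close']
print(monthly_peaks)
```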

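To get the results side by side (passing a renaming dict such as {'max date': 'idxmax', 'max price': np.max} to .agg on a single column is no longer supported from pandas 1.0, so this uses named aggregation):

>>> aapl.groupby(pd.Grouper(freq='M')).Close.agg(**{'max price': 'max', 'max date': 'idxmax'})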
             max price   max date
Date                             
2015-06-30  130.539993 2015-06-01
2015-07-31  132.070007 2015-07-20
2015-08-31  119.720001 2015-08-10
2015-09-30  116.410004 2015-09-16
2015-10-31  120.529999 2015-10-29
2015-11-30  122.570000 2015-11-03
2015-12-31  119.029999 2015-12-04
2016-01-31  105.349998 2016-01-04
2016-02-29   98.120003 2016-02-17
2016-03-31  100.529999 2016-03-01
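A self-contained sketch of the side-by-side table using named aggregation (available since pandas 0.25), followed by the set_index('max date') step the asker mentions in the comments below. The sample frame is an assumption, built from the rows shown in the question:

```python
import pandas as pd

# Sample data: the rows shown in the question (assumed stand-in for the downloaded prices)
dates = pd.to_datetime([
    '2016-01-04', '2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08',
    '2016-02-23', '2016-02-24', '2016-02-25', '2016-02-26', '2016-02-29',
])
close = [10283.440430, 10310.099609, 10214.019531, 9979.849609, 9849.339844,
         9416.769531, 9167.799805, 9331.480469, 9513.299805, 9495.400391]
df = pd.DataFrame({'Close': close}, index=pd.DatetimeIndex(dates, name='Date'))

# Named aggregation: each keyword becomes an output column (** allows names with spaces)
summary = df.groupby(df.index.to_period('M'))['Close'].agg(
    **{'max price': 'max', 'max date': 'idxmax'}
)

# Re-index by the date on which each monthly maximum occurred
by_date = summary.set_index('max date')['max price']
print(by_date)
```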
Alexander
  • Great. Thanks a lot! Then I can just change the index to 'max date' and I am there :-) – Markus W Mar 02 '16 at 18:37
    `pd.TimeGrouper('M')` no longer works. `AttributeError: module 'pandas' has no attribute 'TimeGrouper'`. Can you update the answer with `pd.Grouper`? – Trenton McKinney Oct 01 '20 at 22:27

My dataset is electricity data, and the only column I am interested in is kW.

This works for me to find the maximum kW value, and the timestamp at which it occurred, for each month of my dataset, which is recorded at 15-minute intervals:

max_kW_per_month = df.groupby(df.index.month)['kW'].agg(['idxmax', 'max'])
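Because df.index.month is only the month number, this pools the same month across years (January 2015 and January 2016 end up in one group). A minimal sketch with a handful of hypothetical 15-minute readings (timestamps and kW values are made up) contrasts that with grouping by year and month through to_period('M'):

```python
import pandas as pd

# Hypothetical 15-minute meter readings spanning two different Januaries
df = pd.DataFrame(
    {'kW': [50.0, 80.0, 65.0, 70.0, 90.0, 60.0]},
    index=pd.to_datetime([
        '2015-01-10 08:00', '2015-01-10 08:15',  # January 2015
        '2015-02-03 17:30', '2015-02-03 17:45',  # February 2015
        '2016-01-05 12:00', '2016-01-05 12:15',  # January 2016
    ]),
)

# Month number only: both Januaries share group 1, so the 2015 peak is hidden
by_month = df.groupby(df.index.month)['kW'].agg(['idxmax', 'max'])

# Year and month: one row per calendar month
by_year_month = df.groupby(df.index.to_period('M'))['kW'].agg(['idxmax', 'max'])
print(by_month)
print(by_year_month)
```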
bbartling