4

I have a dataframe of hourly OHLC as follows (please ignore the values of OHLC, I typed them in for better illustration),

hr_df =
                        Close      High       Low      Open
2017-09-04 05:00:00  0.715035  0.715035  0.715035  0.715035
2017-09-04 06:00:00  0.715035  0.715045  0.715015  0.715035
2017-09-04 07:00:00  0.715040  0.715050  0.714035  0.715035
:
:
2017-09-05 05:00:00  0.715045  0.715105  0.714985  0.715035
2017-09-05 06:00:00  0.715040  0.716045  0.714605  0.715035
2017-09-05 07:00:00  0.715040  0.717045  0.713225  0.715035
:
:
2017-09-06 05:00:00  0.715040  0.714045  0.713355  0.715035

I want to resample it into daily OHLC, example,

day_df =
               Close      High       Low      Open
2017-09-04  0.715035  0.715035  0.715035  0.715035
2017-09-05  0.715035  0.715045  0.715015  0.715035
2017-09-06  0.715040  0.715050  0.714035  0.715035
2017-09-07  0.715045  0.715105  0.714985  0.715035
2017-09-08  0.715040  0.716045  0.714605  0.715035
2017-09-09  0.715040  0.714045  0.713355  0.715035
2017-09-10  0.715040  0.717045  0.713225  0.715035

I tried using pandas resample method, day_df = hr_df.resample('D').pad() or day_df = hr_df.resample('D').ohlc() but it is not working. I know I am probably not using the proper method. I will really appreciate it if someone can guide me to an alternative solution or the proper method to use.

denis
  • 21,378
  • 10
  • 65
  • 88
Denzel
  • 358
  • 5
  • 19

3 Answers3

17

I think you need downsample by Resampler.agg by dictionary with keys for column names and values for functions:

day_df = (hr_df.resample('D')
               .agg({'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}))
print (day_df)
                Open      High    Close       Low
2017-09-04  0.715035  0.715050  0.71504  0.714035
2017-09-05  0.715035  0.717045  0.71504  0.713225
2017-09-06  0.715035  0.714045  0.71504  0.713355
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • I tried doing the same as above, however the result I get is completely different, mine looks like a multiindexed df with open/high/low/close appearing in my index and only one column. Would you have any idea of how to fix it and why it's not doing the same as you did? – Erlinska Jul 29 '18 at 19:01
  • @Erlinska - Not easy find problem, but I have one idea - is `DatetimeIndex` sorted? Maybe some data in another rows of DataFrame and you do no know about it. – jezrael Jul 29 '18 at 19:08
  • The DataFrame is exactly like the one in the question, I'm not sure of what a sorted DataTimeIndex is, the one I'm using is just read from a csv using `pd.read_csv(filename,index_col='DateTime',parse_dates=True)` – Erlinska Jul 29 '18 at 19:31
  • @Erlinska did you ever solve this im dealing with it now – Hudson Hughes Jan 08 '19 at 19:00
  • 1
    Quickest and most elegant way of resampling I have found so far! – Spino Sep 18 '21 at 00:58
2

Try to use pd.Grouper. If pair is a forex pair for example you can use it as index together with the newly grouped date:

hr_df.groupby([pd.Grouper(key='date',freq='D'), 'pair']).agg(
{'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'})
Asclepius
  • 57,944
  • 17
  • 167
  • 143
linello
  • 8,451
  • 18
  • 63
  • 109
1

That works for me.

quotes.set_index('end', inplace=True)
quotes.index = pd.to_datetime(quotes.index)
ohlc_dict = {
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last'
}
quotes.resample("1D", closed='left', label='left').apply(ohlc_dict).dropna()
bigEvilBanana
  • 388
  • 2
  • 8