0

I have 15 min OHLC data coming in continuously from an API and I wish to resample it to 30 min 1H etc

+----------------------+----------+----------+----------+----------+------------+
|      time            |  open    |  high    |   low    |  close   |  volumeto  |
+----------------------+----------+----------+----------+----------+------------+
|                      |          |          |          |          |            |
| 2018-06-26 03:30:00  | 6244.56  | 6250.54  | 6243.55  | 6247.34  | 1801280.76 |
| 2018-06-26 03:45:00  | 6247.34  | 6257.61  | 6246.43  | 6248.23  | 2551368.76 |
| 2018-06-26 04:00:00  | 6248.53  | 6248.53  | 6238.11  | 6239.83  | 2148705.55 |
| 2018-06-26 04:15:00  | 6239.83  | 6240.93  | 6232.14  | 6239.00  | 1906012.45 |
| 2018-06-26 04:30:00  | 6239.78  | 6269.14  | 6239.67  | 6260.96  | 3869709.59 |
| 2018-06-26 04:45:00  | 6261.16  | 6263.50  | 6249.46  | 6260.19  | 2236911.26 |
| 2018-06-26 05:00:00  | 6260.19  | 6284.27  | 6257.97  | 6263.73  | 5131896.24 |
| 2018-06-26 05:15:00  | 6263.73  | 6272.73  | 6263.59  | 6270.30  | 1589515.40 |
| 2018-06-26 05:30:00  | 6270.30  | 6286.36  | 6268.78  | 6270.69  | 2859640.60 |
| 2018-06-26 05:45:00  | 6271.75  | 6274.69  | 6259.29  | 6261.20  | 2448639.24 |
+----------------------+----------+----------+----------+----------+------------+

As per my use case, I round 3:45 and 4:00 from 15 mins to 4:00 at 30 mins and so on

agg = { 'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volumeto': 'sum'}
df.resample('30min', closed='right', label='right').agg(agg)

Original 15 mins data enter image description here

30 mins after rounding enter image description here

As you notice in the 30 mins timeframe, the first candle is generated from only 1 15 min candle instead of 2.

How do I skip incomplete candles only at the beginning of my computation instead of the end? I would like to similary skip for 1 hr etc Thanks

Community
  • 1
  • 1
PirateApp
  • 5,433
  • 4
  • 57
  • 90
  • 2
    did you read [how-to-make-good-reproducible-pandas](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples)? – Brown Bear Jun 27 '18 at 12:57
  • 3
    To be more specific: posting images prevents me from easily copying and pasting data and hacking together an answer. Instead, I have to debate with myself if it is worth creating a sample dataframe, hand typing your data, or skipping altogether. Please don't post images. Post the data. Even if you can't format it, someone will step up and edit your post for you and format the code. – piRSquared Jun 27 '18 at 13:00
  • 1
    What code are you using to resample? Groupby with Grouper? Resample? – Scott Boston Jun 27 '18 at 13:01
  • apologies updated the question – PirateApp Jun 27 '18 at 13:01

1 Answers1

3

IIUC, you could do it this way. Add a count to your aggregation dict, then filter your dataframe based on the count, like this:

agg = { 'open': ['first','count'], 'high': 'max', 'low': 'min', 'close': 'last', 'volumeto': 'sum'}
df_out = df.resample('30min', closed='right', label='right').agg(agg)
df_out.columns = df_out.columns.map('_'.join)
df_out[df_out['open_count']>1]

Output:

                     open_first  open_count  high_max  low_min  close_last  volumeto_sum
time                                                                                    
2018-06-26 04:00:00     6247.34           2   6257.61  6238.11     6239.83    4700074.31
2018-06-26 04:30:00     6239.83           2   6269.14  6232.14     6260.96    5775722.04
2018-06-26 05:00:00     6261.16           2   6284.27  6249.46     6263.73    7368807.50
2018-06-26 05:30:00     6263.73           2   6286.36  6263.59     6270.69    4449156.00

Or to eliminate just the first incomplete group

df_out[~(df_out['open_count'] < 2).cumprod().astype(bool)]

Output:

                     open_first  open_count  high_max  low_min  close_last  volumeto_sum
time                                                                                    
2018-06-26 04:00:00     6247.34           2   6257.61  6238.11     6239.83    4700074.31
2018-06-26 04:30:00     6239.83           2   6269.14  6232.14     6260.96    5775722.04
2018-06-26 05:00:00     6261.16           2   6284.27  6249.46     6263.73    7368807.50
2018-06-26 05:30:00     6263.73           2   6286.36  6263.59     6270.69    4449156.00
2018-06-26 06:00:00     6271.75           1   6274.69  6259.29     6261.20    2448639.24
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • thank you, the only problem with this approach is that incomplete candles if any at the end of the data frame are also skipped, i am only trying to skip them at the start – PirateApp Jun 27 '18 at 13:21
  • 1
    You can change the filter approach then, just get rid of the first occurrence of an incomplete count. – Scott Boston Jun 27 '18 at 13:23