0

When trying to resample OHLC dataframe from 1m to hourly i am getting this error:

Dataframe

df.info()

#   Column     Dtype         
---  ------     -----         
 0   Date_Time  datetime64[ns]
 1   Open       float64       
 2   High       float64       
 3   Low        float64       
 4   Close      float64

df.tail()

            Date_Time            Open    High    Low     Close
    1692259 2014-12-30 20:51:00  2321.0  1213.0  1223.0  2334.0
    1692260 2014-12-30 20:52:00  2342.0  2322.0  2332.0  2332.0
    1692261 2014-12-30 20:53:00  3421.0  2322.0  2334.0  2123.0
    1692262 2014-12-30 20:54:00  2312.0  2332.0  2324.0  2321.0
    1692263 2014-12-30 20:55:00  2312.0  1212.0  2343.0  2323.0
...

Attempt 1

df_ohlc = df.resample('60T', on='Date_Time').ohlc()

Error

DataError: No numeric types to aggregate

Attempt 2

Using venky__ recommendation to another post with a similar solution

df_ohlc = df.resample('60T', on='Date_Time').agg({
    'Open':'first',
    'High':'max',
    'Low':'min',
    'Close':'last'
})

NaN included but df is clean. How to avoid this?

                       Open    High     Low   Close
Date_Time                                          
2015-12-26 18:00:00     NaN     NaN     NaN     NaN
2015-12-26 19:00:00     NaN     NaN     NaN     NaN
2015-12-26 20:00:00     NaN     NaN     NaN     NaN
2015-12-26 21:00:00     NaN     NaN     NaN     NaN
2015-12-26 22:00:00     NaN     NaN     NaN     NaN
Vince
  • 507
  • 8
  • 21
  • Does this answer your question? [Pandas OHLC aggregation on OHLC data](https://stackoverflow.com/questions/36222928/pandas-ohlc-aggregation-on-ohlc-data) – Equinox Sep 27 '20 at 11:02
  • Thanks @venky__ Yes and No, NAs values appeared but dataframe is clean. Updated question. – Vince Sep 27 '20 at 11:18

1 Answers1

1

Next version of code worked for me, I did only 2 mins aggregation due to few rows provided by you.

I think the reason of non-working is due to that your data being in string format, instead of the need for date-time column to be in date-time format, and the rest in float numbers format, not strings. I did the right conversion to date-time from string in my code.

Or another reason for non-working is because you didn't do df.set_index('Date_Time'), I did this in my code too.

Try it online!

import io, pandas as pd, numpy as np

df = pd.read_csv(io.StringIO("""
Date_Time,Open,High,Low,Close
2014-12-30 20:51:00,2321.0,1213.0,1223.0,2334.0
2014-12-30 20:52:00,2342.0,2322.0,2332.0,2332.0
2014-12-30 20:53:00,3421.0,2322.0,2334.0,2123.0
2014-12-30 20:54:00,2312.0,2332.0,2324.0,2321.0
2014-12-30 20:55:00,2312.0,1212.0,2343.0,2323.0
"""))

df.Date_Time = pd.to_datetime(df.Date_Time)
df.set_index('Date_Time')

print(df)

df = df.resample('2min', on = 'Date_Time').agg({
    'Open':'first',
    'High':'max',
    'Low':'min',
    'Close':'last',
})

print(df)

Output is:

            Date_Time    Open    High     Low   Close
0 2014-12-30 20:51:00  2321.0  1213.0  1223.0  2334.0
1 2014-12-30 20:52:00  2342.0  2322.0  2332.0  2332.0
2 2014-12-30 20:53:00  3421.0  2322.0  2334.0  2123.0
3 2014-12-30 20:54:00  2312.0  2332.0  2324.0  2321.0
4 2014-12-30 20:55:00  2312.0  1212.0  2343.0  2323.0
                       Open    High     Low   Close
Date_Time
2014-12-30 20:50:00  2321.0  1213.0  1223.0  2334.0
2014-12-30 20:52:00  2342.0  2322.0  2332.0  2123.0
2014-12-30 20:54:00  2312.0  2332.0  2324.0  2323.0
Arty
  • 14,883
  • 6
  • 36
  • 69
  • Thanks a lot @Arty. Is working if set Date_time as index, i thought using on='Date_Time' parameter doesn't require Date_Time as index. But anyway, is wotking now – Vince Sep 27 '20 at 17:08