The following code from pandas/tseries/frequencies.py is causing my code to fall over:

if not self.is_monotonic or not self.index._is_unique:
    return None

delta = self.deltas[0]
ppd = periods_per_day(self._creso)
if delta and _is_multiple(delta, ppd):
    return self._infer_daily_rule()

# Business hourly, maybe. 17: one day / 65: one weekend
if self.hour_deltas in ([1, 17], [1, 65], [1, 17, 65]):
    return "BH"

# Possibly intraday frequency.  Here we use the
# original .asi8 values as the modified values
# will not work around DST transitions.  See #8772
if not self.is_unique_asi8:
    return None

The first check, on self.index._is_unique, passes fine; the second, not self.is_unique_asi8, is triggered, and the function returns None.
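
A note on that second check: despite its name, is_unique_asi8 is not a test for duplicate timestamps. In the pandas source it is true only when every gap between consecutive timestamps has the same size. The sketch below uses made-up timestamps (not the Kraken data) and the public pd.infer_freq function to show the effect: an evenly spaced index gets a frequency, while an index of distinct but unevenly spaced timestamps gets None.

```python
import pandas as pd

# Evenly spaced: every gap is exactly one second
regular = pd.date_range("2023-01-01", periods=5, freq="1s", tz="UTC")

# Made-up irregular timestamps: all distinct, but the gaps differ
irregular = pd.DatetimeIndex(
    [
        "2023-01-01 00:00:00.000",
        "2023-01-01 00:00:01.000",
        "2023-01-01 00:00:03.500",
        "2023-01-01 00:00:04.000",
    ],
    tz="UTC",
)

# A frequency string for the regular index, None for the irregular one
regular_freq = pd.infer_freq(regular)
irregular_freq = pd.infer_freq(irregular)

# Number of distinct gap sizes in each index
n_gaps_regular = regular.to_series().diff().dropna().nunique()
n_gaps_irregular = irregular.to_series().diff().dropna().nunique()

print(regular_freq, irregular_freq, n_gaps_regular, n_gaps_irregular)
```

The irregular index has no duplicates, so making timestamps unique would not be expected to change the result; only equal spacing does.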

I have looked at this issue and the corresponding PR but

My code, in its current form, looks like this:

db = Database()
df, last_trade_time = db.fetch_trades()

# Convert the time column to a datetime object with the unit of seconds
df['time'] = pd.to_datetime(df['time'], unit='s')

# Localize the timestamps to UTC
df['time'] = df['time'].dt.tz_localize('UTC')

# Ensure uniqueness by adding the index as nanoseconds
df['time'] = df['time'] + pd.to_timedelta(df.index, unit='ns')

# Set DataFrame index
df.set_index('time', inplace=True)

dataset = PandasDataset(df, target="price")

These times are in seconds, with sub-second precision (from Kraken).

How can I prepare my data? Only a month or so of Python experience here...

I have asked this question in another form here

serlingpa

1 Answer

It seems that pandas is failing to infer the frequency of your data. Trades arrive at irregular intervals, so there is no single frequency to detect, and making the timestamps unique does not change that. You can give the data an explicit frequency instead. Note that set_index has no freq parameter, so this has to happen after the index is set, for example by resampling onto a regular grid. Since your data is in seconds, you can use 's' (seconds) or 'ms' (milliseconds), based on the precision you need; pandas versions before 2.2 also accept the older aliases 'S' and 'L'. Try using:

import pandas as pd

db = Database()
df, last_trade_time = db.fetch_trades()

# Convert the time column to a datetime object with the unit of seconds
df['time'] = pd.to_datetime(df['time'], unit='s')

# Localize the timestamps to UTC
df['time'] = df['time'].dt.tz_localize('UTC')

# Set the DataFrame index, then resample to a regular 1-second frequency:
# keep the last row in each second and forward-fill seconds with no trades
df = df.set_index('time').resample('1s').last().ffill()

dataset = PandasDataset(df, target="price")
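
If it helps to see the idea without a database, here is a minimal, self-contained sketch on synthetic data. The trade times and prices are invented for illustration, and taking the last price per second with a forward fill is just one possible choice (a mean or an OHLC aggregation may suit your model better). It shows that the raw index has no inferable frequency, while the resampled one does.

```python
import pandas as pd

# Hypothetical trades: epoch seconds with fractional parts, irregularly spaced
trades = pd.DataFrame(
    {
        "time": [1700000000.25, 1700000000.75, 1700000002.5, 1700000004.0],
        "price": [100.0, 101.0, 102.0, 103.0],
    }
)

# Epoch seconds -> timezone-aware UTC timestamps, used as the index
trades["time"] = pd.to_datetime(trades["time"], unit="s", utc=True)
trades = trades.set_index("time")

# No frequency can be inferred from the irregular index
raw_freq = pd.infer_freq(trades.index)

# Last trade price in each 1-second bin; bins without trades
# carry the previous price forward
bars = trades["price"].resample("1s").last().ffill().to_frame()

# The resampled index is evenly spaced, so a frequency is found
bar_freq = pd.infer_freq(bars.index)

print(raw_freq, bar_freq)
print(bars)
```

The resulting frame has one row per second, which is the regular shape that frequency inference relies on.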
Phoenix