I have a DataFrame with a time column. The values are in seconds, but with nanosecond precision.

I was struggling to make the timestamps unique but, with help from a robot, managed to come up with this:

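import numpy as np
import pandas as pd
# Assumed import: PandasDataset is taken here to be the GluonTS class
from gluonts.dataset.pandas import PandasDataset

# Convert the time column to integer nanoseconds and add a per-row sequence number
# (% binds tighter than +, so the modulo applies only to the sequence number)
df['time_ns'] = pd.to_datetime(df['time'], unit='s').values.astype(np.int64) + \
                np.arange(len(df)) % (10 ** 9)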
df.set_index('time_ns', inplace=True)

# Convert the time_ns column to a DatetimeIndex with nanosecond precision
df.index = pd.to_datetime(df.index, unit='ns')

# Get a list of the non-unique timestamps
non_unique = df.index[df.index.duplicated(keep=False)].unique()

# Print the non-unique timestamps
print("Non-unique values:")
print(non_unique)

dataset = PandasDataset(df, target="price")

Now there are no non-unique values. However, the frequency inference that runs when creating the dataset still fails to produce a frequency, because of this check in /pandas/tseries/frequencies.py:

if not self.is_unique_asi8:
    return None

Digging into this with the penetrating insight into Python I have developed over the last two weeks, I have discovered that this property, too, is an indication of uniqueness.

So how do I configure the dataset so that the index is considered unique, that is, compared at nanosecond precision? The index of the incoming DataFrame, it seems, is now unique.
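
For reference, here is a minimal sketch of the symptom using made-up timestamps, not my real data. As far as I can tell, pd.infer_freq returns None for an index that is unique but unevenly spaced, and returns a frequency for one that is evenly spaced. The names irregular and regular are illustrative only.

```python
import pandas as pd

# Made-up nanosecond timestamps, not the real trade data.
# Both indexes contain only unique values; only the spacing differs.
irregular = pd.to_datetime([0, 1, 3, 7], unit="ns")  # gaps of 1, 2 and 4 ns
regular = pd.to_datetime([0, 1, 2, 3], unit="ns")    # constant gap of 1 ns

# Print uniqueness alongside the inferred frequency for each index.
print(irregular.is_unique, pd.infer_freq(irregular))
print(regular.is_unique, pd.infer_freq(regular))
```

This suggests that having unique index values may not, on its own, be enough for a frequency to be inferred.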

serlingpa
  • Could you include the frequency calculation? I definitely see that your code should be producing a unique index (though in some weird ways). – Carbon Jun 18 '23 at 12:44
  • I have asked my question in a different way here: https://stackoverflow.com/questions/76500849/how-to-prepare-my-data-to-avoid-being-unable-to-infer-frequency – serlingpa Jun 18 '23 at 13:42
  • @Carbon It is supposed to calculate the frequency on its own, if the index is unique. – serlingpa Jun 18 '23 at 13:42
