Struggling to create a unique datetime index in Pandas dataset

Question

I have a dataframe that has a time property. This property is in seconds, but with nanosecond precision.
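A side note on precision, assuming the `time` column is stored as float64 (the question does not say). At present-day epoch values, a float64 number of seconds cannot resolve individual nanoseconds, which may be one reason distinct trades end up with identical timestamps. The value of `t` below is hypothetical and only illustrates the magnitude:

```python
import numpy as np

# Hypothetical epoch timestamp in seconds (not taken from the original data)
t = 1687092284.0

# Gap between t and the next representable float64, expressed in nanoseconds
spacing_ns = np.spacing(t) * 1e9
print(spacing_ns)

# An offset of 100 ns is smaller than half that gap, so it is rounded away
collapsed = (t + 100e-9) == t
print(collapsed)
```

If the source data offers the timestamp as an integer nanosecond count or as a string, parsing that instead would avoid the rounding.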

I was struggling to make this unique, but with help from a robot, managed to come up with this:

import numpy as np
import pandas as pd

# Convert the time column to nanoseconds and add a sequence number for trades
df['time_ns'] = pd.to_datetime(df['time'], unit='s').values.astype(np.int64) + \
                np.arange(len(df)) % (10 ** 9)
df.set_index('time_ns', inplace=True)

# Convert the time_ns column to a DatetimeIndex with nanosecond precision
df.index = pd.to_datetime(df.index, unit='ns')

# Get a list of the non-unique timestamps
non_unique = df.index[df.index.duplicated(keep=False)].unique()

# Print the non-unique timestamps
print("Non-unique values:")
print(non_unique)

dataset = PandasDataset(df, target="price")

Now, there are no non-unique values. However, the frequency calculation when creating the dataset is falling over, due to this in /pandas/tseries/frequencies.py:

if not self.is_unique_asi8:
    return None

Digging into this with the penetrating insight into Python I have developed over the last two weeks, I have discovered that this property, too, is an indication of uniqueness.
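One possible reading of that check, offered as an assumption rather than a confirmed diagnosis: in the pandas source, `is_unique_asi8` appears to test whether the gaps between consecutive timestamps collapse to a single value, not whether the timestamps themselves are distinct. The sketch below uses only the public `pd.infer_freq` function and made-up timestamps to show that an index can be fully unique and still yield no frequency:

```python
import pandas as pd

# Unique, sorted timestamps with uneven gaps (1 ns, 2 ns, 4 ns)
irregular = pd.to_datetime([0, 1, 3, 7], unit="ns")

# Unique, sorted timestamps with one constant gap of a minute
regular = pd.date_range("2023-06-18", periods=4, freq=pd.Timedelta(minutes=1))

irregular_freq = pd.infer_freq(irregular)
regular_freq = pd.infer_freq(regular)

# Both indexes are unique, but only the evenly spaced one has a frequency
print(irregular.is_unique, irregular_freq)
print(regular.is_unique, regular_freq)
```

If this reading is right, adding a sequence number to each timestamp makes the index unique but does not make it evenly spaced, so frequency inference would still return `None`.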

So how do I configure the dataset so that the index is considered unique? That it is considered at nanosecond precision? The incoming dataframe, it seems, is now unique.
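A minimal sketch of one way to get an index whose frequency can be inferred, assuming it is acceptable to aggregate trades into fixed-width bins. The one-second bin width, the "last price per bin" rule, the forward fill, and the sample data are all illustrative choices, not part of the original question:

```python
import pandas as pd

# Hypothetical trade data with irregular, unique timestamps
index = pd.to_datetime(
    [
        "2023-06-18 12:00:00.100000001",
        "2023-06-18 12:00:00.700000002",
        "2023-06-18 12:00:02.300000003",
        "2023-06-18 12:00:03.900000004",
    ]
)
df = pd.DataFrame({"price": [10.0, 10.5, 11.0, 10.8]}, index=index)

# Keep the last price in each one-second bin; carry the previous price
# forward into bins that contain no trades
regular = df["price"].resample(pd.Timedelta(seconds=1)).last().ffill().to_frame()

# The resampled index is evenly spaced, so a frequency can be inferred
regular_freq = pd.infer_freq(regular.index)
print(regular)
print(regular_freq)
```

Binning discards the exact sub-second timing of each trade, so whether this is appropriate depends on what the downstream model needs.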

Could you include the frequency calculation? I definitely see that your code should be producing a unique index (though in some weird ways). — Carbon, Jun 18 '23 at 12:44
I have asked my question in a different way here: https://stackoverflow.com/questions/76500849/how-to-prepare-my-data-to-avoid-being-unable-to-infer-frequency — serlingpa, Jun 18 '23 at 13:42
@Carbon It is supposed to calculate the frequency on its own, if the index is unique. — serlingpa, Jun 18 '23 at 13:42
