
I have sensor data sampled at 700 Hz, but the samples don't have timestamps. I want to generate timestamps using pandas.date_range(), but I couldn't get the frequency to be exactly 700 Hz.

Currently I create timestamps like this:

time_stamps_at_700 = pd.date_range(datetime(2022, 1, 1, hour=00, minute=00), periods=len(labels), freq='1.430615ms')

Since 1 second is not exactly divisible by 700, I couldn't get the frequency to be exactly 700 Hz.

Is there a way I can generate timestamps exactly at 700 Hz?

brkgcmn
    Is your main concern that you want every 700th value to have a timestamp that is exactly on the second with no additional fraction of a second due to rounding of 1/700? Or are you concerned with something else? – constantstranger Jul 02 '22 at 23:28
  • Yes, I want the 700th sample to be exactly at 1 second. I have different data sets at different sample rates, and I need this one to be at exactly 700 Hz to sync them. – brkgcmn Jul 02 '22 at 23:34

1 Answer


Rather than using a range object, which depends on the accumulation of the step value, you could use np.linspace to give you a fixed number of evenly spaced points between two exact endpoints.

The following creates a linspace with 700 points from 0 to 1e9 nanoseconds (1 second), and you can see that the first and last elements land exactly on the second marks:

In [3]: pd.to_timedelta(np.linspace(0, 1e9, 700).astype('timedelta64[ns]'))
Out[3]:
TimedeltaIndex([          '0 days 00:00:00', '0 days 00:00:00.001430615',
                '0 days 00:00:00.002861230', '0 days 00:00:00.004291845',
                '0 days 00:00:00.005722460', '0 days 00:00:00.007153075',
                '0 days 00:00:00.008583690', '0 days 00:00:00.010014306',
                '0 days 00:00:00.011444921', '0 days 00:00:00.012875536',
                ...
                '0 days 00:00:00.987124463', '0 days 00:00:00.988555078',
                '0 days 00:00:00.989985693', '0 days 00:00:00.991416309',
                '0 days 00:00:00.992846924', '0 days 00:00:00.994277539',
                '0 days 00:00:00.995708154', '0 days 00:00:00.997138769',
                '0 days 00:00:00.998569384',           '0 days 00:00:01'],
               dtype='timedelta64[ns]', length=700, freq=None)
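
If you need actual timestamps for the whole series rather than a single second of offsets, a minimal sketch in the same spirit (computing each offset directly and rounding to the nearest nanosecond instead of accumulating a float step; the start time and the labels variable are assumptions carried over from the question) could look like this:

import numpy as np
import pandas as pd
from datetime import datetime

start = pd.Timestamp(datetime(2022, 1, 1))   # assumed start time from the question
n = len(labels)                              # assumed number of samples, as in the question

# Each offset i * 1e9 / 700 ns is rounded to the nearest nanosecond; for every
# 700th index the result is a whole number of seconds, so those timestamps land
# exactly on the second marks.
offsets_ns = np.round(np.arange(n) * 1e9 / 700)
time_stamps_at_700 = start + pd.to_timedelta(offsets_ns.astype('timedelta64[ns]'))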

For what it's worth, the core problem stems from the fact that Python has no way to represent the value 1/700 exactly. Because of this, there will be rounding/floating-point errors no matter how you approach the problem. The above deals with this by pinning the endpoints, and in the case of the endpoints 0 and 1 there is no error in the endpoint values. But at all of the intervening points, the data will likely still not match 700 Hz data exactly.

Generally, you should not rely on the exact alignment of floating point values (ever). Instead, you could use a handful of (related) strategies for handling this, including:

  • converting the 700 Hz data into a positional index relative to a reference point: if you know your series starts at a defined T0, bin the time series data into buckets, e.g. something along the lines of np.round((ts * 700).astype(float) / 1e9).astype(int). This would group everything into the nearest 700 Hz bin, so e.g. [-0.5/700, 0.5/700) would be assigned the value 0, [0.5/700, 1.5/700) -> 1, etc. (see the sketch after this list)
  • comparing values between your datasets with a tolerance - see What is the best way to compare floats for almost-equality in Python?
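
As a rough illustration of the binning strategy (a sketch only, with made-up sample rates; ts_700 and ts_100 are hypothetical arrays of nanosecond offsets from a shared T0, not variables defined above):

import numpy as np

# Hypothetical offsets (in ns) from a common T0 for two streams
ts_700 = np.arange(1400) * 1e9 / 700        # two seconds of ~700 Hz samples
ts_100 = np.arange(200) * 1e9 / 100         # two seconds of 100 Hz samples

# Assign each sample to its nearest 1/700 s bin:
# ns -> seconds -> multiply by 700 -> round to the nearest integer bin index
bins_700 = np.round(ts_700 * 700 / 1e9).astype(int)
bins_100 = np.round(ts_100 * 700 / 1e9).astype(int)

# bins_700 comes out as 0, 1, 2, ... and each 100 Hz sample maps onto every
# 7th of those bins, so the two streams can be aligned by joining on the bin
# index rather than by comparing floating-point timestamps for exact equality.

# For one-off comparisons, a tolerance check avoids exact float equality:
np.isclose(1 / 700, 0.00142857, atol=1e-6)   # True
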
Michael Delgado