0

I am writing some code to interpolate some data with space (x, y) and time. The data needs to be on a regular grid. I cant seem to make a generalized function to find a date range with regular spacing. The range that fails for me is:

date_min = numpy.datetime64('2022-10-24T00:00:00.000000000')
date_max = numpy.datetime64('2022-11-03T00:00:00.000000000')

And it needs to roughly match the current values of times I have, which for this case is 44.

periods = 44

I tried testing if the time difference is divisible by 2 and then adding 1 to the number of periods, which worked for a lot of cases, but it doesn't seem to really work for this time range:

def unique_diff(x):
    return numpy.unique(numpy.diff(x))


unique_diff(pd.date_range(date_min, date_max, periods=periods))
Out[31]: array([20093023255813, 20093023255814], dtype='timedelta64[ns]')

unique_diff(pd.date_range(date_min, date_max, periods=periods+1))
Out[32]: array([19636363636363, 19636363636364], dtype='timedelta64[ns]')

unique_diff(pd.date_range(date_min, date_max, periods=periods-1))
Out[33]: array([20571428571428, 20571428571429], dtype='timedelta64[ns]')

However, it does work for +2:

unique_diff(pd.date_range(date_min, date_max, periods=periods+2))
Out[34]: array([19200000000000], dtype='timedelta64[ns]')

I could just keep trying different period deltas until I get a solution, but I would rather know why this is happening and how I can generalize this problem for any min/max times with a target number of periods

Tom McLean
  • 5,583
  • 1
  • 11
  • 36

1 Answers1

1

Your date range doesn't divide evenly by the periods in nanosecond resolution:

# as the contains start and end, there's a step fewer than there are periods 
steps = periods - 1
int(date_max - date_min) / steps
# 20093023255813.953

A solution could be to round up (or down) your max date, to make it divide evenly in nanosecond resolution:

date_max_r = (date_min + 
              int(numpy.ceil(int(date_max - date_min) / (steps)) * (steps)))

unique_diff(pd.date_range(date_min, date_max_r, periods=periods))
# array([20093023255814], dtype='timedelta64[ns]')
w-m
  • 10,772
  • 1
  • 42
  • 49
  • I cant really change the bounds of the data (those values came from reading a file, and I want to keep the bounds) but I can change the periods. Is there a way to keep the min and max dates, but just change the periods? – Tom McLean Oct 24 '22 at 11:27
  • That depends on what kind of resolution your given dates have. In the example above they just have day resolution, that's quite easy. Even up to seconds resolution, you just need to split the second evenly into nanoseconds, so periods=41 or periods=51 will always work. If your dates have nanosecond resolution themselves, you'll have to find a different divisor every time, and are not guaranteed to find one. The difference between date_max_r and date_max are 2 nanoseconds, by the way. – w-m Oct 24 '22 at 13:08
  • Thanks for the explanation very helpful. I m going to accept this as the answer :) I think potentially I could also do some sort of [largest divisor of N](https://stackoverflow.com/questions/41834643/largest-divisor-of-n-excluding-itself) And exit when I am close to the target number of periods, but yours probably works nicer – Tom McLean Oct 24 '22 at 13:14