2

I want to reassign the timestamps of a series of dates such that they get floored at a frequency of (e.g.) 3 days:

import pandas as pd

x = pd.date_range('01-01-2019', freq='1D', periods=7).floor('3D')
y = pd.date_range('01-01-2022', freq='1D', periods=7).floor('3D')

I am expecting the "floor" to align to the first date and produce:

In[3]: x
Out[3]: 
DatetimeIndex(['2019-01-01', '2019-01-01', '2019-01-01', '2019-01-04',
               '2019-01-04', '2019-01-04', '2019-01-07'],
              dtype='datetime64[ns]', freq=None)

In[4]: y
Out[4]:
DatetimeIndex(['2022-01-01', '2022-01-01', '2022-01-01', '2022-01-04',
               '2022-01-04', '2022-01-04', '2022-01-07'],
              dtype='datetime64[ns]', freq=None)

But instead it seems like there is a 3 day cycle the dates are floored to (presumably multiples of 3 days since Jan 1 1970?), so instead the result is:

In[3]: x
Out[3]: 
DatetimeIndex(['2018-12-30', '2019-01-02', '2019-01-02', '2019-01-02',
               '2019-01-05', '2019-01-05', '2019-01-05'],
              dtype='datetime64[ns]', freq=None)

In[4]: y
Out[4]:
DatetimeIndex(['2022-01-01', '2022-01-01', '2022-01-01', '2022-01-04',
               '2022-01-04', '2022-01-04', '2022-01-07'],
              dtype='datetime64[ns]', freq=None)

The results for x start on December 30 instead of January 1.

Is there a way to set a "base" for the floor operation in pandas? I say "base" because of the base argument in resample for doing similar adjustments. But I don't want to do any aggregation, just keep each element but reassign the timestamp.

Tom
  • 8,310
  • 2
  • 16
  • 36

1 Answers1

3
x = pd.date_range('01-01-2019', freq='1D', periods=7)
y = pd.date_range('01-01-2022', freq='1D', periods=7)

def floor(x, freq):
    offset = x[0].ceil(freq) - x[0]
    return (x + offset).floor(freq) - offset

print(floor(x, '3D'))
print(floor(y, '3D'))

Output

DatetimeIndex(['2019-01-01', '2019-01-01', '2019-01-01', '2019-01-04',
               '2019-01-04', '2019-01-04', '2019-01-07'],
              dtype='datetime64[ns]', freq=None)
DatetimeIndex(['2022-01-01', '2022-01-01', '2022-01-01', '2022-01-04',
               '2022-01-04', '2022-01-04', '2022-01-07'],
              dtype='datetime64[ns]', freq=None)

Adding addition logic:

def floor(x, freq):
    offset = x[0].ceil(freq) - x[0]
    adj_needed = (offset != pd.Timedelta(0))
    return (x + offset).floor(freq) - offset if adj_needed else x.floor(freq)
Pramote Kuacharoen
  • 1,496
  • 1
  • 5
  • 6
  • This is great but my only gripe is it's about 7x slower than `dt.floor()` on my machine. – Tom Jul 09 '20 at 21:41
  • I added an additional logic. Performance should be almost the same if the dates do not needed to be adjusted. – Pramote Kuacharoen Jul 09 '20 at 22:12
  • @PramoteKucharoen the addition doesn't improve the speed when adjustment is needed though, correct? – Tom Jul 09 '20 at 22:41
  • 1
    You cannot beat the library when Pandas just computes data = data // period * period. If Pandas provides an additional feature, it has to compute data = (data - first_day) // period * period + first_day which will take about twice as long. It requires two more operations. If you provide a unix timestamp, you can achieve 2x slower. You can't compete if you need more feature. However, if you insist to use Pandas Timestamp, it is a price to pay. – Pramote Kuacharoen Jul 10 '20 at 01:24