1

Consider the following index i which spans 1.5 days:

import pandas as pd
i = pd.date_range('2022-01-01', freq='H', periods=36)

The timestamps can be floored to the nearest day start, which gives 24 times '2022-01-01' and 12 times '2022-01-02':

i.floor("D")
# DatetimeIndex(['2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01',
#                '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01',
#                '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01',
#                '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01',
#                '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01',
#                '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01',
#                '2022-01-02', '2022-01-02', '2022-01-02', '2022-01-02',
#                '2022-01-02', '2022-01-02', '2022-01-02', '2022-01-02',
#                '2022-01-02', '2022-01-02', '2022-01-02', '2022-01-02'],
#               dtype='datetime64[ns]', freq=None)

However, in my current use-case, I want to floor them to '06:00', to get this output:

# DatetimeIndex(['2021-12-31 06:00', '2021-12-31 06:00', '2021-12-31 06:00', '2021-12-31 06:00',
#                '2021-12-31 06:00', '2021-12-31 06:00', '2022-01-01 06:00', '2022-01-01 06:00',
#                '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00',               
#                '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00',               
#                '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00',               
#                '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00',               
#                '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00',               
#                '2021-02-01 06:00', '2022-01-01 06:00', '2022-01-02 06:00', '2022-01-02 06:00', 
#                '2022-01-02 06:00', '2022-01-02 06:00', '2022-01-02 06:00', '2022-01-02 06:00'],
#               dtype='datetime64[ns]', freq=None)

Preferably by using the following grouper object:

gr = pd.Grouper(freq='D', offset=pd.Timedelta(hours=6))

Is there a way to do that?

Remarks:

  • The reason for wanting to use gr is that it's used elsewhere in aggregating (pd.Series.groupby). It'd be great to re-use it here.

  • The use-case includes flooring to '06:00' of the first day of the month, quarter, or year.

ElRudi
  • 2,122
  • 2
  • 18
  • 33
  • There is not much advantage in "reusing" `gr`, it's not like reusing a `GroupbBy` object: `g = df.groupby(...)` – mozway Oct 29 '22 at 12:57

1 Answers1

1

There is not much advantage in "reusing" gr, it's not like reusing a GroupbBy object: g = df.groupby(...) that can be expensive to compute.

Nevertheless, looks like you want:

gr = pd.Grouper(freq='D', offset=pd.Timedelta(hours=6))

g = pd.Series(index=i, dtype='float').groupby(gr)

idx = g.size()

out = idx.repeat(idx).index

As a one-liner (python ≥3.8):

out = (idx:=pd.Series(index=i, dtype='float').groupby(gr).size()).repeat(idx).index

output:

DatetimeIndex(['2021-12-31 06:00:00', '2021-12-31 06:00:00',
               '2021-12-31 06:00:00', '2021-12-31 06:00:00',
               '2021-12-31 06:00:00', '2021-12-31 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-02 06:00:00', '2022-01-02 06:00:00',
               '2022-01-02 06:00:00', '2022-01-02 06:00:00',
               '2022-01-02 06:00:00', '2022-01-02 06:00:00'],
              dtype='datetime64[ns]', freq=None)
mozway
  • 194,879
  • 13
  • 39
  • 75
  • Thanks for for answer! The reason for wanting to reuse the grouper is, that it can be specified by the user. And Id like to avoid having to ask the user for >1 parameter. – ElRudi Oct 29 '22 at 13:30
  • I see, then I hope this work as you want ;) – mozway Oct 29 '22 at 13:31
  • Still, I guess I could also use the offset as the user-specified parameter, and create `gr` from *it*. If it makes the flooring easier. What solution did you have in mind? – ElRudi Oct 29 '22 at 13:33
  • I had nothing in particular in mind, I was just wondering the reason for wanting to reuse the grouper. Efficiency wouldn't be a particularly good reason, user input seems a valid one ;) – mozway Oct 29 '22 at 13:40
  • One more question. To do this for a single timestamp, I'd have to first create an index, then a series, then groupby, and then take the first element of the resulting index? Or is there som shortcut? – ElRudi Oct 30 '22 at 07:28
  • Given the grouper uses the first value as reference, this seems the most straightforward I guess – mozway Oct 30 '22 at 07:48
  • Hey @mozway, I've implemented your solution, but there is still one issue: if I try to floor to months instead of days, I get the incorrect solution. I am using `MS` instead of `D` in the definition of `gr`, and my expected outcome is a `DatetimeIndex` with 6 values `2021-12-01 06:00` and 30 values `2022-01-01 06:00`. Currently, I get 36 values `2022-01-01`. I'm looking for a flooring function where H -> D -> M gives me the same result as H -> M. (Let me know if you prefer I open another question.) – ElRudi Nov 02 '22 at 16:31
  • I've created a new question [here](https://stackoverflow.com/q/74401212/2302262), in case you want to have a look – ElRudi Nov 16 '22 at 09:24