0

I have some timestamps, and am trying to find the timestamp at the start of the month in which they occur. I found a solution, but it's convoluted and points to unintuitive usage of the methods used. I wonder if there is a better solution.

Here are 3 samples, which should all have the same month start: Timestamp('2020-01-01 00:00:00+0100', tz='Europe/Berlin').

import pandas as pd
stamps = [
    pd.Timestamp("2020-01-01 00:00", tz="Europe/Berlin"),
    pd.Timestamp("2020-01-01 03:00", tz="Europe/Berlin"),
    pd.Timestamp("2020-01-27 18:00", tz="Europe/Berlin"),
]

First attempt:

In:  [ts + pd.offsets.MonthBegin(0) for ts in stamps]
Out: [Timestamp('2020-01-01 00:00:00+0100', tz='Europe/Berlin'),
      Timestamp('2020-01-01 03:00:00+0100', tz='Europe/Berlin'),
      Timestamp('2020-02-01 18:00:00+0100', tz='Europe/Berlin')]

Before addressing anything else: for me, the month starts at midnight, so let's first get rid of the time:

In:  [ts.floor('D') + pd.offsets.MonthBegin(0) for ts in stamps]
Out: [Timestamp('2020-01-01 00:00:00+0100', tz='Europe/Berlin'),
      Timestamp('2020-01-01 00:00:00+0100', tz='Europe/Berlin'),
      Timestamp('2020-02-01 00:00:00+0100', tz='Europe/Berlin')]

Midnight, good, but I don't know why it returns Feb as the month begin of last timestamp. (This behaviour, with the final timestamp being assigned a different month than the other 2, exists for all arguments <= 0.)

Third attempt:

In:  [ts.floor('D') + pd.offsets.MonthBegin(1) for ts in stamps]
Out: [Timestamp('2020-02-01 00:00:00+0100', tz='Europe/Berlin'),
      Timestamp('2020-02-01 00:00:00+0100', tz='Europe/Berlin'),
      Timestamp('2020-02-01 00:00:00+0100', tz='Europe/Berlin')]

Still wrong, but at least they all have the same timestamp. Have to subtract one month:

In:  [ts.floor('D') + pd.offsets.MonthBegin(1) + pd.offsets.MonthBegin(-1) for ts in stamps]
Out: [Timestamp('2020-01-01 00:00:00+0100', tz='Europe/Berlin'),
      Timestamp('2020-01-01 00:00:00+0100', tz='Europe/Berlin'),
      Timestamp('2020-01-01 00:00:00+0100', tz='Europe/Berlin')]

This is what I need. But as I wrote, it seems overly convoluted. My questions:

a) Is there a better/shorter way?

b) Is this the intended behaviour of the MonthBegin attribute? Or could it be a bug?

Many thanks

ElRudi
  • 2,122
  • 2
  • 18
  • 33
  • why bother with pd.offsets, can't you just use `[s.floor('d') - pd.Timedelta(s.day-1, unit='d') for s in stamps]`? - i.e. make a timedelta from the day attribute of the timestamps and subtract that from the timestamp to get the month start? As basically described [here](https://stackoverflow.com/q/42285130/10197418). – FObersteiner Jun 07 '21 at 14:40
  • Thanks for your comment; I see there is a long-running discussion on this very topic. I don't find the solution you propose as readable as `.MonthBegin` - but yours works, and `MonthBegin` doesn't (at least, not without the "add one subtract one" workaround), so it wins nonetheless. Thanks! – ElRudi Jun 07 '21 at 15:30
  • yep, not very pleasing to the eyes ;-) Something like a `MonthStart` method for a Timestamp object would be nicer – FObersteiner Jun 07 '21 at 15:32

0 Answers0