How to floor a date to the first date of that month?

Question

I have a pandas DataFrame with index column = date.

Input:

            value
date    
1986-01-31  22.93
1986-02-28  15.46

I want to floor each date to the first day of its month.

Output:

            value
date    
1986-01-01  22.93
1986-02-01  15.46

What I tried:

df.index.floor('M')
ValueError: <MonthEnd> is a non-fixed frequency

This is potentially because the df is generated by df = df.resample("M").sum() (The output of this code is the input at the beginning of the question)

I also tried df = df.resample("M", convention='start').sum(). However, it does not work.
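For reference, here is a minimal sketch that rebuilds the input frame from the question and reproduces the failure. It passes the `MonthEnd` offset object directly instead of the `'M'` alias, on the assumption that alias spellings differ between pandas versions. Catching `TypeError` alongside `ValueError` is a precaution, not documented behaviour.

```python
import pandas as pd

# Rebuild the month-end indexed frame shown in the question.
df = pd.DataFrame(
    {"value": [22.93, 15.46]},
    index=pd.DatetimeIndex(["1986-01-31", "1986-02-28"], name="date"),
)

# floor() only accepts fixed-width frequencies (days, hours, ...);
# a calendar month has no fixed length, so this is expected to fail.
try:
    df.index.floor(pd.offsets.MonthEnd())
    error = None
except (ValueError, TypeError) as exc:
    error = exc

print(type(error).__name__, error)
```

The point is that `floor` works on fixed-width units only, so flooring to a month needs one of the calendar-aware approaches in the answers.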

I know in R, it is easy to just call floor(date, 'M').

Is performance an issue? I'd consider transforming the values into `datetime` objects for this, but this might be way too costly if you are trying to process millions of objects. – Alfe, Feb 16 '17 at 21:47

score 42 · Answer 1 · answered Apr 17 '18 at 03:52


There is a pandas issue about this floor problem.

The suggested way is:

import pandas as pd
pd.to_datetime(df.date).dt.to_period('M').dt.to_timestamp()
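The line above assumes the dates live in a column called `date`. In the question they are the index, so here is a minimal sketch of the same idea applied to a `DatetimeIndex` (the frame is rebuilt here for illustration only):

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [22.93, 15.46]},
    index=pd.DatetimeIndex(["1986-01-31", "1986-02-28"], name="date"),
)

# A DatetimeIndex exposes to_period/to_timestamp directly (no .dt accessor).
# to_period("M") maps each date to its month; to_timestamp() returns
# the start of that month.
df.index = df.index.to_period("M").to_timestamp()

print(df)
```

Dates that already fall on the first of the month are left unchanged by this round trip.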

answered Apr 17 '18 at 03:52

Deo Leung


`df.date.dt.to_period('M').dt.to_timestamp()` seems to be sufficient, the initial `pd.to_datetime` is not needed. – Zoltan Mar 15 '19 at 21:16

Vaishali · Answer 2 · 2018-08-28T18:56:13.240

17

You can use the time series offset MonthBegin:

import pandas as pd
from pandas.tseries.offsets import MonthBegin
df['date'] = pd.to_datetime(df['date']) - MonthBegin(1)

Edit: The above solution does not handle the dates which are already floored to the beginning of the month. Here is an alternative solution.

Here is a dataframe with additional test cases:

            value
date    
1986-01-31  22.93
1986-02-28  15.46
2018-01-01  20.00
2018-02-02  25.00

With timedelta method,

df.index = pd.to_datetime(df.index)
df.index = df.index - pd.to_timedelta(df.index.day - 1, unit='d')


            value
date    
1986-01-01  22.93
1986-02-01  15.46
2018-01-01  20.00
2018-02-01  25.00
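To compare the two approaches in this answer side by side, here is a minimal sketch that runs both on the four test dates above. The variable names are illustrative only.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin

idx = pd.to_datetime(["1986-01-31", "1986-02-28", "2018-01-01", "2018-02-02"])

# Offset subtraction: a date already on the 1st rolls back a whole month.
via_offset = idx - MonthBegin(1)

# Timedelta subtraction: remove (day - 1) days, so the 1st stays put.
via_timedelta = idx - pd.to_timedelta(idx.day - 1, unit="D")

print(via_offset[2])     # 2017-12-01 00:00:00
print(via_timedelta[2])  # 2018-01-01 00:00:00
```

Only the third date (already the first of the month) differs between the two results.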

edited Aug 28 '18 at 18:56

answered Feb 16 '17 at 21:51

Vaishali


this is the only pandonic approach among all the answers (as a bonus this is vectorized) – Jeff Feb 17 '17 at 18:47

There's a bug with this method: it'd translate any date to the beginning of the following month, except the beginning of the month, which stays the same. i.e. 1-1-2018 -> 1-1-2018, but 2-1-2018 -> 1-2-2018... – Mr.WorshipMe Jan 16 '18 at 09:48
The timedelta method is right-on, worked perfectly for my use case, with one modification - adding the 'dt', so changing to 'df.index.dt.day' inside the to_timedelta(). – rocksteady Nov 18 '18 at 23:53

Grr · Answer 3 · 2017-02-16T22:08:02.743

11

This will do the trick, with no extra imports necessary. NumPy has a datetime64 dtype, which pandas sets to [ns] by default, as you can see by checking the dtype. You can change this to month resolution, which starts on the first of the month, by accessing the underlying NumPy array and changing its type.

df.date = pd.to_datetime(df.date.values.astype('datetime64[M]'))

It would be nice if pandas would implement this with their own astype() method but unfortunately you cannot.

The above works for data stored as datetime values or as strings. If your data is already of type datetime64[ns], you can omit the pd.to_datetime() and just do:

df.date = df.date.values.astype('datetime64[M]')
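A minimal sketch of the NumPy truncation on a small Series. It casts back to `datetime64[ns]` inside NumPy before handing the array to pandas. That extra cast is an assumption made here to avoid depending on how a given pandas version treats month-resolution arrays; it is not part of the answer above.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["1986-01-31", "1986-02-28", "2018-01-01"]))

# Truncate to month resolution in NumPy, then cast back to nanoseconds
# so pandas receives a dtype it handles natively.
floored = dates.values.astype("datetime64[M]").astype("datetime64[ns]")
result = pd.Series(floored, index=dates.index)

print(result)
```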

edited Feb 16 '17 at 22:08

answered Feb 16 '17 at 21:59

Grr


I don’t know in which version they imported that to the pandas astype, but currently `df.date.astype('datetime64[M]')` works (version 1.2.2 at least). You could update this answer. – Cimbali Jun 25 '21 at 14:01

score 8 · Answer 4 · answered Apr 15 '18 at 15:16


Here's another 'pandonic' way to do it:

df.date - pd.Timedelta('1 day') * (df.date.dt.day - 1)
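A minimal sketch of this expression on a Series and on a single Timestamp (the latter as mentioned in the comments). The sample values are made up for illustration. Note that the subtraction only removes whole days, so any time-of-day component is kept.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["1986-01-31", "2018-01-01", "2018-02-02"]))

# Vectorized: subtract (day - 1) days from every element.
floored = s - pd.Timedelta("1 day") * (s.dt.day - 1)

# The same arithmetic works on a scalar Timestamp; the time part survives.
ts = pd.Timestamp("2018-02-02 15:30")
ts_floored = ts - pd.Timedelta("1 day") * (ts.day - 1)

print(floored)
print(ts_floored)  # 2018-02-01 15:30:00
```

If midnight is wanted as well, calling `.dt.normalize()` on the Series (or `.normalize()` on the Timestamp) afterwards drops the time part.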

answered Apr 15 '18 at 15:16

aldanor


This is very nice! Plus, it works with dask! (in contrast to Deo Leung's answer) – srs Aug 28 '18 at 14:54
This also works well with variable instances of Timestamps in addition to being vectorized. Just change the `df.date` with your Timestamp and it works great! – cheevahagadog Mar 14 '19 at 19:46

score 8 · Answer 5 · answered Jan 03 '19 at 21:50


You can also use string datetime formatting:

df['month'] = df['date'].dt.strftime('%Y-%m-01')
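One thing to keep in mind is that `strftime` produces strings, not datetimes. Here is a minimal sketch, with an illustrative extra column `month_dt`, showing the result and how to convert it back when datetime values are needed:

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["1986-01-31", "1986-02-28"])})

# Formatting with a hard-coded day of "01" yields text such as "1986-01-01".
df["month"] = df["date"].dt.strftime("%Y-%m-01")

# Parse the text again if a datetime column is required.
df["month_dt"] = pd.to_datetime(df["month"])

print(df.dtypes)
```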

answered Jan 03 '19 at 21:50

Mikhail Venkov


George Pipis · Answer 6 · 2022-01-18T09:41:01.717

Assume that you are dealing with the following data frame:

import pandas as pd

df = pd.DataFrame({'MyDate': ['2021-03-11', '2021-04-26', '2021-01-17']})
df['MyDate'] = pd.to_datetime(df.MyDate)

Which is:

    MyDate
0   2021-03-11
1   2021-04-26
2   2021-01-17

And you want to truncate the date to month:

df['Truncated'] = df['MyDate'] + pd.offsets.MonthBegin(-1)
# OR    
# df['Truncated'] = df['MyDate'] - pd.offsets.MonthBegin(1)
df

And you get:

      MyDate  Truncated
0 2021-03-11 2021-03-01
1 2021-04-26 2021-04-01
2 2021-01-17 2021-01-01

Important note: this approach does NOT work when the date is already the first day of the month (it moves to the first day of the previous month). For that reason, other solutions are provided too.

import pandas as pd

df = pd.DataFrame({'MyDate': ['2021-03-11', '2021-04-26', '2021-01-17', '2021-02-01']})
df['MyDate'] = pd.to_datetime(df.MyDate)

df['Truncated'] = df['MyDate'].dt.to_period('M').dt.to_timestamp()
print(df)

And you get:

      MyDate  Truncated
0 2021-03-11 2021-03-01
1 2021-04-26 2021-04-01
2 2021-01-17 2021-01-01
3 2021-02-01 2021-02-01

Finally, another approach could be the following:

df['Truncated'] = df['MyDate'].dt.strftime('%Y-%m-01')
print(df)

And you get:

      MyDate   Truncated
0 2021-03-11  2021-03-01
1 2021-04-26  2021-04-01
2 2021-01-17  2021-01-01
3 2021-02-01  2021-02-01

Your code won't work if the date is the first day of the month: it would become the first day of the previous month. – snowboi, Jan 17 '22 at 21:19

score 2 · Answer 7 · edited Feb 16 '17 at 22:19


dt_1 = "2016-02-01"
def first_day(dt):
    lt_split = dt.split("-")
    return "-".join([lt_split[0], lt_split[1], "01"])

print(first_day(dt_1))

For a pandas DataFrame whose date column holds strings in this format, you can use df["col_name_date"].apply(first_day).

edited Feb 16 '17 at 22:19

garrettmurray


answered Feb 16 '17 at 21:58

W.Li


score 1 · Answer 8 · edited Jun 20 '20 at 09:12


From August 2019:

This should work:

[x.replace(day=1).date() for x in df['date']]

Only requirement is to make sure date is a datetime, which we can guarantee with a call to pd.to_datetime(df['date'])
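A minimal sketch of the list comprehension on a small frame built for illustration. Iterating over a datetime column yields `Timestamp` objects, so the result is a plain Python list of `datetime.date` values, not a pandas datetime column.

```python
import datetime

import pandas as pd

df = pd.DataFrame({"date": ["1986-01-31", "2018-01-01"]})
df["date"] = pd.to_datetime(df["date"])  # guarantee datetime values first

# Each element is a Timestamp; replace(day=1) moves it to the 1st,
# and .date() drops the time part.
first_days = [x.replace(day=1).date() for x in df["date"]]

print(first_days)
```

Being a Python-level loop, this is likely slower than the vectorized answers on large frames.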

edited Jun 20 '20 at 09:12

Community


answered Aug 16 '19 at 11:50

Yuca


score 1 · Answer 9 · answered Jul 12 '20 at 18:36

I liked Mikhail Venkov's answer. I added the code below so that the new column holds timestamp values and keeps the timezone information:

    df['month'] = pd.to_datetime(df['timestamp'].dt.strftime('%Y-%m-01')).dt.tz_localize(timezone)

where timezone = 'America/Los_Angeles' or whatever zone you want
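A minimal sketch of that line on a small timezone-aware column. The sample timestamps are made up, and the sketch assumes the `timestamp` column is already localized to the same zone.

```python
import pandas as pd

timezone = "America/Los_Angeles"

df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2021-03-11 08:15", "2021-04-26 23:59"]
    ).tz_localize(timezone)
})

# strftime formats the local wall-clock date; parsing the text gives naive
# datetimes, so the zone is attached again with tz_localize.
df["month"] = pd.to_datetime(
    df["timestamp"].dt.strftime("%Y-%m-01")
).dt.tz_localize(timezone)

print(df)
```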

score 0 · Answer 10 · answered Mar 15 '22 at 17:48

One-liner:

df.set_index(
    df.index - pd.to_timedelta(pd.to_datetime(df.index).day - 1, unit="D")
)

(there is a warning about vectorization.)

Actually what I would rather do is to first make that index a proper column, do the operations on it and then make it an index again:

In [32]: df = pd.DataFrame(
                index=[datetime.date.fromisoformat("1986-01-31"), datetime.date.fromisoformat("1986-02-28")], 
                data={"value":[22.93, 15.46]})

In [33]: df
Out[33]:
            value
1986-01-31  22.93
1986-02-28  15.46

In [34]: df2 = df.reset_index().rename(columns={"index": "date"})

In [35]: df2
Out[35]:
         date  value
0  1986-01-31  22.93
1  1986-02-28  15.46

In [36]: df2.date = pd.to_datetime(df2.date)

In [37]: df2.date
Out[37]:
0   1986-01-31
1   1986-02-28
Name: date, dtype: datetime64[ns]

In [38]: df2.date -= pd.to_timedelta(df2.date.dt.day - 1, unit="D")

In [39]: df2
Out[39]:
        date  value
0 1986-01-01  22.93
1 1986-02-01  15.46

In [40]: df2.set_index("date")
Out[40]: 
            value
date             
1986-01-01  22.93
1986-02-01  15.46

The most important line is:

df2.date -= pd.to_timedelta(df2.date.dt.day - 1, unit="D")

where you actually remove the day offset.

score 0 · Answer 11 · answered Apr 30 '23 at 19:06

I needed both floor and ceiling functions. The following worked for me:

import pandas as pd

def monthfloor(ts):
    offset = pd.offsets.Hour(1)

    if offset.is_month_start(ts):
        return ts
    else:
        return ts - pd.offsets.MonthBegin()

def monthceil(ts):
    offset = pd.offsets.Hour(1)

    if offset.is_month_end(ts):
        return ts
    else:
        return ts + pd.offsets.MonthEnd()

begin = pd.Timestamp(2023, 4, 1)
middle = pd.Timestamp(2023, 4, 15)
end = pd.Timestamp(2023, 4, 30)

tss = pd.Series([begin, middle, end])

print(tss.map(monthfloor))
# 0   2023-04-01
# 1   2023-04-01
# 2   2023-04-01
# dtype: datetime64[ns]

print(tss.map(monthceil))
# 0   2023-04-30
# 1   2023-04-30
# 2   2023-04-30
# dtype: datetime64[ns]
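The functions above work element by element through `map`. As a possible vectorized alternative (a different technique, not the one used in this answer), the floor can go through a monthly period round trip. The ceiling can use `MonthEnd(0)`, which rolls forward to the month end and leaves dates already on it unchanged. A minimal sketch:

```python
import pandas as pd

tss = pd.Series(pd.to_datetime(["2023-04-01", "2023-04-15", "2023-04-30"]))

# Floor: convert to monthly periods, then back to each period's start.
floored = tss.dt.to_period("M").dt.to_timestamp()

# Ceil: an offset with n=0 rolls forward only when not already on the offset.
ceiled = tss + pd.offsets.MonthEnd(0)

print(floored)
print(ceiled)
```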
