93

I have a series within a DataFrame that I read in initially as an object, and then need to convert it to a date in the form of yyyy-mm-dd where dd is the end of the month.

As an example, I have DataFrame df with a column Date as an object:

...      Date    ...
...     200104   ...
...     200508   ...

What I want when this is all said and done is a date object:

...      Date    ...
...  2001-04-30  ...
...  2005-08-31  ...

such that df['Date'].item() returns

datetime.date(2001, 4, 30)

I've used the following code to get almost there, but all my dates are at the beginning of the month, not the end. Please advise.

df['Date'] = pd.to_datetime(df['Date'], format="%Y%m").dt.date

Note: I've already imported Pandas (pd), and datetime (dt)

rachwa
Lisle

3 Answers

173

You can use pandas.tseries.offsets.MonthEnd:

from pandas.tseries.offsets import MonthEnd

df['Date'] = pd.to_datetime(df['Date'], format="%Y%m") + MonthEnd(0)

The 0 in MonthEnd(0) rolls each date forward to the end of its own month and leaves a date that is already a month end unchanged. MonthEnd(1) instead moves to the next month-end date strictly after the input: that is the same result for any date before the month end, but the end of the following month if the input is already a month end. To get the last day of the next month, add a further MonthEnd(1), and so on. This works for any month, so you don't need to know the number of days in the month, or anything like that. More offset information can be found in the documentation.

Example usage and output:

df = pd.DataFrame({'Date': [200104, 200508, 201002, 201602, 199912, 200611]})
df['EndOfMonth'] = pd.to_datetime(df['Date'], format="%Y%m") + MonthEnd(0)

     Date EndOfMonth
0  200104 2001-04-30
1  200508 2005-08-31
2  201002 2010-02-28
3  201602 2016-02-29
4  199912 1999-12-31
5  200611 2006-11-30
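The question asks for datetime.date objects rather than Timestamps. A minimal sketch of that last step, assuming the column holds yyyymm strings as in the question (the two sample values are taken from it):

```python
import datetime

import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Sample data from the question: yyyymm stored as strings (object dtype).
df = pd.DataFrame({'Date': ['200104', '200508']})

# Parsing "%Y%m" lands on the first of the month; MonthEnd(0) rolls each
# value forward to its month end; .dt.date converts to datetime.date.
df['Date'] = (pd.to_datetime(df['Date'], format="%Y%m") + MonthEnd(0)).dt.date

print(df['Date'].tolist())
```

After .dt.date the column has object dtype again, so keep the Timestamp version if you still need datetime arithmetic on it.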
ignoring_gravity
root
  • 1
    This is by far one of the most elegant solutions I have ever seen, thank you! :) – Lisle May 20 '16 at 19:51
  • 2
    No problem! `MonthEnd` is one of the gems hidden in Pandas that you wouldn't immediately think Pandas would have. Usually my first instinct is to use other datetime libraries..until I remember Pandas has such cool functionality! – root May 20 '16 at 20:30
  • 8
    you should check out @martien lubberink's answer for some caveats to the above. – evan54 Jun 05 '17 at 00:59
  • 1
    This doesn't seem to work for pandas 0.19.2 and numpy 1.13.1. Received error "data type datetime not understood". Anyone ran into this issue? – iwbabn Jan 01 '18 at 22:40
  • 9
    As explained in the [answer by Martien](https://stackoverflow.com/a/43177530/), the use of `MonthEnd(1)` is incorrect for the last date of a month, and it should always just be `MonthEnd(0)` instead. – Asclepius Dec 04 '20 at 19:18
  • 1
    Can somebody please edit the answer? As mentioned above, we should use `MonthEnd(0)` instead of `MonthEnd(1)`. This question is very common and having wrong answer as the one with the highest rating misleads people. – Moysey Abramowitz Nov 22 '21 at 16:05
92

Agreed that root offers the right method. However, readers who blindly use MonthEnd(1) are in for a surprise if they use the last date of the month as an input:

In [4]: pd.Timestamp('2014-01-01') + MonthEnd(1)
Out[4]: Timestamp('2014-01-31 00:00:00')

In [5]: pd.Timestamp('2014-01-31') + MonthEnd(1)
Out[5]: Timestamp('2014-02-28 00:00:00')

Using MonthEnd(0) instead gives this:

In [7]: pd.Timestamp('2014-01-01') + MonthEnd(0)
Out[7]: Timestamp('2014-01-31 00:00:00')

In [8]: pd.Timestamp('2014-01-31') + MonthEnd(0)
Out[8]: Timestamp('2014-01-31 00:00:00')
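The same comparison can be run on a whole column at once. This is only a sketch with made-up sample dates; the names dates, rolled_0 and rolled_1 are illustrative:

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Three dates in the same month; the last one is already a month end.
dates = pd.to_datetime(pd.Series(['2014-01-01', '2014-01-15', '2014-01-31']))

# MonthEnd(0) keeps every date inside its own month.
rolled_0 = dates + MonthEnd(0)

# MonthEnd(1) pushes a date that is already a month end into the next month.
rolled_1 = dates + MonthEnd(1)

print(pd.DataFrame({'input': dates, 'MonthEnd(0)': rolled_0, 'MonthEnd(1)': rolled_1}))
```

Only the last row differs between the two columns, which is why the problem is easy to miss when all inputs are first-of-month dates.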

Example to obtain the month end as a string:

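import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Roll the current date forward to its month end; the time part is hardcoded in the format string.
(pd.Timestamp.now() + MonthEnd(0)).strftime('%Y-%m-%dT00:00:00')
# e.g. '2014-01-31T00:00:00' when run during January 2014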
user3313834
Martien Lubberink
1

The end of the month can be the last day/minute/second/millisecond/microsecond/nanosecond of the month, depending on the resolution your use case needs. Given a date, to derive the last unit of the month, step to the start of the next month with an anchored offset and subtract one unit of that resolution. For example:

import pandas as pd

def last_second_of_month(date: str) -> str:
    return str(pd.Timestamp(date) + pd.offsets.MonthBegin() - pd.offsets.Second())

As needed, replace Second() above with Day(), Minute(), Milli(), Micro(), or Nano().

Here is an alternative implementation with the same result:

import pandas as pd

def last_second_of_month(date: str) -> str:
    return str((pd.Timestamp(date) + pd.offsets.MonthEnd(0)).date()) + " 23:59:59"

Examples:

>>> last_second_of_month('2020-10')
'2020-10-31 23:59:59'
>>> last_second_of_month('2020-10-01')
'2020-10-31 23:59:59'
>>> last_second_of_month('2020-10-15')
'2020-10-31 23:59:59'
>>> last_second_of_month('2020-10-30')
'2020-10-31 23:59:59'
>>> last_second_of_month('2020-10-31')
'2020-10-31 23:59:59'
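If the input may carry a time of day, the arithmetic above would keep that time. A hedged variant, assuming it is acceptable to discard the time first; the helper name last_unit_of_month and its unit parameter are hypothetical, not part of pandas:

```python
import pandas as pd

def last_unit_of_month(date: str, unit: pd.Timedelta) -> pd.Timestamp:
    """Return the last `unit`-sized instant of the month containing `date`."""
    # Roll forward to the month-end date and drop any time of day.
    month_end = (pd.Timestamp(date) + pd.offsets.MonthEnd(0)).normalize()
    # Step to the first instant of the next month, then back by one unit.
    return month_end + pd.Timedelta(days=1) - unit

print(last_unit_of_month('2020-10-31', pd.Timedelta(seconds=1)))
print(last_unit_of_month('2020-02-10 08:30', pd.Timedelta(days=1)))
```

Passing the resolution as a pd.Timedelta plays the same role as swapping Second() for Day(), Minute(), and so on in the functions above.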

As a cautionary note, do not use pd.Timestamp(date) + pd.offsets.MonthEnd() + pd.offsets.Day() - pd.offsets.Second(). MonthEnd() is the same as MonthEnd(1), so for the last date of a month it jumps to the end of the following month and the result lands a month too late. This observation about pd.offsets.MonthEnd(1) is credited to the answer by Martien.

Asclepius