Extracting just Month and Year separately from Pandas Datetime column

Question

I have a Dataframe, df, with the following column:

df['ArrivalDate'] =
...
936   2012-12-31
938   2012-12-29
965   2012-12-31
966   2012-12-31
967   2012-12-31
968   2012-12-31
969   2012-12-31
970   2012-12-29
971   2012-12-31
972   2012-12-29
973   2012-12-29
...

The elements of the column are pandas.tslib.Timestamp.

I want to just include the year and month. I thought there would be simple way to do it, but I can't figure it out.

Here's what I've tried:

df['ArrivalDate'].resample('M', how = 'mean')

I got the following error:

Only valid with DatetimeIndex or PeriodIndex

Then I tried:

df['ArrivalDate'].apply(lambda(x):x[:-2])

I got the following error:

'Timestamp' object has no attribute '__getitem__'

Any suggestions?

Edit: I sort of figured it out.

df.index = df['ArrivalDate']

Then, I can resample another column using the index.

But I'd still like a method for reconfiguring the entire column. Any ideas?

the best answer is clearly.. df['mnth_yr'] = df.date_column.dt.to_period('M') as below from @jaknap32 — ihightower, Jun 23 '17 at 06:20
You don't even have to do `to_period`: `df.date_column.dt.month` (or `.year`, or `.day`) works — elz, Nov 21 '17 at 19:27
Possible duplicate of [python pandas extract year from datetime --- df\['year'\] = df\['date'\].year is not working](https://stackoverflow.com/questions/30405413/python-pandas-extract-year-from-datetime-dfyear-dfdate-year-is-not) — Code-Apprentice, Mar 11 '18 at 20:03
@elphz: `.dt.month` loses the year though. And `.dt.to_period('M')` changes the data type to something which is not a datetime64 anymore. I ended up using [Juan's answer](https://stackoverflow.com/a/43561379/1026) suggesting `.astype('datetime64[M]')` to truncate the values. — Nickolay, May 27 '18 at 23:09

score 592 · Answer 1 (accepted) · edited May 03 '15 at 20:17

If you want new columns showing year and month separately you can do this:

df['year'] = pd.DatetimeIndex(df['ArrivalDate']).year
df['month'] = pd.DatetimeIndex(df['ArrivalDate']).month

or...

df['year'] = df['ArrivalDate'].dt.year
df['month'] = df['ArrivalDate'].dt.month

Then you can combine them or work with them just as they are.
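
As a minimal sketch of the `.dt` approach and one way to combine the results, assuming a small made-up frame (the sample dates and the `year_month` column name are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical sample data mirroring the question's column.
df = pd.DataFrame(
    {'ArrivalDate': pd.to_datetime(['2012-12-31', '2012-12-29', '2013-01-02'])}
)

# Separate integer columns via the .dt accessor.
df['year'] = df['ArrivalDate'].dt.year
df['month'] = df['ArrivalDate'].dt.month

# One possible way to combine them: a zero-padded 'YYYY-MM' string.
df['year_month'] = (
    df['year'].astype(str) + '-' + df['month'].astype(str).str.zfill(2)
)
print(df)
```

The zero padding keeps the strings sortable in chronological order; whether a string, an integer, or a period is the better combined form depends on what you do with the column afterwards.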

edited May 03 '15 at 20:17

answered Aug 05 '14 at 22:18

KieranPC


Is there a way to do this in a single line? I want to avoid traversing the same column multiple times. – fixxxer Nov 01 '15 at 16:40
Some quick benchmarking with `timeit` suggests that the `DatetimeIndex` approach is significantly faster than either `.map/.apply` or `.dt`. – Snorfalorpagus Oct 25 '16 at 09:34
The best answer is clearly `df['mnth_yr'] = df.date_column.dt.to_period('M')`, as given below by @jaknap32. – ihightower Jun 23 '17 at 06:16
What does `pd.DatetimeIndex` actually do? – JOHN Apr 16 '18 at 05:24
I sometimes do this: `df['date_column_trunc'] = df['date_column'].apply(lambda s: datetime.date(s.year, s.month, 1))` – Stewbaca Jul 30 '18 at 20:59
This doesn't work for me: if I do this, I get only nulls (NaN) returned, and if I do `due_month = pd.DatetimeIndex(model_data['Due Date']).month`, then I can't concatenate it to a dataframe. – GenDemo Jun 23 '21 at 06:23
When I use the first method I get a SettingWithCopyWarning. Does anyone know how to fix it? – Koala Nov 21 '21 at 19:36

score 363 · Answer 2 · edited Nov 04 '21 at 10:22


The column df['date_column'] must already have a datetime dtype.

df['month_year'] = df['date_column'].dt.to_period('M')

You can also pass other frequencies, such as D for day or 2M for two months. For timestamped time series data, finer sampling intervals also work, for example 45min for 45 minutes or 15min for 15 minutes.
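
Here is a small sketch of what the conversion produces, assuming made-up sample dates (the `month_start` column name is only for illustration). It also shows one way to get back to a datetime64 column if the period dtype is inconvenient:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {'date_column': pd.to_datetime(['2012-12-31', '2012-12-29', '2013-01-02'])}
)

# Period values keep both year and month and sort chronologically.
df['month_year'] = df['date_column'].dt.to_period('M')
print(df['month_year'].dtype)  # a period dtype, not datetime64

# If a datetime64 column is needed again, convert each period to its start.
df['month_start'] = df['month_year'].dt.to_timestamp()
print(df)
```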

edited Nov 04 '21 at 10:22

mkrieger1


answered May 10 '17 at 14:56

pka32


Note that the resulting column is not of the `datetime64` dtype anymore. Using `df.my_date_column.astype('datetime64[M]')`, as in [@Juan's answer](https://stackoverflow.com/a/43561379/1026) converts to dates representing the first day of each month. – Nickolay May 26 '18 at 19:52

score 165 · Answer 3 · edited Feb 06 '19 at 14:38

You can directly access the year and month attributes, or request a datetime.datetime:

In [15]: t = pandas.tslib.Timestamp.now()

In [16]: t
Out[16]: Timestamp('2014-08-05 14:49:39.643701', tz=None)

In [17]: t.to_pydatetime() #datetime method is deprecated
Out[17]: datetime.datetime(2014, 8, 5, 14, 49, 39, 643701)

In [18]: t.day
Out[18]: 5

In [19]: t.month
Out[19]: 8

In [20]: t.year
Out[20]: 2014

One way to combine year and month is to make an integer encoding them, such as: 201408 for August, 2014. Along a whole column, you could do this as:

df['YearMonth'] = df['ArrivalDate'].map(lambda x: 100*x.year + x.month)

or many variants thereof.
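
One such variant, sketched here with made-up dates, computes the same encoding column-wise through the `.dt` accessor instead of calling a lambda per element:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'ArrivalDate': pd.to_datetime(['2014-08-05', '2012-12-31'])})

# Same 100*year + month encoding, computed on whole columns at once.
df['YearMonth'] = df['ArrivalDate'].dt.year * 100 + df['ArrivalDate'].dt.month
print(df['YearMonth'].tolist())
```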

I'm not a big fan of doing this, though, since it makes date alignment and arithmetic painful later and especially painful for others who come upon your code or data without this same convention. A better way is to choose a day-of-month convention, such as final non-US-holiday weekday, or first day, etc., and leave the data in a date/time format with the chosen date convention.

The calendar module is useful for obtaining the number value of certain days such as the final weekday. Then you could do something like:

import calendar
import datetime
df['AdjustedDateToEndOfMonth'] = df['ArrivalDate'].map(
    lambda x: datetime.datetime(
        x.year,
        x.month,
        max(calendar.monthcalendar(x.year, x.month)[-1][:5])
    )
)

If you only need the simpler task of formatting the datetime column as strings, you can use the strftime method of the datetime.datetime class, like this:

In [5]: df
Out[5]: 
            date_time
0 2014-10-17 22:00:03

In [6]: df.date_time
Out[6]: 
0   2014-10-17 22:00:03
Name: date_time, dtype: datetime64[ns]

In [7]: df.date_time.map(lambda x: x.strftime('%Y-%m-%d'))
Out[7]: 
0    2014-10-17
Name: date_time, dtype: object

Performance can be bad, so it's always good to make the best possible use of helper functions, vectorized operations, and `pandas` split-apply-combine techniques. My suggestions above aren't meant to be taken as an endorsement that they are the most performant approaches for your case -- just that they are stylistically valid Pythonic choices for a range of cases. — ely, Aug 05 '14 at 19:03
the best answer is clearly.. df['mnth_yr'] = df.date_column.dt.to_period('M') as below from @jaknap32 — ihightower, Jun 23 '17 at 06:16
You're supposed to multiply by 100 in `df['YearMonth'] = df['ArrivalDate'].map(lambda x: 1000*x.year + x.month)`. — Git Gud, Jun 23 '18 at 20:55

score 49 · Answer 4 · edited Oct 09 '19 at 10:14


If you want the month year unique pair, using apply is pretty sleek.

df['mnth_yr'] = df['date_column'].apply(lambda x: x.strftime('%B-%Y'))

This outputs the month and year in a single column.

The column must be converted to datetime first, which is easy to forget:

df['date_column'] = pd.to_datetime(df['date_column'])

edited Oct 09 '19 at 10:14

Jaroslav Bezděk


answered Aug 29 '16 at 12:28

pka32


You can avoid the lambda function as well: `df['month_year'] = df['date_column'].dt.strftime('%B-%Y')` – Rishabh Mar 22 '20 at 03:15

score 31 · Answer 5 · edited Nov 16 '22 at 02:49


SINGLE LINE: adding a column with 'year-month' pairs (pd.to_datetime first converts the column to a datetime dtype before the operation):

df['yyyy-mm'] = pd.to_datetime(df['ArrivalDate']).dt.strftime('%Y-%m')

Accordingly for an extra 'year' or 'month' column:

df['yyyy'] = pd.to_datetime(df['ArrivalDate']).dt.strftime('%Y')
df['mm'] = pd.to_datetime(df['ArrivalDate']).dt.strftime('%m')

edited Nov 16 '22 at 02:49

Henry Ecker


answered May 24 '20 at 16:07

Matthi9000


`.dt.strftime('%Y-%m')` is incredibly slow, especially on millions of records, compared to slicing and adding as in `.dt.year + "-" + .dt.month` – Vitalis Sep 18 '20 at 22:07
Alright, that's a useful insight. I used it for some files of 100000 rows and it was doing just fine, but it's a useful alternative. – Matthi9000 Sep 19 '20 at 10:42
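
For reference, a sketch of the concatenation alternative mentioned in the comment above. The year and month are integers, so they need an explicit cast to strings before they can be joined; the zero padding via `str.zfill` is my assumption to match the `'%Y-%m'` output. I have not benchmarked this, so treat any speed difference as something to measure on your own data:

```python
import pandas as pd

# Hypothetical sample data.
s = pd.Series(pd.to_datetime(['2012-12-31', '2013-01-02']))

# Cast the integer parts to strings; zfill keeps months two digits wide.
year_month = s.dt.year.astype(str) + '-' + s.dt.month.astype(str).str.zfill(2)

# The strftime-based version, for comparison.
via_strftime = s.dt.strftime('%Y-%m')
print(year_month.tolist(), via_strftime.tolist())
```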

score 14 · Answer 6 · answered Mar 15 '18 at 08:43


To extract the year from a date such as '2018-03-04':

df['Year'] = pd.DatetimeIndex(df['date']).year

Assigning to df['Year'] creates a new column. To extract the month instead, use .month.

answered Mar 15 '18 at 08:43

Douglas


Thanks, it has been really helpful. `date_1 = pd.DatetimeIndex(df['date'])`, then `year = date_1.year`, `month = date_1.month`, and `dy = date_1.day` for years, months, and days. – Edwin Torres Jun 06 '18 at 16:46

score 13 · Answer 7 · answered Apr 22 '17 at 16:03


You can first convert your date strings with pandas.to_datetime, which gives you access to all of the numpy datetime and timedelta facilities. For example:

df['ArrivalDate'] = pandas.to_datetime(df['ArrivalDate'])
df['Month'] = df['ArrivalDate'].values.astype('datetime64[M]')
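
A short sketch of the truncation, using made-up dates. The first part does the NumPy-level cast on the underlying array; the second is an alternative route through periods that stays within pandas and does not depend on how `astype` treats the `[M]` unit (this alternative is my suggestion, not part of the original answer):

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {'ArrivalDate': pd.to_datetime(['2012-12-31', '2012-12-29', '2013-01-15'])}
)

# NumPy-level truncation: cast the underlying array to month resolution.
months = df['ArrivalDate'].to_numpy().astype('datetime64[M]')

# Alternative via periods: first day of each month as datetime64 values.
df['Month'] = df['ArrivalDate'].dt.to_period('M').dt.to_timestamp()
print(months)
print(df)
```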

answered Apr 22 '17 at 16:03

Juan A. Navarro


This worked really well for me, as I was looking for functionality analogous to pyspark's [`trunc`](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=date#pyspark.sql.functions.trunc). Is there any documentation for the `astype('datetime64[M]')` convention? – h1-the-swan Apr 12 '19 at 16:43
I was using `'datetime64[M]'` as suggested for some time, but after updating some libraries (pandas to 1.5, among others) I noticed that it no longer truncates; the values are now converted to a date. – FábioRB Oct 03 '22 at 23:41

jpp · Answer 8 · 2019-11-17T11:25:51.710

@KieranPC's solution is the correct approach for Pandas, but it is not easily extensible to arbitrary attributes. For this, you can use getattr within a generator expression and combine the results using pd.concat:

import pandas as pd

# input data
list_of_dates = ['2012-12-31', '2012-12-29', '2012-12-30']
df = pd.DataFrame({'ArrivalDate': pd.to_datetime(list_of_dates)})

# define list of attributes required    
L = ['year', 'month', 'day', 'dayofweek', 'dayofyear', 'weekofyear', 'quarter']

# define generator expression of series, one for each attribute
date_gen = (getattr(df['ArrivalDate'].dt, i).rename(i) for i in L)

# concatenate results and join to original dataframe
df = df.join(pd.concat(date_gen, axis=1))

print(df)

  ArrivalDate  year  month  day  dayofweek  dayofyear  weekofyear  quarter
0  2012-12-31  2012     12   31          0        366           1        4
1  2012-12-29  2012     12   29          5        364          52        4
2  2012-12-30  2012     12   30          6        365          52        4
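
As far as I know, `weekofyear` is no longer available on the `.dt` accessor in pandas 2.0 and later. A sketch of the same idea for newer versions, assuming `isocalendar()` as the replacement for the week number, might look like this:

```python
import pandas as pd

df = pd.DataFrame(
    {'ArrivalDate': pd.to_datetime(['2012-12-31', '2012-12-29', '2012-12-30'])}
)

# Attributes that are plain properties of the .dt accessor.
attrs = ['year', 'month', 'day', 'dayofweek', 'dayofyear', 'quarter']
parts = pd.concat(
    [getattr(df['ArrivalDate'].dt, name).rename(name) for name in attrs],
    axis=1,
)

# The ISO week number comes from isocalendar() instead of weekofyear.
parts['weekofyear'] = df['ArrivalDate'].dt.isocalendar().week

df = df.join(parts)
print(df)
```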

score 8 · Answer 9 · edited May 23 '17 at 12:26


Thanks to jaknap32, I wanted to aggregate the results according to Year and Month, so this worked:

df_join['YearMonth'] = df_join['timestamp'].apply(lambda x:x.strftime('%Y%m'))

Output was neat:

0    201108
1    201108
2    201108
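
To illustrate the aggregation step, here is a minimal sketch with made-up data (the `value` column and its numbers are assumptions for the example, not from my real frame):

```python
import pandas as pd

# Hypothetical data; 'value' stands in for whatever column is aggregated.
df_join = pd.DataFrame({
    'timestamp': pd.to_datetime(['2011-08-01', '2011-08-15', '2011-09-03']),
    'value': [1, 2, 5],
})

# Build the year-month key, then aggregate per key.
df_join['YearMonth'] = df_join['timestamp'].dt.strftime('%Y%m')
monthly = df_join.groupby('YearMonth')['value'].sum()
print(monthly)
```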

edited May 23 '17 at 12:26

Community


answered May 09 '17 at 09:17

Subspacian


score 4 · Answer 10 · answered Dec 25 '22 at 06:25


df['Month_Year'] = df['Date'].dt.to_period('M')

Result :

    Date            Month_Year
0   2020-01-01      2020-01
1   2020-01-02      2020-01
2   2020-01-03      2020-01
3   2020-01-04      2020-01
4   2020-01-05      2020-01

answered Dec 25 '22 at 06:25

Sandeep Agrawal


score 3 · Answer 11 · answered Aug 12 '19 at 13:55

There are two steps to extract the year for the whole column without using apply.

Step 1

Convert the column to datetime:

df['ArrivalDate'] = pd.to_datetime(df['ArrivalDate'], format='%Y-%m-%d')

Step 2

Extract the year or the month using pd.DatetimeIndex:

pd.DatetimeIndex(df['ArrivalDate']).year

score 0 · Answer 12 · edited Apr 20 '17 at 17:53


df['year_month']=df.datetime_column.apply(lambda x: str(x)[:7])

This worked fine for me. I didn't expect pandas to treat the resulting strings like dates, but when I plotted the data, the year_month strings were ordered properly (zero-padded 'YYYY-MM' strings sort chronologically). Gotta love pandas!

edited Apr 20 '17 at 17:53

Chewpers


answered Apr 20 '17 at 16:53

TICH


score 0 · Answer 13 · answered May 17 '21 at 03:27


Then I tried:

df['ArrivalDate'].apply(lambda(x):x[:-2])

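I think the proper input here is a string, so convert the column first. Note that x[:-2] would leave a trailing hyphen (for example '2012-12-'), so slice the first seven characters instead:

df['ArrivalDate'].astype(str).apply(lambda x: x[:7])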

answered May 17 '21 at 03:27

Jiaxiang


This solution works, but it's error-prone. It is better to use the existing date and time libraries. – Mohammed Nadeem Jun 14 '21 at 11:22
