Keep only date part when using pandas.to_datetime

Question

I use pandas.to_datetime to parse the dates in my data. Pandas by default represents the dates with datetime64[ns] even though the dates are all daily only. I wonder whether there is an elegant/clever way to convert the dates to datetime.date or datetime64[D] so that, when I write the data to CSV, the dates are not appended with 00:00:00. I know I can convert the type manually element-by-element:

[dt.to_datetime().date() for dt in df.dates]

But this is really slow since I have many rows and it sort of defeats the purpose of using pandas.to_datetime. Is there a way to convert the dtype of the entire column at once? Or alternatively, does pandas.to_datetime support a precision specification so that I can get rid of the time part while working with daily data?

score 591 · Answer 1 · edited Mar 10 '23 at 16:55

591

Since version 0.15.0 this can now be easily done using .dt to access just the date component:

df['just_date'] = df['dates'].dt.date

The above returns datetime.date, so object dtype. If you want to keep the dtype as datetime64 then you can just normalize:

df['normalised_date'] = df['dates'].dt.normalize()

This sets the time component to midnight, i.e. 00:00:00, but the display shows just the date value.

pandas.Series.dt

edited Mar 10 '23 at 16:55

wjandrea

28,235
9
60
81

answered Dec 14 '15 at 22:07

EdChum

376,765
198
813
562

@edChum any way to format date in MM-DD-YYYY format? – Rrptm Apr 30 '21 at 04:50
5

Can only use .dt accessor with datetimelike values – huang Jul 03 '21 at 01:57

score 90 · Answer 2 · answered May 20 '19 at 10:11

90

Simple Solution:

df['date_only'] = df['date_time_column'].dt.date

answered May 20 '19 at 10:11

Gil Baggio

13,019
3
48
37

54

Just a warning, this changes the type to object. So you'd need to astype('datetime64') to keep consistency. – misantroop May 13 '20 at 01:46

score 39 · Answer 3 · edited Mar 10 '23 at 17:02

39

While I upvoted EdChum's answer, which is the most direct answer to the question the OP posed, it does not really solve the performance problem (it still relies on python datetime objects, and hence any operation on them will be not vectorized - that is, it will be slow).

A better performing alternative is:

df['dates'].dt.floor('d')

Strictly speaking, it does not "keep only date part", since it just sets the time to 00:00:00. But it does work as desired by the OP when, for instance:

printing to screen
saving to csv
using the column to groupby

... and it is much more efficient, since the operation is vectorized.

EDIT: in fact, the answer the OP's would have preferred is probably "recent versions of pandas do not write the time to csv if it is 00:00:00 for all observations".

edited Mar 10 '23 at 17:02

wjandrea

28,235
9
60
81

answered Jan 18 '17 at 11:57

Pietro Battiston

7,930
3
42
45

Unfortunately `to_json` still writes the full `00:00:00`. – IanS May 19 '17 at 13:40
@IanS do you mean when using ``date_format='iso'``?! By default, it just outputs seconds since epoch. – Pietro Battiston May 20 '17 at 06:40
Yes, that's what I meant. – IanS May 23 '17 at 08:54
This is faster than `dt.normalize()` on series longer than a few hundred elements. – C8H10N4O2 Jul 21 '17 at 20:20
1

On a df of >1 million rows, this was far faster than any of the other options I found, with the exception of `normalize` which was only slightly slower. In addition, keeping the date pandas-native means it can be saved to hdf stores (datetime.date columns cannot be as of pandas 1.1.4). – fantabolous Nov 30 '20 at 12:45

score 23 · Answer 4 · answered Dec 14 '18 at 00:19

Pandas v0.13+: Use `to_csv` with `date_format` parameter

Avoid, where possible, converting your datetime64[ns] series to an object dtype series of datetime.date objects. The latter, often constructed using pd.Series.dt.date, is stored as an array of pointers and is inefficient relative to a pure NumPy-based series.

Since your concern is format when writing to CSV, just use the date_format parameter of to_csv. For example:

df.to_csv(filename, date_format='%Y-%m-%d')

See Python's strftime directives for formatting conventions.

score 22 · Answer 5 · edited Mar 14 '18 at 22:53

22

This is a simple way to extract the date:

import pandas as pd

d='2015-01-08 22:44:09' 
date=pd.to_datetime(d).date()
print(date)

edited Mar 14 '18 at 22:53

Nic Scozzaro

6,651
3
42
46

answered Dec 11 '17 at 12:02

Mani Abi Anand

1,245
11
10

The OP is already using the .date() method in their question, so this solution doesn't answer their question, but I did find it useful to see a simple example of using the date() method just as a reference. – Nic Scozzaro Mar 14 '18 at 22:58
1

this doesn't seem to work whne used as a lambda function for the date parser i.e. date_parser=lambda col: pd.to_datetime(col, format="%y%m%d").date() – rdmtinez Feb 25 '21 at 19:37

score 21 · Answer 6 · edited Oct 27 '18 at 19:38

21

Pandas DatetimeIndex and Series have a method called normalize that does exactly what you want.

You can read more about it in this answer.

It can be used as ser.dt.normalize()

edited Oct 27 '18 at 19:38

Asclepius

57,944
17
167
143

answered Nov 02 '16 at 12:31

j08lue

1,647
2
21
37

score 16 · Answer 7 · edited Mar 10 '23 at 17:00

df['Column'] = df['Column'].dt.strftime('%m/%d/%Y')

This will give you just the dates and NO TIME at your desired format. You can change format according to your need '%m/%d/%Y'. It will change the data type of the column to 'object'.

If you want just the dates and DO NOT want time in YYYY-MM-DD format use :

df['Column'] = pd.to_datetime(df['Column']).dt.date

The datatype will be 'object'.

For 'datetime64' datatype, use:

df['Column'] = pd.to_datetime(df['Column']).dt.normalize()

Katekarin · Answer 8 · 2019-11-17T19:21:43.093

15

Just giving a more up to date answer in case someone sees this old post.

Adding "utc=False" when converting to datetime will remove the timezone component and keep only the date in a datetime64[ns] data type.

pd.to_datetime(df['Date'], utc=False)

You will be able to save it in excel without getting the error "ValueError: Excel does not support datetimes with timezones. Please ensure that datetimes are timezone unaware before writing to Excel."

edited Nov 17 '19 at 19:21

answered Nov 17 '19 at 19:16

Katekarin

294
2
8

1

This for some reason fails after you apply any aggregate function with the column. – RaphX Jun 07 '20 at 06:49
1

@Katekarin if you try and see df['Date'].values its will still have the time component present in it. – Tushar Tiwari Jul 02 '21 at 05:21

score 6 · Answer 9 · answered Apr 23 '13 at 20:07

6

Converting to datetime64[D]:

df.dates.values.astype('M8[D]')

Though re-assigning that to a DataFrame col will revert it back to [ns].

If you wanted actual datetime.date:

dt = pd.DatetimeIndex(df.dates)
dates = np.array([datetime.date(*date_tuple) for date_tuple in zip(dt.year, dt.month, dt.day)])

answered Apr 23 '13 at 20:07

Dale

4,480
1
19
13

3

If you are using astype('M8[D]') it transforms missing values into the origin date, 1970-1-1. Probably better to just use pandas.to_datetime() nowadays. – Stewbaca Jul 20 '16 at 16:05
1

Note to anyone who routinely includes the datetime module as `dt`, this answer snipet will overwrite that module! @Dale-Jung, perhaps could change the line to something like dt_index – yeliabsalohcin Jul 01 '19 at 13:46
I'm also finding an issue whereby the next time I try and add a new row via the `df.loc[date]` method, the index reverts back to a timestamp, meaning subsequent comparisons no longer work – yeliabsalohcin Jul 01 '19 at 14:03

score 5 · Answer 10 · answered May 04 '20 at 22:20

I wanted to be able to change the type for a set of columns in a data frame and then remove the time keeping the day. round(), floor(), ceil() all work

df[date_columns] = df[date_columns].apply(pd.to_datetime)
df[date_columns] = df[date_columns].apply(lambda t: t.dt.floor('d'))

fantabolous · Answer 11 · 2020-11-30T13:13:38.037

On tables of >1000000 rows I've found that these are both fast, with floor just slightly faster:

df['mydate'] = df.index.floor('d')

or

df['mydate'] = df.index.normalize()

If your index has timezones and you don't want those in the result, do:

df['mydate'] = df.index.tz_localize(None).floor('d')

df.index.date is many times slower; to_datetime() is even worse. Both have the further disadvantage that the results cannot be saved to an hdf store as it does not support type datetime.date.

Note that I've used the index as the date source here; if your source is another column, you would need to add .dt, e.g. df.mycol.dt.floor('d')

score 1 · Answer 12 · answered Sep 08 '20 at 09:45

1

This worked for me on UTC Timestamp (2020-08-19T09:12:57.945888)

for di, i in enumerate(df['YourColumnName']):
    df['YourColumnName'][di] = pd.Timestamp(i)

answered Sep 08 '20 at 09:45

Punnerud

7,195
2
54
44

score 0 · Answer 13 · answered Dec 14 '22 at 00:03

0

If the column is not already in datetime format:

df['DTformat'] = pd.to_datetime(df['col'])

Once it's in datetime format you can convert the entire column to date only like this:

df['DateOnly'] = df['DTformat'].apply(lambda x: x.date())

answered Dec 14 '22 at 00:03

sparrow

10,794
12
54
74

Keep only date part when using pandas.to_datetime

13 Answers13

Pandas v0.13+: Use `to_csv` with `date_format` parameter

Linked

Related

Keep only date part when using pandas.to_datetime

13 Answers13

Pandas v0.13+: Use to_csv with date_format parameter

Linked

Related

Pandas v0.13+: Use `to_csv` with `date_format` parameter