8

I have strings like '03-21-2019' that I want to convert to the native Python datetime object: that is, of the datetime.datetime type. The conversion is easy enough through pandas:

import pandas as pd
import datetime as dt

date_str = '03-21-2019'
pd_Timestamp = pd.to_datetime(date_str)
py_datetime_object = pd_Timestamp.to_pydatetime()
print(type(py_datetime_object))

with the result

<class 'datetime.datetime'>

This is precisely what I want, since I want to compute timedelta's by subtracting one of these from another - perfectly well-defined in the native Python datetime.datetime class. However, my data is in a pd.DataFrame. When I try the following code:

import pandas as pd
import datetime as dt

df = pd.DataFrame(columns=['Date'])
df.loc[0] = ['03-21-2019']
df['Date'] = df['Date'].apply(lambda x:
                              pd.to_datetime(x).to_pydatetime())
print(type(df['Date'].iloc[0]))

the result is

<class 'pandas._libs.tslibs.timestamps.Timestamp'>

This is the WRONG type, and I can't for the life of me figure out why only part of the lambda expression is getting evaluated (that is, string-to-pandas-Timestamp), and not the last part (that is, pandas-Timestamp-to-datetime.datetime). It doesn't work if I define the function explicitly, either, instead of using a lambda expression:

import pandas as pd
import datetime as dt


def to_native_datetime(date_str: str) -> dt.datetime:
    return pd.to_datetime(date_str).to_pydatetime()


df = pd.DataFrame(columns=['Date'])
df.loc[0] = ['03-21-2019']
df['Date'] = df['Date'].apply(to_native_datetime)
print(type(df['Date'].iloc[0]))

The result is the same as before. It's definitely doing part of the function, as the result is not a string anymore. But I want the native Python datetime.datetime object, and I see no way of getting it. This looks like a bug in pandas, but I'm certainly willing to see it as user error on my part.

Why can't I get the native datetime.datetime object out of a pandas.DataFrame string column?

I have looked at this thread and this one, but neither of them answer my question.

[EDIT]: Here's something even more bizarre:

import pandas as pd
import datetime as dt


def to_native_datetime(date_str: str) -> dt.datetime:
    return dt.datetime.strptime(date_str, '%m-%d-%Y')


df = pd.DataFrame(columns=['Date'])
df.loc[0] = ['03-21-2019']
df['Date'] = df['Date'].apply(to_native_datetime)
print(type(df['Date'].iloc[0]))

Here I'm not even using pandas to convert the string, and I STILL get a

<class 'pandas._libs.tslibs.timestamps.Timestamp'>

out of it!

Many thanks for your time!

[FURTHER EDIT]: Apparently, in this thread, in Nehal J Wani's answer, it comes out that pandas automatically converts back to its native datetime format when you assign into a pd.DataFrame. This is not what I wanted to hear, but apparently, I'm going to have to convert on-the-fly when I read out of the pd.DataFrame.

Adrian Keister
  • 842
  • 3
  • 15
  • 33
  • I am facing the same issue. Weirdly the pandas object index will maintain the datetime.datetime object but not a Series or column in the DataFrame. – Rookie Apr 03 '20 at 03:02

2 Answers2

6

Thanks to the Sarah Messer’s answer and this one, I could solve the problem by reassigning the array back to the dataframe and forcing its dtype to object:

arr_date = df['Date'].dt.to_pydatetime()
df['Date']= pd.Series(arr_date, dtype=object)

example:

import pandas as pd
from datetime import datetime

df = pd.DataFrame({"date": [datetime(2021, 8, 28, 4, 10), datetime(2021, 8, 28, 4, 10)]})
df.dtypes

#   date    datetime64[ns]
#   dtype: object

arr_date = df["date"].dt.to_pydatetime()
df["date"] = pd.Series(arr_date, dtype="object")
df.dtypes

#   date    object
#   dtype: object

df.iloc[0,0]

# datetime.datetime(2021, 8, 28, 4, 10)
Lionel Hamayon
  • 1,240
  • 15
  • 25
2

Depending on what your actual goal is, you've a couple options you didn't mention directly.

1) If you have a static datetime object or a column of (pandas) Timestamps, and you're willing to deal with the Pandas version of a Timedelta (pandas._libs.tslibs.timedeltas.Timedelta), you can do the subtraction directly in pandas:

df = pd.DataFrame(columns=['Date'])
df.loc[0] = [pd.to_datetime('03-21-2019')]
df.loc[:, 'Offset'] = pd.Series([datetime.now()])
df.loc[:, 'Diff1'] = df['Offset'] - df['Date']
df.loc[:, 'Diff2'] = df['Date'] - datetime.now()

2) If you don't care about Dataframes, but are willing to deal with lists / numpy arrays, you can convert the datetimes to python-native datetimes by operating on the series rather than on individual elements. Below, arr is a numpy.ndarray of datetime.datetime objects. You can change it to a regular list of datetime with list(arr):

arr = df['Date'].dt.to_pydatetime()
Sarah Messer
  • 3,592
  • 1
  • 26
  • 43