I have a dataframe with a Timestamp column. I want to convert it to datetime.datetime format. This is what I have tried:

import pandas as pd

ts = pd.Timestamp('2019-01-01 00:00:00', tz=None)
df = pd.DataFrame({"myDate": [ts]})

df["myDate"] = df["myDate"].dt.to_pydatetime()
myList = df["myDate"].dt.to_pydatetime()

print(df.dtypes)
print(type(myList[0]))

The first print() shows that the column still holds Timestamp values, with dtype datetime64[ns] (unexpected). The second print() shows datetime.datetime (expected). How do I make this DataFrame reassignment persist?

Edit: what I am trying to achieve is to compare the Timestamps in the dataframe with the datetimes in a list, as follows:

from datetime import datetime

ts = pd.Timestamp('2019-01-01 00:00:00', tz=None)
df = pd.DataFrame({"my_date": [ts]})
df_set = set(df["my_date"].values)
dt_set = set([datetime(2019, 1, 1, 0, 0, 0)])
print(dt_set - df_set)

This prints {datetime.datetime(2019, 1, 1, 0, 0)}, but the result should be an empty set.

GlaceCelery
  • I don't understand your point. I need to compare the datetime index of my dataframe with a list of dates elsewhere in my code. – GlaceCelery Jan 14 '19 at 01:35
  • You don't need `to_pydatetime()` for this. I suggest you update your question to explain what you are trying to achieve overall. – jpp Jan 14 '19 at 01:36
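A minimal sketch of why `to_pydatetime()` may be unnecessary for the comparison, assuming timezone-naive values with no nanosecond component: `pd.Timestamp` is a subclass of `datetime.datetime`, so a set built by iterating the Series (rather than its `.values`, which holds NumPy `datetime64` scalars) can be compared with a set of plain datetimes. The names `df_set`, `dt_set` and `remaining` are only illustrative.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({"my_date": [pd.Timestamp("2019-01-01 00:00:00")]})

# Iterating the Series yields pd.Timestamp objects,
# whereas .values yields numpy.datetime64 scalars.
df_set = set(df["my_date"])
dt_set = {datetime(2019, 1, 1, 0, 0, 0)}

# Datetimes in the list that are absent from the column
remaining = dt_set - df_set
print(remaining)
```

This keeps the set-based approach from the question; for larger data the index-based approach in the answer below is likely the better fit.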

1 Answer

You can use pd.DatetimeIndex and its difference method. In general, using set with Pandas / NumPy objects is inefficient. Related: Pandas pd.Series.isin performance with set versus array.

from datetime import datetime

import pandas as pd

df = pd.DataFrame({"my_date": [pd.Timestamp('2019-01-01 00:00:00', tz=None),
                               pd.Timestamp('2019-01-10 00:00:00', tz=None)]})

datetime_list = [datetime(2019, 1, 1, 0, 0, 0)]

diff = pd.DatetimeIndex(df['my_date']).difference(pd.DatetimeIndex(datetime_list))

print(diff)
# DatetimeIndex(['2019-01-10'], dtype='datetime64[ns]', freq=None)
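The snippet above finds dates in the dataframe that are missing from the list. The question's `dt_set - df_set` asks for the opposite direction, which can be sketched the same way; a membership mask via `isin` is shown as well. The extra list entry and the names `df_index`, `list_index`, `missing_from_df` and `in_list` are made up for illustration.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({"my_date": [pd.Timestamp("2019-01-01"),
                               pd.Timestamp("2019-01-10")]})

# Hypothetical list: one date present in the column, one absent
datetime_list = [datetime(2019, 1, 1), datetime(2019, 2, 1)]

df_index = pd.DatetimeIndex(df["my_date"])
list_index = pd.DatetimeIndex(datetime_list)

# Datetimes from the list that do not appear in the column
missing_from_df = list_index.difference(df_index)

# Boolean mask of rows whose date appears in the list
in_list = df["my_date"].isin(list_index)

print(missing_from_df)
print(in_list)
```

Both results stay in pandas datetime types, so no conversion to `datetime.datetime` is needed for the comparison itself.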
jpp
  • Thank you. Now to_pydatetime() works on this result. Do you know why I can't assign this to my dataframe, i.e. what's wrong with: df["myDate"] = df["myDate"].dt.to_pydatetime()? – GlaceCelery Jan 14 '19 at 02:09
  • @GlaceCelery, I haven't looked into the Pandas internals, but [this answer](https://stackoverflow.com/a/49758140/9209546) suggests the rationale. `pd.Timestamp` offers a superset of `datetime.datetime` functionality and is a whole lot more efficient, since a NumPy `int64` array is used in the background. – jpp Jan 14 '19 at 02:16
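For anyone who still wants `datetime.datetime` objects stored in the column, here is a hedged sketch of the behaviour discussed in these comments: on assignment pandas infers a datetime64 dtype from datetime objects again, unless the values arrive as a Series with an explicit `object` dtype. The column names `coerced` and `as_object` are hypothetical, and the list comprehension stands in for `.dt.to_pydatetime()`.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({"myDate": [pd.Timestamp("2019-01-01 00:00:00")]})

# Convert each Timestamp to a plain datetime.datetime
py_datetimes = [ts.to_pydatetime() for ts in df["myDate"]]

# Plain assignment: pandas infers a datetime64 dtype again
df["coerced"] = py_datetimes

# Explicit object dtype: the elements stay datetime.datetime
df["as_object"] = pd.Series(py_datetimes, index=df.index, dtype=object)

print(df.dtypes)
print(type(df["as_object"].iloc[0]))
```

An object column gives up the vectorised datetime operations, so this is probably only worth doing when another library strictly requires `datetime.datetime` instances.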