How can I store Python datetimes in a pd.DataFrame?

import pandas as pd
from datetime import datetime

df = pd.DataFrame({"a": ['2002-02-02', '2002-02-03', '2002-02-04']})
df["b"] = df["a"].apply(lambda t: datetime.strptime(t, '%Y-%m-%d'))  # datetime.strptime returns datetime.datetime
print(datetime(2002, 2, 2) in df["b"])

outputs False.

Similarly,

df["c"] = df["b"].apply(lambda t: t.to_pydatetime())
print(datetime(2002, 2, 2) in df["c"])

outputs False.

Note that neither this nor this works. Following any of those approaches, I end up with Timestamps instead of datetimes in the data frame.

I am using Python 3.8.5 and Pandas 1.2.1.

Antoine
  • `datetime(2002, 2, 2) in list(df['b'])`? – Epsi95 Jul 09 '21 at 10:06
  • @Epsi95 True. However, that means that I have to convert this every time. – Antoine Jul 09 '21 at 10:10
  • @Epsi95 But `datetime(2002, 2, 2) in list(pd.to_datetime(df['b']).unique())` is again `False`. – Antoine Jul 09 '21 at 10:20
  • `datetime(2002, 2, 2) in list(pd.to_datetime(pd.to_datetime(df['b']).unique()))` – Epsi95 Jul 09 '21 at 10:54
  • The title of the question confuses me; in `pandas` you'll want to work with the built-in datatype (datetime64 from `numpy`). Note that pandas will auto-convert Python standard library datetimes to its built-in datatype. Only if you have a pd.Series of type datetime.date or datetime.time will the type not be modified. – FObersteiner Jul 09 '21 at 11:14

1 Answer

After all your manipulations, you can see that every series of datetime objects is automatically converted to timestamps (dtype datetime64[ns]) when it is added to the dataframe:

>>> df
            a          b          c
0  2002-02-02 2002-02-02 2002-02-02
1  2002-02-03 2002-02-03 2002-02-03
2  2002-02-04 2002-02-04 2002-02-04
>>> df.dtypes
a            object
b    datetime64[ns]
c    datetime64[ns]
dtype: object
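
The dtype conversion is only part of why the checks in the question print False. The `in` operator on a Series tests the index labels, not the values, so `datetime(2002, 2, 2) in df["b"]` asks whether that datetime is one of the labels 0, 1, 2. A minimal sketch of the difference follows; the variable names are illustrative, and `pd.to_datetime` is used here in place of the question's `apply` call:

```python
import pandas as pd
from datetime import datetime

df = pd.DataFrame({"a": ["2002-02-02", "2002-02-03", "2002-02-04"]})
df["b"] = pd.to_datetime(df["a"], format="%Y-%m-%d")

target = datetime(2002, 2, 2)

# `in` on a Series looks at the index labels (0, 1, 2 here), not the values.
found_in_labels = target in df["b"]
zero_is_label = 0 in df["b"]

# Membership tests against the values.
found_in_values = bool(df["b"].isin([target]).any())
found_in_list = target in df["b"].to_list()

# The elements are pd.Timestamp objects, which subclass datetime.datetime.
first_is_datetime = isinstance(df["b"].iloc[0], datetime)

print(found_in_labels, zero_is_label, found_in_values, found_in_list, first_is_datetime)
```

Because `pd.Timestamp` is a subclass of `datetime.datetime`, code that expects datetimes can usually work with the stored values directly.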

I suggest you use the built-in pandas datetime handling. It is not much harder to use than Python datetime objects:

>>> pd.Timestamp(2002, 2, 2) in df['b'].to_list()
True
>>> df['b'].eq(pd.Timestamp(2002, 2, 2))
0     True
1    False
2    False
Name: b, dtype: bool
>>> df['b'].eq(pd.Timestamp(2002, 2, 2)).any()
True

Additionally, this opens up many ways of handling dates and times that are not available with plain Python datetime objects.

For example, you can compare directly against a str instead of building Timestamp objects:

>>> df['b'].eq('2002-02-02')
0     True
1    False
2    False
Name: b, dtype: bool
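
As a further illustration of what a datetime64 column allows, here is a small sketch using the `.dt` accessor, vectorized date arithmetic, and string comparison. The column is rebuilt with `pd.to_datetime` so the snippet is self-contained, and the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"a": ["2002-02-02", "2002-02-03", "2002-02-04"]})
df["b"] = pd.to_datetime(df["a"], format="%Y-%m-%d")

# Component access through the .dt accessor (0 = Monday, 6 = Sunday).
weekdays = df["b"].dt.dayofweek.to_list()

# Vectorized date arithmetic: shift every row by one day.
next_day = df["b"] + pd.Timedelta(days=1)

# Ordering comparisons also accept date strings.
later_rows = int((df["b"] > "2002-02-02").sum())

print(weekdays)
print(next_day.to_list())
print(later_rows)
```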
Cimbali