I have the following dataframe with two dates where one of them has the timezone included.
df = pd.DataFrame(np.array([[10, "2021-06-13 12:08:52.311 UTC", "2021-03-29 12:44:33.468"],
[36, "2019-12-07 12:18:02.311 UTC", "2011-10-15 10:14:32.118"]
]),
columns=['col1', 'date1', 'date2'])
df
Here's how I am converting them from a string to datetime:
df["date1"]= pd.to_datetime(df["date1"])
df["date2"]= pd.to_datetime(df["date2"])
which returns:
col1 date1 date2
0 10 2021-06-13 12:08:52.311000+00:00 2021-03-29 12:44:33.468
1 36 2019-12-07 12:18:02.311000+00:00 2011-10-15 10:14:32.118
At some point, I need to compare these two dates to look for the same values. For this, I need them to be written in the same format with the same number of digits. This said, how could I remove the time zone from date1 so it matches the same format in date2?
utc=None
as default in pd.to_datetime
, so that didn't work...
I am assuming both dates are in UTC. In the original data, these are part of different datasets, that's why they have a different format.