1

I have the following dataframe with two dates where one of them has the timezone included.

df = pd.DataFrame(np.array([[10, "2021-06-13 12:08:52.311 UTC", "2021-03-29 12:44:33.468"], 
                            [36, "2019-12-07 12:18:02.311 UTC", "2011-10-15 10:14:32.118"]
                           ]),
                   columns=['col1', 'date1', 'date2'])
df

Here's how I am converting them from a string to datetime:

df["date1"]= pd.to_datetime(df["date1"])
df["date2"]= pd.to_datetime(df["date2"])

which returns:

   col1  date1                              date2
0   10  2021-06-13 12:08:52.311000+00:00    2021-03-29 12:44:33.468
1   36  2019-12-07 12:18:02.311000+00:00    2011-10-15 10:14:32.118

At some point, I need to compare these two dates to look for the same values. For this, I need them to be written in the same format with the same number of digits. This said, how could I remove the time zone from date1 so it matches the same format in date2?

utc=None as default in pd.to_datetime, so that didn't work...

I am assuming both dates are in UTC. In the original data, these are part of different datasets, that's why they have a different format.

Joehat
  • 979
  • 1
  • 9
  • 36
  • Use [`localize`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.tz_localize.html#pandas-series-dt-tz-localize) `df["date1"] = pd.to_datetime(df["date1"]).dt.tz_localize(None)` – Henry Ecker Jul 01 '21 at 20:00
  • Does this answer your question? [How to remove timezone from a Timestamp column in a pandas dataframe](https://stackoverflow.com/questions/49198068/how-to-remove-timezone-from-a-timestamp-column-in-a-pandas-dataframe) – Henry Ecker Jul 01 '21 at 20:02

1 Answers1

1

You can do one of two things:

df["date1"]= pd.to_datetime(df["date1"], format="%Y-%m-%d %H:%M:%S.%f UTC")
df["date2"]= pd.to_datetime(df["date2"], format="%Y-%m-%d %H:%M:%S.%f")
>>> df
  col1                   date1                   date2
0   10 2021-06-13 12:08:52.311 2021-03-29 12:44:33.468
1   36 2019-12-07 12:18:02.311 2011-10-15 10:14:32.118

Or:

df["date1"]= pd.to_datetime(df["date1"].str.replace(" UTC", ""))
df["date2"]= pd.to_datetime(df["date2"])
>>> df
 col1                   date1                   date2
0   10 2021-06-13 12:08:52.311 2021-03-29 12:44:33.468
1   36 2019-12-07 12:18:02.311 2011-10-15 10:14:32.118
not_speshal
  • 22,093
  • 2
  • 15
  • 30
  • What if I also wanted to remove the %S.%f from both? just removing them did not work... – Joehat Jul 01 '21 at 20:35
  • @Joehat - `df["date1"] = pd.to_datetime(df["date1"].str.replace(" UTC","").str[:-7])` and `df["date2"] = pd.to_datetime(df["date2"].str[:-7])` – not_speshal Jul 02 '21 at 13:21