
In a codebase I am looking at, I see the following:

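```python
import pandas as pd

local_timezone = get_local_timezone()  # helper from the codebase (not shown)
df1["start_time"] = df1.start_time.dt.tz_convert(local_timezone)  # df1, df2 are pandas DataFrames

# left_on alone would also need right_on; on= uses the same key column in both frames
df_merged = pd.merge(df1, df2, on="start_time")

df_merged["start_time"] = df_merged["start_time"].dt.tz_localize(None)
```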

I was under the impression that only one of `tz_localize` and `tz_convert` is needed, not both, to convert a timestamp to the local time zone. What is the purpose of using both here?

David
  • I only see `tz_localize` being used here ...? What's the time zone used in `df2`? – FObersteiner Feb 14 '23 at 16:48
  • @FObersteiner. I guess `df2` is already tz-aware with `local_timezone`. – Corralien Feb 14 '23 at 16:53
  • @FObersteiner I'm sorry. The first one should have been `tz_convert`. I just made the change – David Feb 14 '23 at 16:57
  • ok, so `df1` already has aware datetime, which you need to convert to the same timezone as `df2` before the merge. `tz_convert`: convert timezone, `tz_localize`: set timezone or remove it with None. See also [Convert pandas timezone-aware DateTimeIndex to naive timestamp, but in certain timezone](https://stackoverflow.com/q/16628819/10197418) for the latter. – FObersteiner Feb 14 '23 at 17:02
  • I see, I think that makes sense. If there wasn't a merge, and you just had `df1["start_time"] = df1.start_time.dt.tz_convert(local_timezone)` followed by `df1["start_time"] = df1["start_time"].dt.tz_localize(None)`, is that redundant? IIUC you could just do both operations at the same time. – David Feb 14 '23 at 17:51
  • I wouldn't say one of the steps is redundant; it depends on your application what you need further on. – FObersteiner Feb 14 '23 at 18:12
  • Btw. `tz_convert` never changes the internal representation of the date/time. Only `tz_localize(None)` does that (see [here](https://stackoverflow.com/a/62656878/10197418) for example). I think Corralien's answer is not accurate in that regard. – FObersteiner Feb 14 '23 at 18:27
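To make the point about the internal representation concrete, here is a small sketch. It assumes `"Europe/Berlin"` as a stand-in for whatever `get_local_timezone()` returns, and uses `Timestamp.value` (nanoseconds since the Unix epoch) to look at the stored number:

```python
import pandas as pd

# Assumption: "Europe/Berlin" stands in for get_local_timezone()
LOCAL_TZ = "Europe/Berlin"

# One instant, stored tz-aware in UTC
s_utc = pd.Series(pd.to_datetime(["2023-02-14 12:00"])).dt.tz_localize("UTC")

# tz_convert: same instant, different zone label and wall-clock time (13:00+01:00)
s_local = s_utc.dt.tz_convert(LOCAL_TZ)

# tz_localize(None): drop the zone but keep the local wall-clock time (13:00)
s_naive = s_local.dt.tz_localize(None)

# The stored value is unchanged by tz_convert...
print(s_utc.iloc[0].value == s_local.iloc[0].value)  # True
# ...but tz_localize(None) shifts it by the UTC offset (one hour here)
print(s_naive.iloc[0].value - s_utc.iloc[0].value)   # 3600000000000
```

So converting and then stripping the zone is not redundant: each step does something different, and whether you need the second one depends on whether later code expects naive local times.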

1 Answer


Update after your edit:

You probably have two tz-aware dataframes, but not in the same time zone. Maybe your first dataframe is in UTC while the second one is in local_timezone.

To allow merging on the start_time column, the dates should be aligned (same date, same time, same time zone). Once the merge is done, you can remove the time zone information with `tz_localize(None)`, mainly for a cleaner display (no +XX:XX offset). The resulting naive values keep the local wall-clock time.
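A minimal sketch of that workflow, assuming hypothetical data where df1 is stored in UTC and df2 is already in local time, with `"Europe/Berlin"` standing in for `get_local_timezone()`:

```python
import pandas as pd

LOCAL_TZ = "Europe/Berlin"  # assumption: stand-in for get_local_timezone()

# Hypothetical data: df1 in UTC, df2 already in local time
df1 = pd.DataFrame({
    "start_time": pd.to_datetime(["2023-02-14 08:00", "2023-02-14 09:00"]).tz_localize("UTC"),
    "a": [1, 2],
})
df2 = pd.DataFrame({
    "start_time": pd.to_datetime(["2023-02-14 09:00", "2023-02-14 10:00"]).tz_localize(LOCAL_TZ),
    "b": [10, 20],
})

# Step 1: express df1's instants in the same time zone as df2
df1["start_time"] = df1["start_time"].dt.tz_convert(LOCAL_TZ)

# Step 2: merge on the aligned key (08:00 UTC matches 09:00+01:00, and so on)
df_merged = pd.merge(df1, df2, on="start_time")

# Step 3: drop the +01:00 offset, keeping the local wall-clock times
df_merged["start_time"] = df_merged["start_time"].dt.tz_localize(None)
print(df_merged)
```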


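Previous answer (for the original version of the question, where `tz_localize` was used twice):

You probably have two dataframes: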

  • df1 with tz-naive datetimes (no +XX:XX offset)
  • df2 with tz-aware datetimes (with a +XX:XX offset, where +XX:XX is the offset of local_timezone)

To allow merging on the start_time column, the dates should be aligned (same date, same time, same time zone). Once the merge is done, you can remove the time zone and restore tz-naive datetimes. That's why `tz_localize` was used twice: once to attach the time zone to df1 before the merge, and once with `None` to remove it afterwards.
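A sketch of that older scenario, again with hypothetical data and `"Europe/Berlin"` as an assumed stand-in for `get_local_timezone()`. As far as I know, pandas refuses to merge a naive datetime key with a tz-aware one (it raises a `ValueError`), which is why the zone is attached first:

```python
import pandas as pd

LOCAL_TZ = "Europe/Berlin"  # assumption: stand-in for get_local_timezone()

# Hypothetical data: df1 naive (already local wall-clock time), df2 tz-aware local
df1 = pd.DataFrame({
    "start_time": pd.to_datetime(["2023-02-14 09:00", "2023-02-14 10:00"]),
    "a": [1, 2],
})
df2 = pd.DataFrame({
    "start_time": pd.to_datetime(["2023-02-14 09:00", "2023-02-14 10:00"]).tz_localize(LOCAL_TZ),
    "b": [10, 20],
})

# First tz_localize: attach the local zone to df1's naive wall-clock times
df1["start_time"] = df1["start_time"].dt.tz_localize(LOCAL_TZ)

# Both keys are now tz-aware in the same zone, so the merge can match them
df_merged = pd.merge(df1, df2, on="start_time")

# Second tz_localize: remove the zone again to get naive local times
df_merged["start_time"] = df_merged["start_time"].dt.tz_localize(None)
print(df_merged)
```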

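Using `tz_localize` doesn't modify the wall-clock time; it only attaches time zone information (or removes it when passed `None`). Using `tz_convert` on an already tz-aware datetime changes the time zone and therefore the displayed wall-clock time, while the underlying instant stays the same.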
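A short sketch contrasting the two methods on a single value (the time zone and timestamp are arbitrary illustrative choices):

```python
import pandas as pd

naive = pd.Series(pd.to_datetime(["2023-02-14 12:00"]))

# tz_localize: attach a zone; the wall-clock time stays 12:00
localized = naive.dt.tz_localize("Europe/Berlin")
print(localized.iloc[0])  # 2023-02-14 12:00:00+01:00

# tz_convert: move the same instant to another zone; the wall-clock time changes
converted = localized.dt.tz_convert("UTC")
print(converted.iloc[0])  # 2023-02-14 11:00:00+00:00

# Both represent the same instant, so they compare equal
print(localized.iloc[0] == converted.iloc[0])  # True
```

Note that `tz_convert` cannot be applied to naive values at all; pandas raises a `TypeError` and suggests `tz_localize` instead.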

Corralien
  • Sorry, there was a typo in the OP previously. The first one should have been `tz_convert` and not `tz_localize`. I believe this will change your answer – David Feb 14 '23 at 16:58
  • @David. I updated my answer with this new information. I also left my previous answer. – Corralien Feb 14 '23 at 17:12