1

I'm working with datetime columns in pandas and since it sometimes happen I have a dataframe that contains datetime columns in different timezones I would like to remove timezone from all of them to easily work with them.

Let's say I have following dataframe:

import pandas as pd

data = pd.DataFrame(
    {
        "date_no_tz": pd.date_range("2021-01-01", freq="D", periods=10),
        "date_utc": pd.date_range("2021-01-01", freq="D", periods=10, tz="UTC"),
        "date_with_tz": pd.date_range("2021-01-01", freq="D", periods=10, tz="America/Los_Angeles"),
    }
)

data.dtypes
# date_no_tz                           datetime64[ns]
# date_utc                        datetime64[ns, UTC]
# date_with_tz    datetime64[ns, America/Los_Angeles]

I can easily remove timezone info with tz_convert method, for example data.date_utc.dt.tz_convert(None). I would like to use some simple solution that would ensure that all my datetime columns are without timezone, e.g. data.apply(lambda x: x.dt.tz_convert(None)). But this obviously throws an error "TypeError: Cannot convert tz-naive timestamps, use tz_localize to localize".

It seems that the best way is to use some simple function with check for timezone like following solution:

def convert_datetime(dt_col):
    if dt_col.dt.tz is not None:
        return dt_col.dt.tz_convert(None)
    return dt_col

data.apply(convert_datetime).dtypes
# date_no_tz      datetime64[ns]
# date_utc        datetime64[ns]
# date_with_tz    datetime64[ns]

Is there any better solution (without defining custom functions) or do I always have to come up with custom solution for such easy task?

The advantage of custom solution is that I can also simply add single line check for datetime columns (e.g. using pd.api.types.is_datetime64_any_dtype), therefore I could use such function on the whole dataframe even if there are none-datetime columns. But I'm always a bit struggling with using tz_localize and tz_convert and would prefer if there is something simple for doing such conversion.

Nerxis
  • 3,452
  • 2
  • 23
  • 39
  • If it has advantages and is simple, why not use the custom function? I guess these properties are the reason why there is no built-in in the library ;-) – FObersteiner Nov 15 '21 at 20:27
  • Yes and no :) My point is that it's almost necessary to work with datetime columns in the same timezone, and I definitely prefer to work with no timezones. Therefore I feel that some method like "convert datetimes to UTC and remove timezone" is something so common that it could be built-in in the library. – Nerxis Nov 16 '21 at 08:17
  • In my dream world, everybody is working with UTC, so I cannot support this ^^ Note e.g. https://stackoverflow.com/a/62656878/10197418 (it's fine though if you convert to UTC first). – FObersteiner Nov 16 '21 at 08:33
  • Yes, agree (in my dream world it's the same). But since someone use e.g. database with timezone-aware datetimes and someone other without timezone I have to deal with a fact that I have to either convert one to timezone-aware or remove timezone from the other. So maybe more common method would be like "ensure all datetimes in UTC timezone-aware" or "ensure all datetimes in UTC without timezone". I usually do not have data in different timezone but it sometime happens as well. – Nerxis Nov 16 '21 at 08:51

0 Answers0