I'm working with datetime columns in pandas and since it sometimes happen I have a dataframe that contains datetime columns in different timezones I would like to remove timezone from all of them to easily work with them.
Let's say I have following dataframe:
import pandas as pd
data = pd.DataFrame(
{
"date_no_tz": pd.date_range("2021-01-01", freq="D", periods=10),
"date_utc": pd.date_range("2021-01-01", freq="D", periods=10, tz="UTC"),
"date_with_tz": pd.date_range("2021-01-01", freq="D", periods=10, tz="America/Los_Angeles"),
}
)
data.dtypes
# date_no_tz datetime64[ns]
# date_utc datetime64[ns, UTC]
# date_with_tz datetime64[ns, America/Los_Angeles]
I can easily remove timezone info with tz_convert
method, for example data.date_utc.dt.tz_convert(None)
. I would like to use some simple solution that would ensure that all my datetime columns are without timezone, e.g. data.apply(lambda x: x.dt.tz_convert(None))
. But this obviously throws an error "TypeError: Cannot convert tz-naive timestamps, use tz_localize to localize".
It seems that the best way is to use some simple function with check for timezone like following solution:
def convert_datetime(dt_col):
if dt_col.dt.tz is not None:
return dt_col.dt.tz_convert(None)
return dt_col
data.apply(convert_datetime).dtypes
# date_no_tz datetime64[ns]
# date_utc datetime64[ns]
# date_with_tz datetime64[ns]
Is there any better solution (without defining custom functions) or do I always have to come up with custom solution for such easy task?
The advantage of custom solution is that I can also simply add single line check for datetime columns (e.g. using pd.api.types.is_datetime64_any_dtype
), therefore I could use such function on the whole dataframe even if there are none-datetime columns. But I'm always a bit struggling with using tz_localize
and tz_convert
and would prefer if there is something simple for doing such conversion.