0

I am trying to do as my title says. I have a panda dataframe with datetime and timezone.

I am trying to convert the column startDate to the local naive datetime as follow:

This is the piece of code to do that

df['startDate'] = df['startDate'].apply(lambda x: timezone('UTC').localize(x))
df["startDate"] = df.apply(lambda x: x.startDate.astimezone(timezone(x.timezone)), axis=1)
df["startDate"] = df["startDate"].dt.tz_localize(None)

I get this error message.

Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True

If I actually precise UTC=True I will get my initial datetime value and that's not what I am trying to achieve

I want to get from this

2020-07-20 20:30:00-07:00
2020-07-21 16:00:00-04:00
2020-07-20 20:30:00-07:00

To this

2020-07-20 20:30:00
2020-07-21 16:00:00
2020-07-20 20:30:00

I am thinking of converting, otherwise, to a string and remove the the 5 last characters and reconverting to a datetime object. However I am looking for a better solution.

Thank you

Solal
  • 611
  • 2
  • 9
  • 26
  • 1. ditch the apply and use `df['startDate'].tz_localize('UTC')` 2. then use `dt.normalize()` or `strftime` methods to format your date as a string. – Umar.H Jul 20 '20 at 09:56
  • does not work it will reconvert my datetime to UTC time and I want to keep it to local time – Solal Jul 20 '20 at 10:13
  • by the way, related: https://stackoverflow.com/q/38531317/10197418 – FObersteiner Jul 24 '20 at 15:35

1 Answers1

1

If you read a datetime string with UTC offset like "2020-07-20 20:30:00-07:00", this will give you a Series of type datetime.datetime (not the pandas datetime64[ns]). So if I get this right, what you want to do is remove the tzinfo. This is basically described here and you can do that like

import pandas as pd

df = pd.DataFrame({'startDate':pd.to_datetime(['2020-07-20 20:30:00-07:00',
                                               '2020-07-21 16:00:00-04:00',
                                               '2020-07-20 20:30:00-07:00'])})
# df['startDate'].iloc[0]
# datetime.datetime(2020, 7, 20, 20, 30, tzinfo=tzoffset(None, -25200))

df['startDate_naive'] = df['startDate'].apply(lambda t: t.replace(tzinfo=None))

# df['startDate_naive']
# 0   2020-07-20 20:30:00
# 1   2020-07-21 16:00:00
# 2   2020-07-20 20:30:00
# Name: startDate_naive, dtype: datetime64[ns]

If you work with timezone aware pandas datetime column, see my answer here on how you can remove the timezone awareness.

FObersteiner
  • 22,500
  • 8
  • 42
  • 72