4

I have a pandas column that has both PDT and PST datetime values. Example:

PDT/PST
2021-10-29 00:18:38 PDT
2021-10-29 01:08:19 PDT
2021-11-08 19:43:58 PST
2021-11-08 19:56:01 PST

I need to convert these into UTC time zone. Example:

UTC
2021-10-29 07:18:00

A simple answer is appreciated.

FObersteiner
  • 22,500
  • 8
  • 42
  • 72

2 Answers2

2

Use to_datetime with convert strings to datetimes by dateparser.parse:

import dateparser

df['PDT/PST'] = pd.to_datetime(df['PDT/PST'].apply(dateparser.parse), utc=True)
print (df)
                    PDT/PST
0 2021-10-29 07:18:38+00:00
1 2021-10-29 08:08:19+00:00
2 2021-11-09 03:43:58+00:00
3 2021-11-09 03:56:01+00:00

Last add Series.dt.tz_localize with None:

df['PDT/PST'] = (pd.to_datetime(df['PDT/PST'].apply(dateparser.parse), utc=True)
                   .dt.tz_localize(None))
print (df)
              PDT/PST
0 2021-10-29 07:18:38
1 2021-10-29 08:08:19
2 2021-11-09 03:43:58
3 2021-11-09 03:56:01

Solution with replace PST and PDT to -7 or -8 is:

df['PDT/PST'] = (pd.to_datetime(df['PDT/PST']
                                .replace({'PDT':'-07:00','PST':'-08:00'}, regex=True), utc=True)
                  .dt.tz_localize(None))
print (df)
              PDT/PST
0 2021-10-29 07:18:38
1 2021-10-29 08:08:19
2 2021-11-09 03:43:58
3 2021-11-09 03:56:01
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    Your output is not in UTC. – Salvin D'souza Dec 03 '21 at 06:51
  • 1
    @SalvinD'souza - Can you test alternative solution? – jezrael Dec 03 '21 at 08:15
  • interestingly, dateparser parses to Python datetime object with fixed UTC offset - I wasn't aware of this (since mixed offsets here, pandas keeps the dtype). In the sense of "correct behavior", I think dateutil's parser does a better job since it at least issues a warning if you feed it with abbreviated tz names (since might be ambiguous). – FObersteiner Dec 03 '21 at 09:04
  • 1
    @jezrael This solution (pd.to_datetime(df['PDT/PST'] .replace({'PDT':'-07:00','PST':'-08:00'}, regex=True), utc=True) .dt.tz_localize(None)) worked flawlessly. Thank you so much! – Salvin D'souza Dec 03 '21 at 11:28
1

Another option: dateutil's parser with tzinfos supplied; then convert to UTC.

import dateutil
pacific_tz = dateutil.tz.gettz("US/Pacific")

df['UTC'] = df['PDT/PST'].apply(dateutil.parser.parse,
                                tzinfos={'PST': pacific_tz,
                                         'PDT': pacific_tz}).dt.tz_convert('UTC')

df['UTC']

0   2021-10-29 07:18:38+00:00
1   2021-10-29 08:08:19+00:00
2   2021-11-09 03:43:58+00:00
3   2021-11-09 03:56:01+00:00
Name: UTC, dtype: datetime64[ns, UTC]

Related: Python strptime() and timezones?

Now you could format to string with a certain format if desired, e.g.

df['UTC'].dt.strftime('%Y-%m-%d %H:%M:%S')

0    2021-10-29 07:18:38
1    2021-10-29 08:08:19
2    2021-11-09 03:43:58
3    2021-11-09 03:56:01
Name: UTC, dtype: object
FObersteiner
  • 22,500
  • 8
  • 42
  • 72
  • 1
    If performance is important, second solution with replace by `{'PDT':'-07:00','PST':'-08:00'}` is not better? I ask, because you are expert for datetimes (in my opinion ;) ) – jezrael Dec 03 '21 at 08:55
  • 1
    @jezrael yes I think that would be more efficient performance-wise. You just have to know the correct UTC offsets ^^ – FObersteiner Dec 03 '21 at 08:58