Parsing dates with Pandas: how to take time zones into account?

Question

I have dates in these formats:

Thursday, September 22, 2016 at 11:04am UTC+02
Monday, January 22, 2018 at 6:46pm CST
...

I want to convert them to UNIX timestamps. This pattern works, but it ignores the timezone:

timestamp = pd.to_datetime(date, format='%A, %B %d, %Y at %H:%M%p', exact=False)

I don't see how to take the timezones ("UTC+02, "CST") into account.

This doesn't work:

timestamp = pd.to_datetime(date, format='%A, %B %d, %Y at %H:%M%p %Z')
# ValueError: unconverted data remains: +02

Hmmm, I think it not ignore timezones, only parse to default UTC datetimes. — jezrael, Jan 23 '18 at 08:32
So need convert them manually to timezones if necessary - e.g. by [this solution](https://stackoverflow.com/q/42826388/2901002) — jezrael, Jan 23 '18 at 08:37
@jezrael then I don't understand how it knows it should remove eg 2 hours to get UTC time if it doesn't parse the timezone. Does it use the default python timezone? — MasterScrat, Jan 23 '18 at 08:42
Sorry, I was a bit unclear - it parse timezones correctly, but return default UTC datetime, not datetime with timezones. — jezrael, Jan 23 '18 at 08:45
@jezrael But how can it parse the timezone when I didn't give a directive for it? Also if I have `exact=True` I get `ValueError: unconverted data remains: UTC+02`, which leads me to think the timezone is fully ignored. — MasterScrat, Jan 23 '18 at 08:50
one small remark: when using 12H time notation, you shoudl us `%I` instead of `%H` — Maarten Fabré, Jan 23 '18 at 09:59

score 1 · Answer 1 · answered Jan 23 '18 at 12:32

I know you asked for a Pandas solution, but dateutil handles your strings correctly:

import dateutil
from dateutil.tz import gettz

samples = ['Thursday, September 22, 2016 at 11:04am UTC+02',
           'Monday, January 22, 2018 at 6:46pm CST']

# American time zone abbreviations
tzinfos = {'HAST': gettz('Pacific/Honolulu'),
           'AKST': gettz('America/Anchorage'),
           'PST': gettz('America/Los Angeles'),
           'MST': gettz('America/Phoenix'),
           'CST': gettz('America/Chicago'),
           'EST': gettz('America/New York'),
          }

for s in samples:
    parsed = dateutil.parser.parse(s, fuzzy=True, tzinfos=tzinfos)
    print(s, '->', parsed)

Output:

Thursday, September 22, 2016 at 11:04am UTC+02 -> 2016-09-22 11:04:00-02:00
Monday, January 22, 2018 at 6:46pm CST -> 2018-01-22 18:46:00-06:00

score 0 · Answer 2 · answered Jan 23 '18 at 08:57

0

# ValueError: unconverted data remains: +02 Is because you should parse the whole date string when using strptime, in which you are leaving the %z part. But you can't use %z in strptime, see ISO to datetime object: 'z' is a bad directive.

So maybe you could do some sort of mapping on your data:

timestamp = date.map(lambda x : dateutil.parser.parse(x))

answered Jan 23 '18 at 08:57

Hasan Jawad

197
9

1

Isn't this the same as not providing the `format` argument to `pd.to_datetime()` and letting Pandas figure out the format? The whole point of providing a format is to speed up the parsing. – Hristo Iliev Jan 23 '18 at 09:05
you might have to include some [`tz_infos`](https://dateutil.readthedocs.io/en/stable/parser.html#dateutil.parser.parse) – Maarten Fabré Jan 23 '18 at 10:08

Parsing dates with Pandas: how to take time zones into account?

2 Answers2