1

I have dates in these formats:

Thursday, September 22, 2016 at 11:04am UTC+02
Monday, January 22, 2018 at 6:46pm CST
...

I want to convert them to UNIX timestamps. This pattern works, but it ignores the timezone:

timestamp = pd.to_datetime(date, format='%A, %B %d, %Y at %H:%M%p', exact=False)

I don't see how to take the timezones ("UTC+02, "CST") into account.

This doesn't work:

timestamp = pd.to_datetime(date, format='%A, %B %d, %Y at %H:%M%p %Z')
# ValueError: unconverted data remains: +02
MasterScrat
  • 7,090
  • 14
  • 48
  • 80
  • Hmmm, I think it not ignore timezones, only parse to default UTC datetimes. – jezrael Jan 23 '18 at 08:32
  • So need convert them manually to timezones if necessary - e.g. by [this solution](https://stackoverflow.com/q/42826388/2901002) – jezrael Jan 23 '18 at 08:37
  • @jezrael then I don't understand how it knows it should remove eg 2 hours to get UTC time if it doesn't parse the timezone. Does it use the default python timezone? – MasterScrat Jan 23 '18 at 08:42
  • Sorry, I was a bit unclear - it parse timezones correctly, but return default UTC datetime, not datetime with timezones. – jezrael Jan 23 '18 at 08:45
  • @jezrael But how can it parse the timezone when I didn't give a directive for it? Also if I have `exact=True` I get `ValueError: unconverted data remains: UTC+02`, which leads me to think the timezone is fully ignored. – MasterScrat Jan 23 '18 at 08:50
  • It is hard question, unfortunately I have no idea. – jezrael Jan 23 '18 at 09:30
  • one small remark: when using 12H time notation, you shoudl us `%I` instead of `%H` – Maarten Fabré Jan 23 '18 at 09:59

2 Answers2

1

I know you asked for a Pandas solution, but dateutil handles your strings correctly:

import dateutil
from dateutil.tz import gettz

samples = ['Thursday, September 22, 2016 at 11:04am UTC+02',
           'Monday, January 22, 2018 at 6:46pm CST']

# American time zone abbreviations
tzinfos = {'HAST': gettz('Pacific/Honolulu'),
           'AKST': gettz('America/Anchorage'),
           'PST': gettz('America/Los Angeles'),
           'MST': gettz('America/Phoenix'),
           'CST': gettz('America/Chicago'),
           'EST': gettz('America/New York'),
          }

for s in samples:
    parsed = dateutil.parser.parse(s, fuzzy=True, tzinfos=tzinfos)
    print(s, '->', parsed)

Output:

Thursday, September 22, 2016 at 11:04am UTC+02 -> 2016-09-22 11:04:00-02:00
Monday, January 22, 2018 at 6:46pm CST -> 2018-01-22 18:46:00-06:00
chthonicdaemon
  • 19,180
  • 2
  • 52
  • 66
0

# ValueError: unconverted data remains: +02 Is because you should parse the whole date string when using strptime, in which you are leaving the %z part. But you can't use %z in strptime, see ISO to datetime object: 'z' is a bad directive.

So maybe you could do some sort of mapping on your data:

timestamp = date.map(lambda x : dateutil.parser.parse(x))
Hasan Jawad
  • 197
  • 9
  • 1
    Isn't this the same as not providing the `format` argument to `pd.to_datetime()` and letting Pandas figure out the format? The whole point of providing a format is to speed up the parsing. – Hristo Iliev Jan 23 '18 at 09:05
  • you might have to include some [`tz_infos`](https://dateutil.readthedocs.io/en/stable/parser.html#dateutil.parser.parse) – Maarten Fabré Jan 23 '18 at 10:08