39

I'm trying to parse timestamp strings like "Sat, 11/01/09 8:00PM EST" in Python, but I'm having trouble finding a solution that will handle the abbreviated timezone.

I'm using dateutil's parse() function, but it doesn't parse the timezone. Is there an easy way to do this?

moinudin
gct

6 Answers

64

dateutil's parser.parse() accepts a tzinfos keyword argument: a dictionary such as {'EST': -5*3600} that maps each zone name to its offset from GMT in seconds. Assuming we have such a dictionary (tzd, built further down), we can do:

>>> import dateutil.parser as dp
>>> s = 'Sat, 11/01/09 8:00PM'
>>> # tzd is the zone name -> offset dictionary built further down
>>> for tz_code in ('PST','PDT','MST','MDT','CST','CDT','EST','EDT'):
...     dt = s+' '+tz_code
...     print(dt, '=', dp.parse(dt, tzinfos=tzd))
...

Sat, 11/01/09 8:00PM PST = 2009-11-01 20:00:00-08:00
Sat, 11/01/09 8:00PM PDT = 2009-11-01 20:00:00-07:00
Sat, 11/01/09 8:00PM MST = 2009-11-01 20:00:00-07:00
Sat, 11/01/09 8:00PM MDT = 2009-11-01 20:00:00-06:00
Sat, 11/01/09 8:00PM CST = 2009-11-01 20:00:00-06:00
Sat, 11/01/09 8:00PM CDT = 2009-11-01 20:00:00-05:00
Sat, 11/01/09 8:00PM EST = 2009-11-01 20:00:00-05:00
Sat, 11/01/09 8:00PM EDT = 2009-11-01 20:00:00-04:00

Regarding the content of tzinfos, here is how I populated mine:

tz_str = '''-12 Y
-11 X NUT SST
-10 W CKT HAST HST TAHT TKT
-9 V AKST GAMT GIT HADT HNY
-8 U AKDT CIST HAY HNP PST PT
-7 T HAP HNR MST PDT
-6 S CST EAST GALT HAR HNC MDT
-5 R CDT COT EASST ECT EST ET HAC HNE PET
-4 Q AST BOT CLT COST EDT FKT GYT HAE HNA PYT
-3 P ADT ART BRT CLST FKST GFT HAA PMST PYST SRT UYT WGT
-2 O BRST FNT PMDT UYST WGST
-1 N AZOT CVT EGT
0 Z EGST GMT UTC WET WT
1 A CET DFT WAT WEDT WEST
2 B CAT CEDT CEST EET SAST WAST
3 C EAT EEDT EEST IDT MSK
4 D AMT AZT GET GST KUYT MSD MUT RET SAMT SCT
5 E AMST AQTT AZST HMT MAWT MVT PKT TFT TJT TMT UZT YEKT
6 F ALMT BIOT BTT IOT KGT NOVT OMST YEKST
7 G CXT DAVT HOVT ICT KRAT NOVST OMSST THA WIB
8 H ACT AWST BDT BNT CAST HKT IRKT KRAST MYT PHT SGT ULAT WITA WST
9 I AWDT IRKST JST KST PWT TLT WDT WIT YAKT
10 K AEST ChST PGT VLAT YAKST YAPT
11 L AEDT LHDT MAGT NCT PONT SBT VLAST VUT
12 M ANAST ANAT FJT GILT MAGST MHT NZST PETST PETT TVT WFT
13 FJST NZDT
11.5 NFT
10.5 ACDT LHST
9.5 ACST
6.5 CCT MMT
5.75 NPT
5.5 SLT
4.5 AFT IRDT
3.5 IRST
-2.5 HAT NDT
-3.5 HNT NST NT
-4.5 HLV VET
-9.5 MART MIT'''

tzd = {}
for tz_descr in map(str.split, tz_str.split('\n')):
    tz_offset = int(float(tz_descr[0]) * 3600)
    for tz_code in tz_descr[1:]:
        tzd[tz_code] = tz_offset

P.S. Per @Hank Gay, time zone naming is not clearly defined. To build my table I used http://www.timeanddate.com/library/abbreviations/timezones/ and http://en.wikipedia.org/wiki/List_of_time_zone_abbreviations . I looked at each conflict between an obscure and a popular name and resolved it in favor of the popular (more widely used) one. One abbreviation, IST, was not as clear-cut (it can mean Indian Standard Time, Iran Standard Time, Irish Standard Time or Israel Standard Time), so I left it out of the table; you may need to choose what to add for it based on your location. Oh, and I left out the Republic of Kiribati with their absurd "look at me, I am first to celebrate New Year" GMT+13 and GMT+14 time zones.
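
For readers without dateutil, the same offset-table idea can be sketched with the standard library alone. This is only an illustration: the US_OFFSETS table is a hypothetical subset of the table above, parse_with_abbreviation is a made-up helper name, and the format string assumes input shaped exactly like the question's example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical subset of the table above: abbreviation -> offset from UTC in seconds.
US_OFFSETS = {
    'PST': -8 * 3600, 'PDT': -7 * 3600,
    'MST': -7 * 3600, 'MDT': -6 * 3600,
    'CST': -6 * 3600, 'CDT': -5 * 3600,
    'EST': -5 * 3600, 'EDT': -4 * 3600,
}


def parse_with_abbreviation(text, offsets=US_OFFSETS):
    """Parse strings shaped like 'Sat, 11/01/09 8:00PM EST' (assumed format)."""
    # Split the trailing abbreviation off the rest of the timestamp.
    stamp, abbreviation = text.rsplit(' ', 1)
    naive = datetime.strptime(stamp, '%a, %m/%d/%y %I:%M%p')
    # A KeyError here means the abbreviation is missing from the table.
    tz = timezone(timedelta(seconds=offsets[abbreviation]), abbreviation)
    return naive.replace(tzinfo=tz)


print(parse_with_abbreviation('Sat, 11/01/09 8:00PM EST').isoformat())
# 2009-11-01T20:00:00-05:00
```

Like the dictionary above, this attaches a fixed offset per abbreviation, so it inherits the same ambiguity caveats.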

Nas Banov
  • 2
    I can't get [ChST](http://en.wikipedia.org/wiki/Chamorro_Time_Zone) to work. That lowercase **h** seems to cause problems. I had to capitalize it as CHST in the list of timezones and then do `dp.parse(dt, tzinfos=tzd)` – Adam V. Nov 19 '12 at 14:10
  • 1
    the dictionary is incorrect e.g., MSK in 2012/12 has 4 hour offset, but only 3 hours in previous years – jfs Dec 04 '12 at 18:05
  • 1
    note: `MSK` will be again 3 hours from UTC on 26 October 2014 i.e., given `'MSK'` you can't return the correct UTC offset if you don't know the date. [`'EST'` is worse, it may correspond to several UTC offsets at the same time](http://stackoverflow.com/a/13713813/4279) – jfs Sep 04 '14 at 11:10
  • This is what I was looking for. – akai Jun 13 '16 at 17:26
  • IST is more popularly used as India Standard Time (UTC +5:30) – KyferEz May 16 '19 at 13:55
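
On the point raised in the comments that MSK has no single UTC offset: a rough way to see this is to ask a tz database for Moscow's offset on different dates. The sketch below assumes Python 3.9+ with the standard zoneinfo module and an installed tz database (system zoneinfo files or the tzdata package); moscow_offset_hours is a hypothetical helper name.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+; needs tz data on the system or the tzdata package

moscow = ZoneInfo('Europe/Moscow')


def moscow_offset_hours(year, month, day):
    """UTC offset, in hours, of Moscow wall-clock noon on the given date."""
    local_noon = datetime(year, month, day, 12, tzinfo=moscow)
    return local_noon.utcoffset().total_seconds() / 3600


# The same city and the same abbreviation, but the offset depends on the date.
for date in [(2010, 12, 1), (2012, 12, 1), (2014, 12, 1)]:
    stamp = datetime(*date, 12, tzinfo=moscow)
    print(date, stamp.tzname(), moscow_offset_hours(*date))
```

A static name-to-offset dictionary cannot capture this, which is why a full zone name is more reliable than an abbreviation whenever the source can provide one.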
13

That probably won't work because those abbreviations aren't unique. See this page for details. You might wind up just having to manually handle it yourself if you're working with a known set of inputs.

Hank Gay
  • Does it get easier if we limit it to timezones in the US? Are there a "standard" set of abbreviations in that event? – gct Nov 09 '09 at 20:40
  • Don't forget that "timezones in the US" includes AKST, AKDT, HAST, and HADT. If you just mean the continental 48 states, then you only have the 8 3-letter timezones to deal with (4 timezones, standard and daylight savings times). – PaulMcG Nov 09 '09 at 23:01
  • Apparently, for good measure, some places use HST and HDT as equivalents for HAST and HADT too =\ – gct Nov 10 '09 at 01:08
  • 4
    By far the easiest route (though not often the most practical) is to adjust whatever program is providing the data so it sends it all in UTC, or failing that, using offsets from UTC, or failing that a full, valid timezone from the zoneinfo database. – Hank Gay Nov 10 '09 at 02:48
  • @HankGay That is of course not always under control of the developer. – gerrit Jul 09 '21 at 12:46
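
If the producing program can be changed, the "offsets from UTC" option mentioned in the comments might look like the sketch below. It uses only the standard library; the fixed -5 hour offset is an assumption standing in for whatever the producer actually knows about its own zone.

```python
from datetime import datetime, timedelta, timezone

# Producer side: attach an explicit UTC offset instead of an abbreviation.
eastern_standard = timezone(timedelta(hours=-5))  # assumed fixed offset for this example
sent = datetime(2009, 11, 1, 20, 0, tzinfo=eastern_standard).isoformat()
print(sent)  # 2009-11-01T20:00:00-05:00

# Consumer side: the offset travels with the string, so nothing has to be guessed.
received = datetime.fromisoformat(sent)
as_utc = received.astimezone(timezone.utc)
print(as_utc.isoformat())  # 2009-11-02T01:00:00+00:00
```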
11

You might try pytz module: http://pytz.sourceforge.net/

pytz brings the Olson tz database into Python. This library allows accurate and cross platform timezone calculations using Python 2.3 or higher. It also solves the issue of ambiguous times at the end of daylight savings, which you can read more about in the Python Library Reference (datetime.tzinfo).

Almost all of the Olson timezones are supported.

Drake Guan
  • 3
    i am curious, how does one parse "Sat, 11/01/09 8:00PM EST" with said pytz? – Nas Banov Jan 25 '11 at 19:52
  • Honestly, it's not solvable because the abbreviation mapping is not one-to-one. The good news is that pytz already provides the (one-to-many) mapping, and it's left to programmers to choose the desired mappings. – Drake Guan Jan 26 '11 at 13:56
  • 2
    @NasBanov: 'EST' is ambiguous, but [you could use pytz to enumerate all possible interpretations](http://stackoverflow.com/a/13713813/4279). – jfs Dec 07 '12 at 06:12
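
To give a feel for what "enumerate all possible interpretations" can mean, here is a minimal sketch. It uses the standard zoneinfo module (Python 3.9+, with a tz database installed) rather than pytz, so it is not the code from the linked answer; zones_using is a hypothetical helper name.

```python
from datetime import datetime
from zoneinfo import ZoneInfo, available_timezones


def zones_using(abbreviation, naive_local):
    """Map zone name -> UTC offset for zones that use `abbreviation` at that wall-clock time."""
    matches = {}
    for name in sorted(available_timezones()):
        try:
            local = naive_local.replace(tzinfo=ZoneInfo(name))
        except (KeyError, ValueError, OSError):
            continue  # skip entries that cannot be loaded as zones
        if local.tzname() == abbreviation:
            matches[name] = local.utcoffset()
    return matches


candidates = zones_using('EST', datetime(2009, 11, 1, 20, 0))
print(len(candidates), 'candidate zones')
print('distinct offsets:', sorted(set(candidates.values())))
```

The exact list depends on the tz database version installed, so the result is best treated as a set of candidates to choose from, not a final answer.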
5

The parse() function in dateutil doesn't resolve arbitrary time zone abbreviations on its own (it needs a tzinfos mapping for that). What I've been using instead is the %Z format directive with the time.strptime() function. I have no idea how it deals with the ambiguity in time zones, but it can tell the difference between CDT and CST, which is all I needed.

Background: I store backup images in directories whose names are timestamps using local time, since I don't have GMT clocks handy at home. So I use time.strptime(d, r"%Y-%m-%dT%H:%M:%S_%Z") to parse the directory names back into an actual time for age analysis.

Mike DeSimone
  • As I understand it, strptime deals with the ambiguity by only accepting times given in the current time zone setting. – Random832 Dec 02 '13 at 16:33
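
A quick way to check this on your own machine is sketched below. As far as I can tell, CPython's strptime accepts only UTC, GMT and the names in time.tzname for %Z; the name QQQ is a made-up abbreviation used to show a rejection, and try_parse is a hypothetical helper.

```python
import time

FORMAT = '%Y-%m-%dT%H:%M:%S_%Z'


def try_parse(zone_name):
    """Return the parsed struct_time, or None if strptime rejects the zone name."""
    try:
        return time.strptime('2013-12-02T16:33:00_' + zone_name, FORMAT)
    except ValueError:
        return None


print('UTC accepted:', try_parse('UTC') is not None)
print('local name %r accepted:' % time.tzname[0], try_parse(time.tzname[0]) is not None)
print('QQQ accepted:', try_parse('QQQ') is not None)
```

So directory names written on one machine may fail to parse on a machine configured for a different time zone.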
4

I realized that dateparser can solve this problem. https://pypi.org/project/dateparser/

Usage:

import dateparser


def time_gmt_format(str_datetime):
    # from string like "29/05/2020, 08:18 WIB" to GMT yyyymmddhhmmss

    date_time_obj = dateparser.parse(
        str_datetime,
        date_formats=['%d/%m/%Y, %H:%M %Z'],
        settings={'TO_TIMEZONE': 'GMT'},
    )  # convert to GMT datetime object

    return date_time_obj.strftime('%Y%m%d%H%M%S')  # Output: 20200529011800

Other timezones supported by this library: https://github.com/scrapinghub/dateparser/blob/e11a18a4d183a14211b28f5927ce01b220335881/dateparser/timezones.py

3

I used pytz to generate a TZINFOS mapping:

from datetime import datetime as dt

import pytz

from dateutil.tz import gettz
from pytz import utc
from dateutil import parser


def gen_tzinfos():
    for zone in pytz.common_timezones:
        try:
            tzdate = pytz.timezone(zone).localize(dt.utcnow(), is_dst=None)
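        except (pytz.AmbiguousTimeError, pytz.NonExistentTimeError):
            # with is_dst=None, localize() raises for ambiguous or skipped wall-clock times
            pass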
        else:
            tzinfo = gettz(zone)

            if tzinfo:
                yield tzdate.tzname(), tzinfo

TZINFOS Usage

>>> TZINFOS = dict(gen_tzinfos())
>>> TZINFOS
{'+02': tzfile('/usr/share/zoneinfo/Antarctica/Troll'),
 '+03': tzfile('/usr/share/zoneinfo/Europe/Volgograd'),
 '+04': tzfile('Europe/Ulyanovsk'),
 '+05': tzfile('/usr/share/zoneinfo/Indian/Kerguelen'),
...
 'WGST': tzfile('/usr/share/zoneinfo/America/Godthab'),
 'WIB': tzfile('/usr/share/zoneinfo/Asia/Pontianak'),
 'WIT': tzfile('/usr/share/zoneinfo/Asia/Jayapura'),
 'WITA': tzfile('/usr/share/zoneinfo/Asia/Makassar'),
 'WSDT': tzfile('/usr/share/zoneinfo/Pacific/Apia'),
 'XJT': tzfile('/usr/share/zoneinfo/Asia/Urumqi')}

parser Usage

>>> date_str = 'Sat, 11/01/09 8:00PM EST'
>>> tzdate = parser.parse(date_str, tzinfos=TZINFOS)
>>> tzdate.astimezone(utc)
datetime.datetime(2009, 11, 2, 1, 0, tzinfo=<UTC>)

The UTC conversion is needed since there are many timezones available for each abbreviation. Since TZINFOS is a dict, it keeps only the last timezone seen for each abbreviation, so before conversion you may not get the one you were expecting.

>>> tzdate
datetime.datetime(2009, 11, 1, 20, 0, tzinfo=tzfile('/usr/share/zoneinfo/America/Port-au-Prince'))
reubano