I want to compute the UNIX time from a string in Python. My string contains both the UTC offset AND the timezone abbreviation in brackets, as shown below.

The abbreviation (PDT) is what causes trouble: my code works without it. datestring1 is converted correctly, but datestring2 isn't.

import time
import datetime

datestring1 = "Mon, 14 May 2001 16:39:00 -0700"
datestring2 = "Mon, 14 May 2001 16:39:00 -0700 (PDT)"
time.mktime(datetime.datetime.strptime(datestring1, "%a, %d %b %Y %H:%M:%S %z").timetuple())
time.mktime(datetime.datetime.strptime(datestring2, "%a, %d %b %Y %H:%M:%S %z (%Z)").timetuple())
FObersteiner
  • `PDT` is just an ambiguous abbreviation, not a timezone name. Check the [list of abbreviations](https://www.timeanddate.com/time/zones/). There are 3 BSTs (Bangladesh, Bougainville, British), 3 ISTs (Indian, Irish, Israel) etc. If you know the string contains an offset it's better to just strip the abbreviation before parsing – Panagiotis Kanavos Oct 22 '21 at 07:38
  • see also: my [answer on Python strptime() and timezones?](https://stackoverflow.com/a/69673614/10197418) – FObersteiner Oct 22 '21 at 08:22
  • note #2: `time.mktime` is not needed, you have a method [.timestamp()](https://docs.python.org/3/library/datetime.html#datetime.datetime.timestamp) for datetime objects (which is more readable as well I think). – FObersteiner Oct 22 '21 at 08:45
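
The two suggestions in the comments above (strip the abbreviation, then use `.timestamp()`) can be combined using only the standard library. This is a minimal sketch, not a definitive implementation: the helper name `to_unix_time` and the regular expression are assumptions made for illustration, and `%a`/`%b` assume an English locale.

```python
import re
from datetime import datetime


def to_unix_time(datestring: str) -> float:
    """Hypothetical helper: parse the date, ignoring a trailing '(ABBR)'."""
    # Drop a trailing parenthesized abbreviation such as " (PDT)".
    # The numeric UTC offset already pins down the instant unambiguously.
    cleaned = re.sub(r"\s*\([A-Za-z]+\)\s*$", "", datestring)
    parsed = datetime.strptime(cleaned, "%a, %d %b %Y %H:%M:%S %z")
    # On an aware datetime, .timestamp() uses the parsed offset,
    # not the local timezone of the machine.
    return parsed.timestamp()


datestring1 = "Mon, 14 May 2001 16:39:00 -0700"
datestring2 = "Mon, 14 May 2001 16:39:00 -0700 (PDT)"

print(to_unix_time(datestring1))
print(to_unix_time(datestring2))
```

Both calls should print the same value, since the two strings describe the same instant (23:39:00 UTC on 14 May 2001).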

1 Answer

You could use python-dateutil. Take a look at the answer to "Python strptime() and timezones?".

It seems others have also had trouble parsing timezone names using %Z.

In your case that would be:

from dateutil import parser

datestring1 = "Mon, 14 May 2001 16:39:00 -0700"
datestring2 = "Mon, 14 May 2001 16:39:00 -0700 (PDT)"

# .timestamp() honors the parsed UTC offset; time.mktime() on a timetuple
# would drop the offset and interpret the fields as local time.
print(parser.parse(datestring1).timestamp())
print(parser.parse(datestring2).timestamp())
Kyle Dixon
  • `others have also had trouble parsing Timezone names` that's because `PDT` isn't a timezone name, it's an ambiguous abbreviation. Ambiguous means more than one timezone may have the same abbreviation, eg IST is Indian, Irish, Israel, BST is Bangladesh, Bougainville, British. The de-facto standard names are the IANA [timezone database names](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) eg `America/Los_Angeles` – Panagiotis Kanavos Oct 22 '21 at 07:56
  • @PanagiotisKanavos [docs](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes) aren't precise here as well I think, "*`%Z`: Time zone name [...] UTC, GMT*" (UTC isn't even a time zone...). The false assumption that `%Z` can parse arbitrary tz name abbreviations isn't that far-fetched. – FObersteiner Oct 22 '21 at 08:13
  • @MrFuppes that documentation was also what I looked at but was unsure of. After looking at #6 under the `Notes` [here](https://docs.python.org/3/library/datetime.html#technical-detail) I guess it depends on the user's location – Kyle Dixon Oct 22 '21 at 08:24
  • yes, even after the 2019 update, the docs are misleading. Python's `strptime` falls back to the platform's C library for this I think, so *on some platforms*, `%Z` will accept the local time zone's abbreviations (e.g. "EST" or "EDT" if you're in New York). Other than that, it is only good for *ignoring* "UTC" or "GMT", it doesn't actually parse those to a timezone object `datetime.timezone.utc`. At least on my platform (Win 10). – FObersteiner Oct 22 '21 at 08:30
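
The `%Z` behavior described in these comments can be checked directly. This is a small sketch, assuming CPython; the sample strings and the variable names `naive` and `pdt_accepted` are made up for illustration, and whether "PDT" is accepted depends on the local timezone of the machine running it.

```python
from datetime import datetime

# "UTC" is accepted by %Z, but the result carries no tzinfo:
# the name is matched and then effectively ignored.
naive = datetime.strptime("2001-05-14 23:39:00 UTC", "%Y-%m-%d %H:%M:%S %Z")
print(naive.tzinfo)  # None

# Any other abbreviation is accepted only if it matches the machine's
# own local timezone names; otherwise strptime raises ValueError.
try:
    datetime.strptime("2001-05-14 16:39:00 PDT", "%Y-%m-%d %H:%M:%S %Z")
    pdt_accepted = True
except ValueError:
    pdt_accepted = False
print(pdt_accepted)  # depends on the local timezone setting
```

This is why relying on the numeric offset (`%z`) is the safer choice when the string provides one.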