Finding date and timezone on string in Python

Question

I have searched for ways to extract a datetime and timezone from a string (for example: Extracting date from a string in Python), but none of those answers solve my problem.

I have a string containing a date, time and timezone in a format like 25 Feb 2020 02:42:20 -0800 (PST) or 25 Feb 2020 11:42:20 +0100. I can't split it on spaces, because the string contains many spaces, and the datetime and timezone appear in different parts of the string (sometimes in the middle, sometimes at the end).

I need to find this datetime and timezone and convert them to MySQL format (to save to a database).

Do you have or know any tips, tutorials or methods to solve this? Thank you!

The two example strings are not only different by formatting but also content - the first has a specific timezone, the other only has a UTC offset. Do you have other formats as well? Also, could you give an example for the expected output? — FObersteiner, Jul 27 '20 at 10:00
Check out this [link](https://www.programiz.com/python-programming/datetime) on dates and timezones. — Eric, Jul 27 '20 at 10:01

score 1 · Answer 1 · answered Jul 27 '20 at 10:15

If you have it in this format (25 Feb 2020 11:42:20 +0100) you can convert it to a datetime object:

from datetime import datetime

# %z parses a numeric UTC offset such as +0100 and yields a timezone-aware datetime
dt = datetime.strptime("25 Feb 2020 11:42:20 +0100", "%d %b %Y %H:%M:%S %z")
print(dt)

Output

2020-02-25 11:42:20+01:00
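
Since the question asks for a MySQL-friendly value, here is a minimal sketch of one way to get there. It assumes the target column is a plain DATETIME (which stores no offset) and that normalizing to UTC before saving is acceptable; the helper name to_mysql_utc is made up for illustration.

```python
from datetime import datetime, timezone


def to_mysql_utc(dt):
    """Hypothetical helper: shift an aware datetime to UTC and format it
    as 'YYYY-MM-DD HH:MM:SS', the literal form MySQL DATETIME accepts."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


dt = datetime.strptime("25 Feb 2020 11:42:20 +0100", "%d %b %Y %H:%M:%S %z")
mysql_value = to_mysql_utc(dt)
print(mysql_value)  # 2020-02-25 10:42:20
```

Note that astimezone() treats a naive datetime (one parsed without an offset) as local time, so this sketch is only meaningful for strings that carry an offset.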

answered Jul 27 '20 at 10:15 by Lambo

Thank you for your answer, but the date, time and time zone are somewhere in the text: sometimes in the middle, sometimes at the end of the string. Sometimes there is a timezone, sometimes there isn't, and I can't predict what format the date will be in. – Mariusz Jul 28 '20 at 11:55

FObersteiner · Answer 2 · 2020-07-27T13:09:50.860

A flexible approach is to use dateutil's parser as in the linked question, together with a mapping dict that maps abbreviated time zones to valid (and processable) full time zone names.

from dateutil import parser

strings = ['25 Feb 2020 02:42:20 -0800 (PST)', '25 Feb 2020 11:42:20 +0100']

tzmapping = {'PST': 'US/Pacific'} # add a key-value pair for all your timezones...

for s in strings:
    print(repr(parser.parse(s, tzinfos=tzmapping)))

# datetime.datetime(2020, 2, 25, 2, 42, 20, tzinfo=tzstr('US/Pacific'))
# datetime.datetime(2020, 2, 25, 11, 42, 20, tzinfo=tzoffset(None, 3600))

If you know for sure that all of your date/time strings start with the same format and you only need the UTC offset to be parsed, a probably faster option is to truncate the string to the first 26 characters (the date, time and offset) and parse it with strptime (as suggested by @Lambo):

from datetime import datetime

for s in strings:
    print(repr(datetime.strptime(s[:26], "%d %b %Y %H:%M:%S %z")))
    
# datetime.datetime(2020, 2, 25, 2, 42, 20, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=57600)))
# datetime.datetime(2020, 2, 25, 11, 42, 20, tzinfo=datetime.timezone(datetime.timedelta(seconds=3600)))

As for the output, I assume you need an ISO 8601 compatible format. You can get that with isoformat, using a space as the separator between date and time:

for s in strings:
    print((datetime.strptime(s[:26], "%d %b %Y %H:%M:%S %z")).isoformat(' '))
    
# 2020-02-25 02:42:20-08:00
# 2020-02-25 11:42:20+01:00
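
The question mentions that the date can sit in the middle or at the end of a longer string, in which case slicing from the start will not work. A minimal sketch of one way around that is to locate the timestamp with a regular expression first and then hand only the match to strptime. It assumes English month abbreviations, the day/month/year order shown in the question, and an optional numeric offset; the names DATE_RE and find_datetime and the surrounding sample text are made up for illustration.

```python
import re
from datetime import datetime

# Day (1 or 2 digits), abbreviated month, year, time, optional numeric UTC offset.
DATE_RE = re.compile(
    r"\b(\d{1,2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}:\d{2})(?: ([+-]\d{4}))?"
)


def find_datetime(text):
    """Hypothetical helper: return the first timestamp found in text,
    timezone-aware if an offset follows it, or None if nothing matches."""
    m = DATE_RE.search(text)
    if m is None:
        return None
    stamp, offset = m.groups()
    if offset:
        return datetime.strptime(f"{stamp} {offset}", "%d %b %Y %H:%M:%S %z")
    return datetime.strptime(stamp, "%d %b %Y %H:%M:%S")


samples = [
    "some text before; Tue, 25 Feb 2020 02:42:20 -0800 (PST)",
    "prefix 7 Nov 2019 03:53:24 and more text after",
    "no date in here",
]

for s in samples:
    print(repr(find_datetime(s)))
```

The first sample yields an aware datetime with a -08:00 offset, the second a naive one, and the third None. A trailing abbreviation such as (PST) is ignored here, because the numeric offset already determines the instant.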

Thank you for your help! I got an error: "raise ValueError" ("time data %r does not match format %r"). — Mariusz, Jul 28 '20 at 11:59
@Mariusz: `%r`? Sounds like a typo; which line exactly throws the error? — FObersteiner, Jul 28 '20 at 12:00
My strings look like: "with SMTP (IdeaSmtpServer 0.83.292) id c45bcbeba3144cc1; Thu, 7 Nov 2019 03:53:24 +0100" or "Received: by mail-qk1-f194.google.com with SMTP id b5so4361873qkh.8 for ; Tue, 25 Feb 2020 02:42:20 -0800 (PST)" or "by zimbra.inf.utfsm.cl (Postfix) with ESMTP id 5EC001857CF for ; Thu, 16 Jul 2020 03:45:33 -0400 (-04)" — Mariusz, Jul 28 '20 at 12:01
