1

Within a dataset, there are several different datetime strings.

For example:

2020-11-16T06:00:00Z

2020-11-16T06:00:00+01:00

2020-11-16T06:00:00+01:00Z

2020-11-16T06:00:00+02:00

2020-11-16T06:00:00.000Z

I thought about replacing everything after the seconds, but that gives me errors when, for example, +01:00 isn't present in the first place.

Do you have any clue how to handle this?

It would be absolutely enough, if I could get:

%Y-%m-%dT%H:%M

(The basics, how to strp and strf are known...)

I've racked my brain all night over this problem. I hope one of you has a solution.

thank you in advance!

Rhinozeros
  • These are all basically ISO 8601 formatted datetime strings, see [thread](https://stackoverflow.com/questions/127803/how-do-i-parse-an-iso-8601-formatted-date) – metatoaster Nov 16 '20 at 05:07
  • 1
    Note that `2020-11-16T06:00:00+01:00Z` is not actually correct, the `Z` indicates Zulu time, but the `+01:00` contradicts this – Grismar Nov 16 '20 at 05:09
  • Thank you, I just wanted to state different time formats and did not take care of correct formats. These were meant just to be examples. I've found a simple solution for my problem and stated it at the end of this topic. – Rhinozeros Nov 16 '20 at 21:31

3 Answers

3

Python has a third-party library, dateutil (installed as python-dateutil), that deals with this problem:

import dateutil.parser

examples = [
    '2020-11-16T06:00:00Z',
    '2020-11-16T06:00:00+01:00',
    '2020-11-16T06:00:00+01:00Z',
    '2020-11-16T06:00:00+02:00',
    '2020-11-16T06:00:00.000Z'
]

for e in examples:
    try:
        print(dateutil.parser.parse(e))
    except ValueError:
        print(f'Invalid datetime: {e}')

Result:

2020-11-16 06:00:00+00:00
2020-11-16 06:00:00+01:00
Invalid datetime: 2020-11-16T06:00:00+01:00Z
2020-11-16 06:00:00+02:00
2020-11-16 06:00:00+00:00

@Z4-tier also has a solution for your examples (be careful with just leaving off the end of the string, though), but dateutil will also deal with more exotic stuff:

print(dateutil.parser.parse('15:45 16 Nov 2020'))

Result:

2020-11-16 15:45:00

Also note this:

        print(dateutil.parser.parse(e).tzinfo)

If you add that, you'll see that dateutil includes the information about the time zone in the result, which would be lost if you only parse the first part of the strings.
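If the offsets do matter, one option is to convert every timestamp to a common zone before formatting. The following is only a sketch using the standard library instead of dateutil: the helper name `to_utc_minutes` is hypothetical, and choosing UTC as the common zone (and treating strings without an offset as UTC) is an assumption, not something the question states.

```python
from datetime import datetime, timezone

def to_utc_minutes(text):
    """Hypothetical helper: parse an ISO 8601 string, convert it to UTC,
    and format it as %Y-%m-%dT%H:%M. Returns None if parsing fails."""
    # fromisoformat() before Python 3.11 rejects a trailing 'Z',
    # so rewrite it as an explicit UTC offset first.
    if text.endswith('Z'):
        text = text[:-1] + '+00:00'
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        return None  # e.g. the contradictory '+01:00Z'
    if parsed.tzinfo is None:
        # Assumption: strings without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime('%Y-%m-%dT%H:%M')

print(to_utc_minutes('2020-11-16T06:00:00+01:00'))  # 2020-11-16T05:00
print(to_utc_minutes('2020-11-16T06:00:00+02:00'))  # 2020-11-16T04:00
```

With this approach the `+01:00` and `+02:00` examples stay an hour apart in the output, and the malformed `+01:00Z` string is reported as `None` rather than silently accepted.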

Grismar
  • 1
    ahh, `dateutil`. good idea. But why does it choke on `2020-11-16T06:00:00+01:00Z`? I think that's a valid ISO date. – Z4-tier Nov 16 '20 at 05:09
  • It's not - the Z indicates Zulu time (UTC+0), but the +01:00 indicates it's actually an hour away from Zulu time. – Grismar Nov 16 '20 at 05:12
1

How about this:

import datetime

dates = ['2020-11-16T06:00:00Z',
         '2020-11-16T06:00:00+01:00',
         '2020-11-16T06:00:00+01:00Z',
         '2020-11-16T06:00:00+02:00',
         '2020-11-16T06:00:00.000Z']

for d in dates:
    # Keep only 'YYYY-MM-DDTHH:MM:SS' (the first 19 characters); the offset is discarded.
    print(datetime.datetime.fromisoformat(d[0:19]))

Since each date has the same format up to the offset and timezone, just strip that part off the string and parse the rest into a datetime.datetime.
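To get from there to the `%Y-%m-%dT%H:%M` form the question asks for, the parsed value can be formatted again. This is a minimal sketch, assuming every string starts with the 19-character `YYYY-MM-DDTHH:MM:SS` prefix; `to_minutes` is a hypothetical helper name, and the offset is still discarded.

```python
import datetime

def to_minutes(text):
    # Parse only the first 19 characters; any offset or 'Z' is ignored.
    naive = datetime.datetime.fromisoformat(text[:19])
    return naive.strftime('%Y-%m-%dT%H:%M')

print(to_minutes('2020-11-16T06:00:00+01:00'))  # 2020-11-16T06:00
print(to_minutes('2020-11-16T06:00:00.000Z'))   # 2020-11-16T06:00
```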

Z4-tier
  • Note that your code simply ignores the timezone indication, since you only parse `[0:19]` for each date string. – Grismar Nov 16 '20 at 05:13
  • @Grismar yep, I know. Since the post asked for `%Y-%m-%dT%H:%M`, I am assuming this is acceptable (and if it is, it's probably the fastest way to get there). – Z4-tier Nov 16 '20 at 05:15
  • Given the example data includes time stamps both in zones `+01:00` and `+02:00`, I'm betting OP will run into a need to know the difference. After all, those are an hour apart. – Grismar Nov 16 '20 at 05:24
0

The dataset is generated by a scraper, which scrapes news pages.

Therefore the datetime is scraped as a string in the first place, so the conversion has to handle several different formats before strptime can be executed.

I found a solution for my problem, which was influenced by all of your approaches:

date = '2020-11-16T06:00:00+01:00'
splitat = 19
date = date[:splitat]
date

This resulted in the standardized format, which I needed:

'2020-11-16T06:00:00'

Rhinozeros
  • This is what I suggested in my answer (`d[0:19]` is the same as `date[:splitat]`). StackOverflow is not a typical internet forum, so no need to post replies just to say "thanks". Instead you should pick an answer that best helped you in finding a solution and click the check mark next to it to accept it as the answer. I am glad you found a solution that worked for you! – Z4-tier Nov 16 '20 at 22:01
  • Thank you (nevertheless, in my opinion polite manners are getting rare everywhere these days ;-) ). Would you mind giving me a hint about where to look your example up? With my limited knowledge, I was not able to realize that `d[0:19]` is analogous in its behavior to `date[:splitat]`. I'd like to read up on this and learn. – Rhinozeros Nov 17 '20 at 01:38
  • You could look for tutorials that cover topics on lists and strings. This is really a question of string manipulation, but in Python strings share a lot of similarities with lists in terms of the way you can use `[x:y]` square brackets to index into them. In this case, `d[0:19]` works out to be the same as `date[:splitat]` because you can leave out the starting index and it will default to zero. – Z4-tier Nov 17 '20 at 01:57
  • 1
    A lot of internet tutorials are not very well written though, so if you really want a cleanly-written and well-edited place to learn from, I'd go for one of the introductory books from O'Reilly. They have several Python books, but 'Think Python' might be a good place to start. – Z4-tier Nov 17 '20 at 02:04
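The slicing behavior discussed in the comments above can be checked directly. A short sketch, reusing the `date` and `splitat` names from the last answer:

```python
date = '2020-11-16T06:00:00+01:00'
splitat = 19

# An omitted start index defaults to 0, so both slices are identical.
assert date[0:19] == date[:splitat] == '2020-11-16T06:00:00'

# An omitted end index defaults to the end of the string.
print(date[splitat:])        # +01:00

# Lists are sliced with the same syntax.
print([10, 20, 30, 40][:2])  # [10, 20]
```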