17

I have a string with variable length and I want to give a format to strptime in order for the rest of the string to be ignored. Let me exemplify. I have something like

9/4/2013,00:00:00,7.8,7.4,9.53
10/4/2013,00:00:00,8.64,7.4,9.53

and I want a format that makes the command strptime(line,format) work to read those lines. Something like format='%d/%m/%Y,%H:%M:%S*', although I know that doesn't work. I guess my question is kind of similar to this one, but no answer there could help me and my problem is a little worse because the full length of my string can vary. I have a feeling that dateutil could solve my problem, but I can't find something there that does the trick.

I can probably do something like strptime(','.join(line.split(',')[:2]),format), but I wouldn't want to resort to that for user-related issues.

smci
TomCho
  • This boils down to an enhancement request on strptime to allow arbitrary regexes, at least in the trailing part of the string: `format='%d/%m/%Y,%H:%M:%S.*'`. This is a common request and well worth considering. In fact [people have been asking for it for 13+ years](https://bugs.python.org/issue1006786). – smci Nov 16 '17 at 19:46

4 Answers

24

You cannot have datetime.strptime() ignore part of the input; your only option really is to split off the extra text first.

So yes, you do have to split and rejoin your string:

from datetime import datetime
format = '%d/%m/%Y,%H:%M:%S'
datetime.strptime(','.join(line.split(',', 2)[:2]), format)

or find some other means to extract the information. You could use a regular expression, for example:

import re
datetime_pattern = re.compile(r'(\d{1,2}/\d{1,2}/\d{4},\d{2}:\d{2}:\d{2})')
format = '%d/%m/%Y,%H:%M:%S'
datetime.strptime(datetime_pattern.search(line).group(), format)
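
The split approach above could be wrapped in a small helper, roughly as sketched here. The name `parse_leading_datetime` and its `sep` and `fields` parameters are hypothetical, chosen for illustration; they are not part of the `datetime` module. The sketch assumes the timestamp always occupies the first two comma-separated fields.

```python
from datetime import datetime


def parse_leading_datetime(line, fmt="%d/%m/%Y,%H:%M:%S", sep=",", fields=2):
    """Parse only the first `fields` separated fields of `line` using `fmt`.

    Hypothetical helper: assumes the timestamp is always at the start of the line.
    """
    # maxsplit=fields keeps everything after the timestamp in one discarded chunk,
    # so the number of trailing values does not matter.
    head = sep.join(line.split(sep, fields)[:fields])
    return datetime.strptime(head, fmt)


lines = [
    "9/4/2013,00:00:00,7.8,7.4,9.53",
    "10/4/2013,00:00:00,8.64,7.4,9.53",
]
parsed = [parse_leading_datetime(line) for line in lines]
```

Because the format string stays fixed, malformed timestamps still raise `ValueError` from `strptime`, which is usually what you want for user-supplied data.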
Martijn Pieters
  • Yes, one way or another, you have to modify each input line, or modify the format for each input line. Too bad. – Edward Falk Mar 13 '18 at 23:52
  • This is pretty terrible (and is what I am doing) because you have to define two equivalent but differently specified date patterns. – Jeffrey Blattman May 26 '22 at 17:33
  • @JeffreyBlattman: Why do you need to define different date patterns? Extract just the date portion and pass that to `datetime.strptime()`. – Martijn Pieters May 27 '22 at 11:25
  • In your answer, see `datetime_pattern` and `format`. That's two different patterns. – Jeffrey Blattman May 27 '22 at 17:17
  • @JeffreyBlattman: The regex depends on the source line and is an example specific to this case. Given that the OP's data looks like CSV, splitting on commas looks more applicable here. – Martijn Pieters May 28 '22 at 17:46
  • Splitting by commas requires pattern matching. But yes, you answered it when you said it's not possible, using `strptime` only. You need to do your own parsing / pattern matching. – Jeffrey Blattman May 31 '22 at 22:26
2

To build a format string without splitting the time string and discarding the extra text, just include the extra text in the format string. Here t[t.index(',',t.index(',') + 1):] is the extra text: everything from the second comma onward.

from datetime import datetime
l = ['9/4/2013,00:00:00,7.8,7.4,9.53', '10/4/2013,00:00:00,8.64,7.4,9.53']
for t in l:
    print(datetime.strptime(t,'%d/%m/%Y,%H:%M:%S'+t[t.index(',',t.index(',')+1):]))

If the string may contain '%', strip it from the string first, since strptime would otherwise read it as the start of a format directive:

l = ['9/4/2013,00:00:00,7.8,7.4,9.53', '10/4/2013,00:00:00,8.64,7.4,9.53']
for t in l:
    t = t.replace('%','')
    fmt = '%d/%m/%Y,%H:%M:%S' + t[t.index(',',t.index(',')+1):]
    print(datetime.strptime(t, fmt))

Or, with string slicing and a static format string:

for t in l:
    print(datetime.strptime(t[:t.find(',',t.find(',')+1)],'%d/%m/%Y,%H:%M:%S'))

2013-04-09 00:00:00
2013-04-10 00:00:00
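
As an alternative to deleting '%' characters, the extra text could be escaped instead, since `%%` in a strptime format stands for a literal '%'. The following is only a sketch of that idea; the function name `parse_with_literal_tail` and the sample line containing '%' are made up for illustration and do not come from the original data.

```python
from datetime import datetime

BASE_FORMAT = "%d/%m/%Y,%H:%M:%S"


def parse_with_literal_tail(t):
    """Hypothetical helper: append the extra text to the format as literal text."""
    # Position of the second comma; everything from there on is extra text.
    # str.index raises ValueError if the line has fewer than two commas.
    cut = t.index(",", t.index(",") + 1)
    # Double every '%' so strptime matches it literally instead of
    # treating it as the start of a directive.
    tail = t[cut:].replace("%", "%%")
    return datetime.strptime(t, BASE_FORMAT + tail)


# Made-up input line with a '%' in the trailing data.
result = parse_with_literal_tail("9/4/2013,00:00:00,7.8%,7.4,9.53")
```

Unlike the `replace('%','')` variant, this leaves the input string itself unchanged; only the format string is adjusted.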

Nizam Mohamed
  • So what happens if the extra string contains `%` characters? Note that you are essentially doing the split in reverse; you are splitting off the *remainder* and adding it to the format string. – Martijn Pieters Mar 26 '15 at 21:22
  • chances of occurrence of % in the date and time field hold true. I answered the OP. – Nizam Mohamed Mar 26 '15 at 21:26
  • Right, but so did I, yet it doesn't have the problems your "solution" has. Sometimes the answer really is *you cannot do that, but here is how you solve the problem*. – Martijn Pieters Mar 26 '15 at 21:26
  • what problems? The OP needed a format string. He already knew splitting original string. – Nizam Mohamed Mar 26 '15 at 21:30
  • @NizamMohamed: it will be less performant (and also more error-prone and brittle) than simply doing the split into . That might even be just simple subscripting, for fixed-format lines and when zero-padded %d is used. No I didn't downvote. Yes I believe people who downvote should explain their thinking, it usually generates constructive improvement, or at least highlights misunderstandings. – smci Nov 16 '17 at 19:39
2

Have a look at datetime-glob, a module we developed to parse date/times from a list of files. You can use datetime_glob.PatternSegment to parse arbitrary strings:

>>> import datetime_glob
>>> patseg = datetime_glob.parse_pattern_segment('%-d/%-m/%Y,%H:%M:%S*')
>>> match = datetime_glob.match_segment('9/4/2013,01:02:03,7.8,7.4,9.53',
                                        patseg)
>>> match.as_datetime()
datetime.datetime(2013, 4, 9, 1, 2, 3)
marko.ristin
0

This also uses a regexp, because Python's datetime cannot ignore characters. This version uses a non-capturing group to drop the fractional seconds (sorry, the example is not related to your question):

import datetime, re

date_re = re.compile(r'([^.]+)(?:\.[0-9]+) (\+[0-9]+)')
date_str = "2018-09-06 04:15:18.334232115 +0000"

date_str = " ".join(date_re.search(date_str).groups())

date_obj = datetime.datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S %z")

It's much better to use a regexp, as @Martijn suggests, so your code is more comprehensible and easier to update.

Thomas Decaux