19

I have code which reads vast numbers of dates in 'YYYY-MM-DD' format. Parsing each date so that it can add one, two, or three days and then write the result back in the same format is slowing things down considerably.

 3214657   14.330    0.000  103.698    0.000 trade.py:56(effective)
 3218418   34.757    0.000   66.155    0.000 _strptime.py:295(_strptime)

 day = datetime.datetime.strptime(endofdaydate, "%Y-%m-%d").date()

Any suggestions how to speed it up a bit (or a lot)?

FObersteiner
John Mee

3 Answers

40

Is a factor of 7 enough?

datetime.datetime.strptime(a, '%Y-%m-%d').date()       # 8.87us

datetime.date(*map(int, a.split('-')))                 # 1.28us

EDIT: great idea with explicit slicing:

datetime.date(int(a[:4]), int(a[5:7]), int(a[8:10]))   # 1.06us

That makes it a factor of 8.
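For the full round trip described in the question (parse, add a few days, write back), a minimal sketch might look like the following. The helper name `shift_date` is hypothetical, and it assumes the input is always a well-formed 'YYYY-MM-DD' string; `date.isoformat()` produces the same format, so `strftime` can be skipped as well.

```python
import datetime

def shift_date(s, days):
    """Parse 'YYYY-MM-DD' by slicing, add `days`, return the same format.

    Hypothetical helper: assumes `s` is always well-formed.
    """
    d = datetime.date(int(s[:4]), int(s[5:7]), int(s[8:10]))
    # date.isoformat() emits 'YYYY-MM-DD', so strftime is not needed
    return (d + datetime.timedelta(days=days)).isoformat()

print(shift_date("2012-11-20", 3))   # 2012-11-23
print(shift_date("2012-12-31", 1))   # 2013-01-01
```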

eumiro
  • 6
    In context: strptime = 128s, this = 61s, and for 55s be explicit: `datetime.date(int(a[:4]), int(a[5:7]), int(a[8:10]))`. Now to replace the strftime and potentially prune another 10s... thx. – John Mee Nov 20 '12 at 07:26
14

Python 3.7+: fromisoformat()

Since Python 3.7, the datetime class (and likewise the date class) has a fromisoformat method, which applies directly to this question.
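Applied to the question's use case, a short sketch could be the following. The variable name `end_of_day` and its value are made up for illustration; `date.fromisoformat` is used rather than `datetime.fromisoformat` because the question only needs a date.

```python
from datetime import date, timedelta

end_of_day = "2012-11-20"               # hypothetical input value
day = date.fromisoformat(end_of_day)    # requires Python 3.7+
next_day = day + timedelta(days=1)
print(next_day.isoformat())             # 2012-11-21
```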

Performance vs. strptime()

Explicit string slicing may give you about a 9x increase in performance compared to normal strptime, but you can get about a 90x increase with the built-in fromisoformat method!

%timeit isofmt(datelist)
569 µs ± 8.45 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit slice2int(datelist)
5.51 ms ± 48.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit normalstrptime(datelist)
52.1 ms ± 1.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

from datetime import datetime, timedelta
base, n = datetime(2000, 1, 1, 1, 2, 3, 420001), 10000
datelist = [(base + timedelta(days=i)).strftime('%Y-%m-%d') for i in range(n)]

def isofmt(l):
    return list(map(datetime.fromisoformat, l))
    
def slice2int(l):   
    def slicer(t):
        return datetime(int(t[:4]), int(t[5:7]), int(t[8:10]))
    return list(map(slicer, l))

def normalstrptime(l):
    return [datetime.strptime(t, '%Y-%m-%d') for t in l]
    
print(isofmt(datelist[0:1]))
print(slice2int(datelist[0:1]))
print(normalstrptime(datelist[0:1]))

# [datetime.datetime(2000, 1, 1, 0, 0)]
# [datetime.datetime(2000, 1, 1, 0, 0)]
# [datetime.datetime(2000, 1, 1, 0, 0)]

Tested with Python 3.8.3rc1 x64 on Windows 10.

FObersteiner
  • But the docs say of fromisoformat: "this does not support parsing arbitrary ISO 8601 strings - it is only intended as the inverse operation of datetime.isoformat()" (https://docs.python.org/3/library/datetime.html#datetime.datetime.fromisoformat). So dates like "2020-08-24T00:00:00.00+00:00" do not work. – visch Dec 10 '21 at 01:59
  • @visch well, not providing a *proper* ISO format parser (and formatter) in the standard library of a full-featured language like Python is pretty poor if you ask me (we have 3.10 now!). **But** why should this prevent you from using the features that do exist to your full advantage? – FObersteiner Dec 10 '21 at 05:52
  • I'd love to use the function, but I hit at least one case (the one in the last comment) that doesn't work (even though I'm pretty sure it's a valid iso 8601 datetime). I ended up going with https://github.com/closeio/ciso8601 – visch Dec 10 '21 at 18:31
  • 1
    @visch Python's ISO format parser has been extended in Python 3.11, for example "2020-08-24T00:00:00.00+00:00" is now parsed correctly. – FObersteiner Jun 23 '23 at 06:43
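To handle the string from this comment thread on both older and newer interpreters, one possible sketch is to try `fromisoformat` first and fall back to `strptime`. The function name `parse_iso` is hypothetical, and the fallback format is an assumption that only covers strings shaped like the example (fractional seconds plus a UTC offset).

```python
from datetime import datetime, timezone

def parse_iso(s):
    """Hypothetical helper: fromisoformat with a strptime fallback."""
    try:
        # Accepts the example string on Python 3.11+
        return datetime.fromisoformat(s)
    except ValueError:
        # Older versions reject a two-digit fraction; strptime's %f takes
        # 1-6 digits and %z accepts '+00:00' (Python 3.7+)
        return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%f%z")

print(parse_iso("2020-08-24T00:00:00.00+00:00"))
```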
0

For an ISO-formatted timezone-free string, e.g. "2021-01-04T14:30:03.123":

datetime.datetime(int(d[:4]), int(d[5:7]), int(d[8:10]), int(d[11:13]), int(d[14:16]), int(d[17:19]), int(d[20:26].ljust(6, '0')))  # pad the fraction to microseconds: '123' -> 123000

Seems to run faster than strptime().
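One way to check such a claim is a small `timeit` comparison, sketched below. The function names and the repetition count are arbitrary choices for illustration, and the slicing version pads the fractional part to six digits so that both functions return the same value. No timings are quoted here because they vary by machine and Python version.

```python
import timeit
from datetime import datetime

d = "2021-01-04T14:30:03.123"
N = 20_000  # arbitrary repetition count

def by_slicing(d):
    return datetime(int(d[:4]), int(d[5:7]), int(d[8:10]),
                    int(d[11:13]), int(d[14:16]), int(d[17:19]),
                    int(d[20:26].ljust(6, "0")))  # '123' -> 123000 µs

def by_strptime(d):
    return datetime.strptime(d, "%Y-%m-%dT%H:%M:%S.%f")

# Both approaches should agree before their speed is compared
assert by_slicing(d) == by_strptime(d)

for fn in (by_slicing, by_strptime):
    t = timeit.timeit(lambda: fn(d), number=N)
    print(f"{fn.__name__}: {t / N * 1e6:.2f} µs per call")
```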

Voy
  • *Seems* to run faster? How did you benchmark? – FObersteiner Jun 23 '23 at 07:42
  • 1
    Honestly, I remember reading about it somewhere at the time and the results that I got when testing it out suggested that indeed this ran even faster. I now tried finding that resource again, and I couldn't, my bad. I also went back to my old code - parsing some csv date strings - and tried some tests to verify this claim, but unfortunately couldn't confirm this. So I've edited this post to remove the statement about running faster than `fromisoformat()`. Thanks for pointing it out – Voy Jun 23 '23 at 17:00