I have some text (`str`/`bytes`; actually gzipped in a file on disk) which can be parsed via `ast.literal_eval`.
(It consists of a list of dicts, where the dict keys are strings, and the values strings, ints or floats. But maybe this question could be generic for any string which can be parsed via `ast.literal_eval`.)
It is large: ~22MB uncompressed.
What is the fastest way to parse it?
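For concreteness, a minimal sketch of the setup (the file name and data here are made up; the gzipped file is simulated in memory with `gzip.compress`/`gzip.decompress`):

```python
import ast
import gzip

# Dummy data of the described shape: list of dicts with str keys
# and str/int/float values.
data = [{"name": "a", "count": 1, "score": 0.5},
        {"name": "b", "count": 2, "score": 1.5}]

# Simulate the gzipped file on disk.
compressed = gzip.compress(repr(data).encode("utf8"))

# Gunzip + read, then parse the Python literal.
txt = gzip.decompress(compressed).decode("utf8")
parsed = ast.literal_eval(txt)
assert parsed == data
```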
Surely I can use `ast.literal_eval`, but this seems quite slow. Standard `eval` is slightly faster (interestingly, but probably as expected, depending on how well you know Python; see the implementation of `ast.literal_eval`) but still slow.
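For reference, the variants compared in the statistics below can be sketched like this (assuming `txt` holds the literal string; `eval` is of course unsafe on untrusted input):

```python
import ast

# A small literal string standing in for the ~22MB text.
txt = repr([{"a": 1, "b": 2.5, "c": "x"}] * 100)

tree = ast.parse(txt, mode="eval")        # "parse"
code = compile(txt, "<string>", "eval")   # "compile"
via_eval = eval(code)                     # "eval" (unsafe on untrusted input)
via_literal = ast.literal_eval(txt)       # "ast.literal_eval"
assert via_eval == via_literal
```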
In comparison, when I serialize the same data as JSON and then load the JSON (`json.loads`), this is way faster (>10x). So this shows that, in principle, it should be possible to parse it just as fast.
Some statistics:

```
Gunzip + read time: 0.15111494064331055
Size: 22035943
compile: 3.1023156170000004
parse: 3.3381092380000004
eval: 3.0252232049999996
ast.literal_eval: 3.765798232
json.loads: 0.2657175249999994
```
The benchmark script, and also a script to generate such a dummy text file, can be found here.
(Maybe the answer is: "this needs a faster C implementation; no-one has implemented that yet")
Ok, after posting this, I found some related questions. I did not find them via Google, though (maybe my search term "faster literal_eval" was bad):
- Why is json.loads an order of magnitude faster than ast.literal_eval?
- python ast vs json for str to dict translation
This partly answers the question.