I have some text (`str`/`bytes`; actually gzipped in a file on disk) which can be parsed via `ast.literal_eval`.
(It consists of a list of dicts, where the dict keys are strings, and the values strings, ints or floats. But maybe this question could be generic for any string which can be parsed via `ast.literal_eval`.)
It is large: ~22MB uncompressed.
What is the fastest way to parse it?
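For concreteness, a minimal sketch of the setup (the file name and data here are made up; the gzipped file is simulated in memory with `gzip.compress`/`gzip.decompress`):

```python
import ast
import gzip

# Dummy data of the described shape: list of dicts with str keys
# and str/int/float values.
data = [{"name": "a", "count": 1, "score": 0.5},
        {"name": "b", "count": 2, "score": 1.5}]

# Simulate the gzipped file on disk.
compressed = gzip.compress(repr(data).encode("utf8"))

# Gunzip + read, then parse the Python literal.
txt = gzip.decompress(compressed).decode("utf8")
parsed = ast.literal_eval(txt)
assert parsed == data
```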
Surely I can use `ast.literal_eval`, but this seems quite slow. Standard `eval` is slightly faster (interestingly, but probably as expected, depending on how well you know Python; see the implementation of `ast.literal_eval`) but still slow.
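For reference, the variants compared in the statistics below can be sketched like this (assuming `txt` holds the literal string; `eval` is of course unsafe on untrusted input):

```python
import ast

# A small literal string standing in for the ~22MB text.
txt = repr([{"a": 1, "b": 2.5, "c": "x"}] * 100)

tree = ast.parse(txt, mode="eval")        # "parse"
code = compile(txt, "<string>", "eval")   # "compile"
via_eval = eval(code)                     # "eval" (unsafe on untrusted input)
via_literal = ast.literal_eval(txt)       # "ast.literal_eval"
assert via_eval == via_literal
```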
In comparison, when I serialize the same data as JSON and then load the JSON (`json.loads`), this is way faster (>10x). So this shows that, in principle, it should be possible to parse it just as fast.
Some statistics:

```
Gunzip + read time: 0.15111494064331055
Size: 22035943
compile: 3.1023156170000004
parse: 3.3381092380000004
eval: 3.0252232049999996
ast.literal_eval: 3.765798232
json.loads: 0.2657175249999994
```
The benchmark script, and also a script to generate such a dummy text file, can be found here.
(Maybe the answer is: "this needs a faster C implementation; no-one has implemented that yet")
Ok, after posting this, I found some related questions. I did not find them via Google, though (maybe my search term "faster literal_eval" was bad):
- Why is json.loads an order of magnitude faster than ast.literal_eval?
- python ast vs json for str to dict translation
This partly answers the question.