
I'm going to write a parser for a log file where each line is one JSON record.

I could decode each line in a loop:

logs = [json.loads(line) for line in lines]

or I could decode the whole file in one go:

logs = json.loads('[' + ','.join(lines) + ']')

I want to minimize the execution time, please disregard other factors. Is there any reason to prefer one approach over the other?
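
For concreteness, here is a self-contained version of the two snippets; the contents of lines are just made-up sample records:

import json

# Stand-in for the real log file: each element is one JSON record.
lines = ['{"level": "info", "msg": "started"}',
         '{"level": "error", "msg": "disk full"}']

# Approach 1: decode each line in a loop.
logs_loop = [json.loads(line) for line in lines]

# Approach 2: join the lines into one JSON array and decode it in one call.
logs_joined = json.loads('[' + ','.join(lines) + ']')

assert logs_loop == logs_joined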

VPfB
  • I think decoding the whole file in one go is faster because it avoids a potentially slow loop (but it is less pythonic); still, you'd better try both solutions yourself and benchmark them. – Delgan Jul 14 '16 at 07:55

2 Answers


You can easily test it with timeit:

$ python -m timeit -s 'import json; lines = ["{\"foo\":\"bar\"}"] * 1000' '[json.loads(line) for line in lines]'
100 loops, best of 3: 2.22 msec per loop
$ python -m timeit -s 'import json; lines = ["{\"foo\":\"bar\"}"] * 1000' "json.loads('[' + ','.join(lines) + ']')"
1000 loops, best of 3: 839 usec per loop

In this case, combining the data and parsing it in one go is roughly 2.6 times faster (2.22 msec vs. 839 usec per 1000 records).
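
The same comparison can also be run from inside a script with the timeit module; a minimal sketch using the same setup as the shell commands above:

import json
import timeit

lines = ['{"foo":"bar"}'] * 1000

per_line = timeit.timeit(lambda: [json.loads(line) for line in lines], number=100)
one_go = timeit.timeit(lambda: json.loads('[' + ','.join(lines) + ']'), number=100)

# Average time per run, in milliseconds.
print('per line: %.3f msec' % (per_line / 100 * 1000))
print('one go:   %.3f msec' % (one_go / 100 * 1000))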

niemmi

You could instead structure the log file as a single JSON dictionary, like:

{
    "log": {
        "line1": {...},
        "line2": {...},
        ...
    }
}

Then load the whole file with json.load, which converts the JSON into a Python dictionary.

That way you can use the records directly, without parsing each line.
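
For example, assuming the restructured log is saved as log.json (a hypothetical filename), the whole thing can be loaded in one call:

import json

# 'log.json' is a hypothetical filename for the restructured log above.
with open('log.json') as f:
    data = json.load(f)

# Every record is now a plain dictionary entry; no per-line parsing needed.
for name, record in data['log'].items():
    print(name, record)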

Raskayu