
I'm going to write a parser for a log file where each line is one JSON record.

I could decode each line in a loop:

logs = [json.loads(line) for line in lines]

or I could decode the whole file in one go:

logs = json.loads('[' + ','.join(lines) + ']')

I want to minimize the execution time, please disregard other factors. Is there any reason to prefer one approach over the other?
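
For concreteness, here is a self-contained version of the two snippets; the contents of lines are just made-up sample records:

import json

# Stand-in for the real log file: each element is one JSON record.
lines = ['{"level": "info", "msg": "started"}',
         '{"level": "error", "msg": "disk full"}']

# Approach 1: decode each line in a loop.
logs_loop = [json.loads(line) for line in lines]

# Approach 2: join the lines into one JSON array and decode it in one call.
logs_joined = json.loads('[' + ','.join(lines) + ']')

assert logs_loop == logs_joined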

VPfB
  • I think decoding the whole file in one go is faster because it avoids a potentially slow loop (but it is less pythonic); still, you'd better try both solutions yourself and benchmark them. – Delgan Jul 14 '16 at 07:55

2 Answers


You can easily test it with timeit:

$ python -m timeit -s 'import json; lines = ["{\"foo\":\"bar\"}"] * 1000' '[json.loads(line) for line in lines]'
100 loops, best of 3: 2.22 msec per loop
$ python -m timeit -s 'import json; lines = ["{\"foo\":\"bar\"}"] * 1000' "json.loads('[' + ','.join(lines) + ']')"
1000 loops, best of 3: 839 usec per loop

In this case, combining the data and parsing it in one go is roughly 2.6 times faster (2.22 msec vs. 839 usec per 1000 records).
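
The same comparison can also be run from inside a script with the timeit module; a minimal sketch using the same setup as the shell commands above:

import json
import timeit

lines = ['{"foo":"bar"}'] * 1000

per_line = timeit.timeit(lambda: [json.loads(line) for line in lines], number=100)
one_go = timeit.timeit(lambda: json.loads('[' + ','.join(lines) + ']'), number=100)

# Average time per run, in milliseconds.
print('per line: %.3f msec' % (per_line / 100 * 1000))
print('one go:   %.3f msec' % (one_go / 100 * 1000))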

niemmi

You could instead structure the log file as a single JSON dictionary, like:

{
    "log": {
        "line1": {...},
        "line2": {...},
        ...
    }
}

Then load the whole file with json.load, which converts the JSON into a Python dictionary.

That way you can use the records directly, without parsing each line.
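
For example, assuming the restructured log is saved as log.json (a hypothetical filename), the whole thing can be loaded in one call:

import json

# 'log.json' is a hypothetical filename for the restructured log above.
with open('log.json') as f:
    data = json.load(f)

# Every record is now a plain dictionary entry; no per-line parsing needed.
for name, record in data['log'].items():
    print(name, record)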

Raskayu