
So I'm making a request to an API that returns around 5000 results. The data structure looks like this:

[{'test': '1'}, {'test': '2'}, {'test': '3'}] (only with 5000 results)

It's currently taking around 30 seconds to do this simple construct:

for x in ujson.loads(r.content):
    pass

As you can see, I'm using ujson, but it doesn't really speed things up compared to json.loads().
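For reference, here is a standalone benchmark that isolates the decode step (a sketch using a synthetic payload of the same shape, so no network is involved); if this runs quickly, the slowness is not in the JSON parsing itself:

import json
import time

import ujson

# Synthetic payload shaped like the API response: 5000 one-key objects.
payload = json.dumps([{'test': str(i)} for i in range(5000)])

for name, loads in [('json', json.loads), ('ujson', ujson.loads)]:
    start = time.time()
    for _ in range(100):
        loads(payload)
    print '{0}: {1:.4f}s for 100 parses'.format(name, time.time() - start)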

Any ideas on how to improve this performance?

Thanks

As requested, here's how I'm timing the code:

import time

import requests
import ujson

start = time.time()
r = requests.get(url, headers={'Range': 'items=1-5000'})
print 'time to make request: {0}'.format(time.time() - start)
for x in ujson.loads(r.content):
    pass
print 'time to parse request: {0}'.format(time.time() - start)
Strobe_
  • Are you sure the parsing is taking 30 seconds and not the request/response from the api? – user2263572 Apr 12 '17 at 13:22
  • Yes, 100%. I'm timing it. It takes 60 seconds altogether: 30 seconds for the request (which I can't really do much about) and 30 seconds for the parsing. – Strobe_ Apr 12 '17 at 13:23
  • Can you share your method of timing this? – Zac Apr 12 '17 at 13:24
  • @Zac I added it; I had to redact some request params but that's roughly it – Strobe_ Apr 12 '17 at 13:29
  • You should be adding a `start = time.time()` after the first print statement to reset the start time for the second measurement (see the corrected sketch after these comments). – Zac Apr 12 '17 at 13:31
  • I can still see that it takes another 30 seconds after the first log. I changed it and same result: time to make request: 29.0964298248 time to parse request: 32.3355400562 – Strobe_ Apr 12 '17 at 13:33
  • Could you please update the example and add in the output; that may help others help you. – Zac Apr 12 '17 at 14:00
  • @Strobe_ - that means that making the request takes ~29 seconds, parsing it takes ~3.24 seconds since your `start` is always the same. How large is the returned content? – mata Apr 12 '17 at 14:00
  • Guys, it is taking 30 seconds to parse the json. Please trust me, I added a new start variable after the request was finished. I even timed it on my phone and it was 60 seconds in total. – Strobe_ Apr 12 '17 at 14:10
  • @mata it's only 44981 bytes. – Strobe_ Apr 12 '17 at 14:10
  • That doesn't seem right; even with just `{"x": "y"}` repeated 5000 times you would be at around 60000 bytes. Or are most of the objects in the array empty? If that's really correct, something very strange must be going on. – mata Apr 12 '17 at 14:34
  • Something is very wrong; 50k should be parsed much more quickly. What kind of hardware are you using? Check this for example: https://stackoverflow.com/questions/706101/python-json-decoding-performance (the speed should be much higher). I would use cProfile to check what takes the time on your system; see the sketch below. – Roman-Stop RU aggression in UA Mar 18 '18 at 12:16
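For clarity, here is the timing code restructured along the lines of Zac's comment, with `start` reset between the two measurements so each figure covers only one phase (a sketch; `url` is the redacted endpoint from the question):

import time

import requests
import ujson

start = time.time()
r = requests.get(url, headers={'Range': 'items=1-5000'})
print 'time to make request: {0}'.format(time.time() - start)

start = time.time()  # reset so the second figure covers only the parse
for x in ujson.loads(r.content):
    pass
print 'time to parse request: {0}'.format(time.time() - start)

Roman's cProfile suggestion can be applied in the same spirit, e.g. `import cProfile; content = r.content; cProfile.run('ujson.loads(content)')`, to see where the parsing time actually goes.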

3 Answers


Maybe you can use r.json() instead of r.content (http://docs.python-requests.org/en/master/, https://github.com/kennethreitz/requests/blob/master/requests/models.py#L861). I don't know if this is faster.
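A minimal sketch of that substitution (`r.json()` is part of the requests API; `url` and the `Range` header are taken from the question). Since `r.json()` still hands the body to a JSON decoder internally, any speedup would come from the decoding path requests uses rather than from skipping work:

import requests

r = requests.get(url, headers={'Range': 'items=1-5000'})

# requests decodes the JSON body itself; the result here is the parsed list.
for x in r.json():
    pass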

Bart

Since I see that you're using Python 2, I would advise cjson. You need to

pip install python-cjson 

then:

import time

import cjson
import requests

start = time.time()
r = requests.get(url, headers={'Range': 'items=1-5000'})
print 'time to make request: {0}'.format(time.time() - start)

start = time.time()  # reset so the second figure covers only the parse
for x in cjson.decode(r.content):
    pass
print 'time to parse request: {0}'.format(time.time() - start)

Even with fairly light JSON it is faster than ujson:

cjson - time to parse request: 0.000113010406494

ujson - time to parse request: 0.000193119049072

Olia

Try:

ujson_loads = ujson.loads(r.content)  # parse once, then iterate over the result
for x in ujson_loads:
    pass

I haven't tested it yet, but it could be the solution to your problem.

Claudio