
So I'm making a request to an API that returns around 5000 results. The data structure looks like this:

[{'test': '1'}, {'test': '2'}, {'test': '3'}] (only with 5000 results)

It's currently taking around 30 seconds to do this simple construct:

for x in ujson.loads(r.content):
    pass

As you can see, I'm using ujson, but it doesn't really speed things up compared to json.loads().
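For reference, here is a standalone benchmark that isolates the decode step (a sketch using a synthetic payload of the same shape, so no network is involved); if this runs quickly, the slowness is not in the JSON parsing itself:

import json
import time

import ujson

# Synthetic payload shaped like the API response: 5000 one-key objects.
payload = json.dumps([{'test': str(i)} for i in range(5000)])

for name, loads in [('json', json.loads), ('ujson', ujson.loads)]:
    start = time.time()
    for _ in range(100):
        loads(payload)
    print '{0}: {1:.4f}s for 100 parses'.format(name, time.time() - start)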

Any ideas on how to improve this performance?

Thanks

As requested, here's how I'm timing the code:

import time

import requests
import ujson

start = time.time()
r = requests.get(url, headers={'Range': 'items=1-5000'})
print 'time to make request: {0}'.format(time.time() - start)
for x in ujson.loads(r.content):
    pass
print 'time to parse request: {0}'.format(time.time() - start)
Strobe_
  • Are you sure the parsing is taking 30 seconds and not the request/response from the api? – user2263572 Apr 12 '17 at 13:22
  • Yes, 100%. I'm timing it. It takes 60 seconds altogether: 30 seconds for the request (which I can't really do much about) and 30 seconds for the parsing. – Strobe_ Apr 12 '17 at 13:23
  • Can you share your method of timing this? – Zac Apr 12 '17 at 13:24
  • @Zac I added it; I had to redact some request params but that's roughly it – Strobe_ Apr 12 '17 at 13:29
  • You should be adding a `start = time.time()` after the first print statement to reset the start time for the second measurement (see the corrected sketch after these comments). – Zac Apr 12 '17 at 13:31
  • I can still see that it takes another 30 seconds after the first log. I changed it and same result: time to make request: 29.0964298248 time to parse request: 32.3355400562 – Strobe_ Apr 12 '17 at 13:33
  • Could you please update the example and add in the output; that may help others help you. – Zac Apr 12 '17 at 14:00
  • @Strobe_ - that means that making the request takes ~29 seconds, parsing it takes ~3.24 seconds since your `start` is always the same. How large is the returned content? – mata Apr 12 '17 at 14:00
  • Guys, it is taking 30 seconds to parse the json. Please trust me, I added a new start variable after the request was finished. I even timed it on my phone and it was 60 seconds in total. – Strobe_ Apr 12 '17 at 14:10
  • @mata it's only 44981 bytes. – Strobe_ Apr 12 '17 at 14:10
  • That doesn't seem right; even with just `{"x": "y"}` repeated 5000 times you would be at around 60000 bytes. Or are most of the objects in the array empty? If that's really correct, something very strange must be going on. – mata Apr 12 '17 at 14:34
  • Something is very wrong; 50k should be parsed much more quickly. What kind of hardware are you using? Check this for example: https://stackoverflow.com/questions/706101/python-json-decoding-performance (the speed should be much higher). I would use cProfile to check what takes the time on your system; see the sketch below. – Roman-Stop RU aggression in UA Mar 18 '18 at 12:16
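For clarity, here is the timing code restructured along the lines of Zac's comment, with `start` reset between the two measurements so each figure covers only one phase (a sketch; `url` is the redacted endpoint from the question):

import time

import requests
import ujson

start = time.time()
r = requests.get(url, headers={'Range': 'items=1-5000'})
print 'time to make request: {0}'.format(time.time() - start)

start = time.time()  # reset so the second figure covers only the parse
for x in ujson.loads(r.content):
    pass
print 'time to parse request: {0}'.format(time.time() - start)

Roman's cProfile suggestion can be applied in the same spirit, e.g. `import cProfile; content = r.content; cProfile.run('ujson.loads(content)')`, to see where the parsing time actually goes.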

3 Answers


Maybe you can use r.json() instead of r.content (http://docs.python-requests.org/en/master/, https://github.com/kennethreitz/requests/blob/master/requests/models.py#L861). I don't know if this is faster.
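A minimal sketch of that substitution (`r.json()` is part of the requests API; `url` and the `Range` header are taken from the question). Since `r.json()` still hands the body to a JSON decoder internally, any speedup would come from the decoding path requests uses rather than from skipping work:

import requests

r = requests.get(url, headers={'Range': 'items=1-5000'})

# requests decodes the JSON body itself; the result here is the parsed list.
for x in r.json():
    pass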

Bart

Since I see that you're using Python 2, I would advise cjson. You need to

pip install python-cjson 

then:

import time

import cjson
import requests

start = time.time()
r = requests.get(url, headers={'Range': 'items=1-5000'})
print 'time to make request: {0}'.format(time.time() - start)

start = time.time()  # reset so the second figure covers only the parse
for x in cjson.decode(r.content):
    pass
print 'time to parse request: {0}'.format(time.time() - start)

Even with fairly light JSON it is faster than ujson:

cjson - time to parse request: 0.000113010406494

ujson - time to parse request: 0.000193119049072

Olia

Try:

ujson_loads = ujson.loads(r.content)  # parse once, then iterate over the result
for x in ujson_loads:
    pass

I haven't tested it yet, but it could be the solution to your problem.

Claudio