Fastest and best way to get JSON data using Python

Question

I'm using a JSON service and I'm getting the data this way:

import urllib2
import json
url = "http://nominatim.openstreetmap.org/reverse?format=json&lat=52.5487429714954&lon=-1.81602098644987&zoom=18&addressdetails=1"
r = urllib2.urlopen(url)
data = json.load(r)

I need to run approximately 10-50 queries per second. What is the best way to do it?

P.S.: The JSON service has no rate limits.
Thanks

The code you show is entirely bound by network performance. There is nothing you can do on the Python side apart from executing network communication in parallel. — Martijn Pieters, Nov 11 '13 at 18:40
What "isn't good" about it? You aren't getting enough request volume? Like @MartijnPieters said, your current code is bound by networking performance. To fix that, you should look at using threads or worker processes to generate the requests in parallel. — Silas Ray, Nov 11 '13 at 18:42
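The thread-based approach suggested in these comments could look roughly like the sketch below. It is only an illustration under stated assumptions: it targets Python 3 (`urllib.request` and `concurrent.futures`) rather than the Python 2 `urllib2` used in the question, and the names `fetch_json` and `fetch_all`, the `max_workers` default, and the timeout are hypothetical choices, not something from the original post.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Download one URL and decode its body as JSON (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_all(urls, fetch=fetch_json, max_workers=10):
    """Fetch every URL on a thread pool; results keep the order of `urls`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs `fetch` concurrently and re-raises a worker's
        # exception when its result is consumed by list().
        return list(pool.map(fetch, urls))
```

Calling `fetch_all(urls)` with a list of service URLs would issue up to `max_workers` requests at a time. Threads help here because each worker spends most of its time waiting on the network, which is exactly the bottleneck the comments point to.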

Pedro Werneck · Answer 1 · 2013-11-11T19:51:12.720

2

There isn't much you can do on the Python side.

If simplejson returning `str` for ASCII-only strings and `unicode` for everything else (see the comments below) isn't a problem for you, you might consider using the latest simplejson, which is significantly faster at loading than the standard library json. Keep in mind that while the deserialization is faster when comparing the libraries directly, the difference might not be worth it when you consider your whole request/response cycle.
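One way to check whether the deserialization difference matters for your payloads is to time both libraries yourself. The following is a minimal sketch, assuming Python 3 and that `simplejson` may or may not be installed; the `SAMPLE` payload and the helper names `time_loads` and `compare` are made up for illustration and are not from the original answer.

```python
import json
import timeit

try:
    import simplejson  # third-party and optional
except ImportError:
    simplejson = None

# Hypothetical payload, loosely shaped like a reverse-geocoding reply.
SAMPLE = json.dumps({
    "lat": "52.5",
    "lon": "-1.8",
    "address": {"city": "Example", "postcode": "00000"},
})


def time_loads(module, payload=SAMPLE, number=10000):
    """Return the seconds taken to run module.loads(payload) `number` times."""
    return timeit.timeit(lambda: module.loads(payload), number=number)


def compare(payload=SAMPLE, number=10000):
    """Time json, and simplejson when it is available, on the same payload."""
    results = {"json": time_loads(json, payload, number)}
    if simplejson is not None:
        results["simplejson"] = time_loads(simplejson, payload, number)
    return results
```

Comparing the numbers from `compare()` against the time of a single HTTP round trip gives a sense of how much of the whole request/response cycle the parsing step actually accounts for.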

For running parallel requests, you should try grequests:

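import grequests

urls = ["http://nominatim.openstreetmap.org/reverse?format=json&lat=52.5487429714954&lon=-1.81602098644987&zoom=18&addressdetails=1",
        ....
       ]

# Build the requests lazily; nothing is sent until map() is called.
requests = (grequests.get(u) for u in urls)

# Send all requests concurrently and wait for every response.
responses = grequests.map(requests)

for r in responses:
    # map() yields None for any request that failed.
    if r is not None:
        print r.json()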

Obviously, even if you start 50 requests in parallel, you're still bound by your network and the remote server's performance.

edited Nov 11 '13 at 19:51

answered Nov 11 '13 at 18:42


2

The stdlib `json` module **is the same module** as `simplejson`. `simplejson` is simply the externally maintained version for older Python versions. – Martijn Pieters Nov 11 '13 at 18:43
Indeed, but simplejson is not stuck on the version released with the Python release. – Pedro Werneck Nov 11 '13 at 18:46
And where do the magical speedups in `simplejson` come from then? The architecture is the same, Python stdlib has the same speedups C extension, I see only bugfixes in the changelog. – Martijn Pieters Nov 11 '13 at 18:50
No idea where they come from, but they exist. This answer points out how `simplejson` is significantly faster at loading, while `json` is faster at dumping. http://stackoverflow.com/a/16131316/1202421 – Pedro Werneck Nov 11 '13 at 18:53
Those results are not wide enough apart to be called 'significant'. I'd call a magnitude difference significant. I wasn't aware of the cavalier attitude towards returning `str` for some inputs, `unicode` for others, that's truly horrifying. – Martijn Pieters Nov 11 '13 at 19:06
Significant, according to the dictionary, means "large enough to be noticed or have an effect". Considering only the deserialization operations, the improvement is clearly noticeable, almost 100% in some cases, so I guess it's fine to say they are significant. If you're considering the whole request/response cycle, then maybe you have an argument. – Pedro Werneck Nov 11 '13 at 19:17
I'll have to retract the 'is the same module' argument, but I find [issue 40](https://code.google.com/p/simplejson/issues/detail?id=40) shocking enough to recommend against using `simplejson` altogether. When using the library, you now have to explicitly test for `str` vs. `unicode` responses and decode explicitly, and that adds to the performance burden. – Martijn Pieters Nov 11 '13 at 19:19
It gives `str` if the string is ASCII only, and that can be used with unicode. Other than the asymmetry of not getting the exact same input/output on reverse operations, which might be confusing, there's hardly any usability burden in that. – Pedro Werneck Nov 11 '13 at 19:37
It happens to work on Python 2, but on Python 3 implicit conversions have been ejected (rightfully) and moreover `str` objects have no `decode()` method, and `bytes` have no `encode()`. Returning two different types from the same API is a *terrible* idea. – Martijn Pieters Nov 11 '13 at 19:39
That explains why I never had that problem with `simplejson` then. Point made. I updated my answer to reflect that and give fair warning. – Pedro Werneck Nov 11 '13 at 19:47
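The explicit type check discussed in this thread can be sketched as a small normalizing helper. This is an illustration only: `to_text` is a hypothetical name, and UTF-8 is an assumed default encoding, not something the comments specify.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as text, decoding it only when it is a byte string."""
    # On Python 3, bytes has decode() but str does not, so the type must be
    # checked before decoding. On Python 2, `bytes` is an alias for `str`,
    # so the same check decodes `str` and passes `unicode` through.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

Wrapping every string that comes back from a library with mixed return types in a call like `to_text(value)` is the extra per-value work the comments describe as a performance and usability burden.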
