186

With Python 3 I am requesting a JSON document from a URL.

response = urllib.request.urlopen(request)

The response object is a file-like object with read and readline methods. Normally a JSON object can be created with a file opened in text mode.

obj = json.load(fp)

What I would like to do is:

obj = json.load(response)

This however does not work as urlopen returns a file object in binary mode.

A workaround is, of course:

str_response = response.read().decode('utf-8')
obj = json.loads(str_response)

but this feels bad...

Is there a better way that I can transform a bytes file object to a string file object? Or am I missing any parameters for either urlopen or json.load to give an encoding?

N4v
Peter Smit

12 Answers

103

Python’s wonderful standard library to the rescue…

import codecs
import json

reader = codecs.getreader("utf-8")
obj = json.load(reader(response))

Works with both py2 and py3.

Docs: Python 2, Python3
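
For anyone who wants to try this without a network connection, here is a minimal sketch of the same approach. The `io.BytesIO` object and the sample payload are stand-ins I made up for the binary `response` that `urlopen` returns; everything else is the `codecs.getreader` call from the answer.

```python
import codecs
import io
import json

# io.BytesIO stands in for the binary file-like object returned by urlopen().
# The payload is made-up sample data.
fake_response = io.BytesIO('{"name": "caf\u00e9", "count": 3}'.encode("utf-8"))

# codecs.getreader() returns a StreamReader class for the given encoding;
# calling it wraps the byte stream so that read() yields str instead of bytes.
reader = codecs.getreader("utf-8")
obj = json.load(reader(fake_response))

print(obj)
```

Note that this uses `json.load` (file-like input), not `json.loads` (string input).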

Czechnology
jbg
  • 11
    I got this error when trying this answer in `python 3.4.3` not sure why? The error was `TypeError: the JSON object must be str, not 'StreamReader'` – Aaron Lelevier Aug 05 '15 at 23:52
  • 9
    @AronYsidoro Did you possibly use `json.loads()` instead of `json.load()`? – SleepyCal Sep 28 '15 at 13:17
  • 6
    For bonus points, use the encoding specified in the response, instead of assuming utf-8: `response.headers.get_content_charset()`. Returns `None` if there is no encoding, and doesn't exist on python2. – Phil Frost Mar 21 '16 at 19:26
  • 5
    @PhilFrost That’s slick. In practice it might pay to be careful with that; JSON is always UTF-8, UTF-16 or UTF-32 by definition (and is overwhelmingly likely to be UTF-8), so if another encoding is returned by the web server, it’s possibly a misconfiguration of the web server software rather than genuinely non-standard JSON. – jbg Mar 22 '16 at 07:02
  • 1
    @jbg: json itself is a text format—it knows nothing about character encodings and bytes. Nothing stops you storing it on disk using any character encoding you like. Though RFCs for application/json media type say: [*"JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32."*](https://tools.ietf.org/html/rfc7159#section-8.1) i.e., a web server must use only these encodings. Also, there is [no `charset` parameter defined for `application/json`](http://stackoverflow.com/q/13096259/4279) and the recent rfc specify no way to detect the encoding. It makes utf-8 the only choice. – jfs May 03 '16 at 21:39
  • @PhilFrost it exists on Python 2 as `response.headers.getparam('charset')`, see [A good way to get the charset/encoding of an HTTP response in Python](http://stackoverflow.com/q/14592762/4279). Though as I said in the previous comment: It doesn't help with json. – jfs May 03 '16 at 21:40
  • 6
    When I used it in Python 3.5, the error was "AttributeError: 'bytes' object has no attribute 'read'" – Harper Koo Oct 12 '16 at 06:29
  • @harperkoo: Did you possibly pass a `bytes` object as the `response` variable instead of a file-like object? If you already have a `bytes` object and just want to decode it, you can simply call the `decode(encoding)` method on it. – jbg Dec 14 '16 at 13:35
  • 1
    @jfs @jbg @phil-frost RFC8259 says, "Note: No "charset" parameter is defined for this registration. Adding one really has no effect on compliant recipients." Whether it is therefore better to trust, to ignore, or to trust-but-heuristically-evaluate-and-then-work-around a `charset` that a server nonetheless elected to send is likely a problem of the deepest sort of bikeshedding variety. – BMDan Jun 15 '19 at 19:41
  • @BMDan follow the link in my comment above that literally says: "no charset parameter defined..." – jfs Jun 15 '19 at 19:46
81

HTTP sends bytes. If the resource in question is text, the character encoding is normally specified, either by the Content-Type HTTP header or by another mechanism (an RFC, HTML meta http-equiv,...).

urllib should know how to decode the bytes to a string, but it's too naïve; it's a horribly underpowered and un-Pythonic library.

Dive Into Python 3 provides an overview of the situation.

Your "work-around" is fine—although it feels wrong, it's the correct way to do it.
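
To make the Content-Type point concrete, here is a rough sketch of decoding with the charset the server declares and falling back to UTF-8 otherwise. The helper name `decode_json_body` and the hand-built headers are hypothetical, for illustration only. In Python 3, `response.headers` is an `email.message.Message` subclass, so `get_content_charset()` should be available on it. As the comments under the first answer point out, JSON is expected to be UTF-8 anyway, so treat a different declared charset with some suspicion.

```python
import json
from email.message import Message


def decode_json_body(body: bytes, headers: Message, default: str = "utf-8"):
    """Hypothetical helper: decode with the declared charset, else `default`."""
    # get_content_charset() returns None when no charset parameter is present.
    charset = headers.get_content_charset() or default
    return json.loads(body.decode(charset))


# Hand-built headers stand in for response.headers from urlopen().
headers = Message()
headers["Content-Type"] = "application/json; charset=utf-8"
body = '{"city": "Z\u00fcrich"}'.encode("utf-8")
result = decode_json_body(body, headers)

# Headers without a charset parameter fall back to the default encoding.
bare_headers = Message()
bare_headers["Content-Type"] = "application/json"
fallback = decode_json_body(b'{"ok": true}', bare_headers)

print(result, fallback)
```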

SaidbakR
Humphrey Bogart
  • 6
    This may be the "correct" way to do it but if there was one thing I could undo about Python 3 it would be this bytes/strings crap. You would think the built-in library functions would at least know how to deal with other built-in library functions. Part of the reason we use python is the simple intuitive syntax. This change breaks that all over the place. – ThatAintWorking May 02 '14 at 23:22
  • 4
    Check out [the "requests" library](http://docs.python-requests.org/) -- it handles this sort of thing for you automagically. – offby1 Sep 02 '14 at 23:15
  • 2
    This isn’t a case of the built-in library functions needing to “know how” to deal with other functions. JSON is defined as a UTF-8 representation of objects, so it can’t magically decode bytes that it doesn’t know the encoding of. I do agree that `urlopen` ought to be able to decode the bytes itself since it knows the encoding. Anyway, I’ve posted the Python standard library solution as an answer — you can do streaming decoding of bytes using the `codecs` module. – jbg Sep 14 '14 at 01:41
  • 1
    @ThatAintWorking: I would disagree. While it is a pain in the neck to explicitly have to manage the difference between bytes and strings, it is a much greater pain to have the language make some implicit conversion for you. Implicit bytes <-> string conversions are a source of many bugs, and Python3 is very helpful in pointing out the pitfalls. But I agree the library has room for improvement in this area. – EvertW Jul 25 '17 at 14:24
  • @EvertW the failure, in my opinion, is forcing strings to be Unicode in the first place. – ThatAintWorking Jul 25 '17 at 14:34
  • 1
    @ThatAintWorking: No, strings must be Unicode, if you want software that can be used in other places than the UK or USA. For decades we have suffered under the myopic worldview of the ASCII committee. Python3 finally got it right. Might have something to do with Python originating in Europe... – EvertW Aug 01 '17 at 19:13
67

I have come to the opinion that the question is the best answer :)

import json
from urllib.request import urlopen

# urlopen needs a full URL including the scheme; "site.com" is a placeholder host
response = urlopen("http://site.com/api/foo/bar").read().decode('utf8')
obj = json.loads(response)
SergO
20

For anyone else trying to solve this using the requests library:

import json
import requests

r = requests.get('http://localhost/index.json')
r.raise_for_status()
# works for Python2 and Python3
json.loads(r.content.decode('utf-8'))
Luke Yeager
  • 12
    This functionality is built-in to `requests`: you can simply do `r.json()` – jbg Dec 14 '16 at 13:36
  • 1
    To clarify, if you use @jbg's method, you don't need to do `json.loads`. All you have to do is `r.json()` and you've got your JSON object loaded into a dict already. – Blairg23 Jun 05 '17 at 03:20
  • `*** UnicodeEncodeError: 'ascii' codec can't encode characters in position 264-265: ordinal not in range(128)` – andilabs Mar 06 '18 at 13:04
14

This one works for me. I used the requests library and its json() method; check out the docs at "Requests: HTTP for Humans".

import requests

url = 'here goes your url'

obj = requests.get(url).json() 
julian salas
Sarthak Gupta
  • This is the best way. Really readable, and anyone who is doing something like this should have requests. – Baldrickk Sep 25 '19 at 15:13
7

I ran into similar problems using Python 3.4.3 & 3.5.2 and Django 1.11.3. However, when I upgraded to Python 3.6.1 the problems went away.

You can read more about it here: https://docs.python.org/3/whatsnew/3.6.html#json

If you're not tied to a specific version of Python, just consider upgrading to 3.6 or later.

PaulMest
4

As of Python 3.6, you can use json.loads() to deserialize a bytes object directly (the encoding must be UTF-8, UTF-16 or UTF-32). So, using only modules from the standard library, you can do:

import json
from urllib import request

response = request.urlopen(url).read()
data = json.loads(response)
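
A quick offline check of this behavior, assuming Python 3.6 or later. The byte strings below are made-up sample data standing in for what `request.urlopen(url).read()` would return.

```python
import json

# Sample payloads standing in for the bytes returned by response.read().
utf8_bytes = '{"greeting": "hej"}'.encode("utf-8")
utf16_bytes = '{"greeting": "hej"}'.encode("utf-16")  # includes a byte order mark

# json.loads() detects UTF-8, UTF-16 or UTF-32 and decodes the bytes itself.
from_utf8 = json.loads(utf8_bytes)
from_utf16 = json.loads(utf16_bytes)

print(from_utf8, from_utf16)
```

On Python 3.5 and earlier the same calls raise a TypeError, which is why the explicit `.decode('utf-8')` workaround was needed there.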
Eugene Yarmash
3

If you're experiencing this issue whilst using the flask microframework, then you can just do:

data = json.loads(response.get_data(as_text=True))

From the docs: "If as_text is set to True the return value will be a decoded unicode string"

cs_stackX
  • I got to this page because I was having an issue with Flask unit tests - thanks for posting the single line call. – sfblackl Apr 30 '17 at 18:57
2

This will stream the byte data into json.

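import io
import json

# pass the encoding explicitly; otherwise TextIOWrapper uses the locale default
obj = json.load(io.TextIOWrapper(response, encoding='utf-8'))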

io.TextIOWrapper is preferred to the codecs module reader. https://www.python.org/dev/peps/pep-0400/
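
Here is a small self-contained sketch of the same idea that runs without a network connection. The `io.BytesIO` object and its contents are stand-ins I made up for the binary `response`. The encoding is passed explicitly because `TextIOWrapper` otherwise falls back to a locale-dependent default.

```python
import io
import json

# io.BytesIO stands in for the binary response object; the payload is sample data.
fake_response = io.BytesIO('[1, 2, {"unit": "\u00b5s"}]'.encode("utf-8"))

# TextIOWrapper decodes the byte stream on the fly, so json.load() sees text.
# Leaving the with-block closes the wrapper and the underlying stream.
with io.TextIOWrapper(fake_response, encoding="utf-8") as text_stream:
    obj = json.load(text_stream)

print(obj)
```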

Eugene Yarmash
Collin Anderson
1

Just found this simple method to return HttpResponse content as JSON:

import json

request = RequestFactory().get('/') # ignore this, it is just like your request object ('/' is a placeholder path)

response = MyView.as_view()(request) # got response as HttpResponse object

response.render() # call this so we could call response.content after

json_response = json.loads(response.content.decode('utf-8'))

print(json_response) # {"your_json_key": "your json value"}

Hope that helps you.

Eugene Yarmash
Aditya Kresna Permana
1

Your workaround actually just saved me. I was having a lot of problems processing the request using the Falcon framework. This worked for me, with req being the request from curl or httpie:

json.loads(req.stream.read().decode('utf-8'))
thielyrics
-1

I used the program below, which calls json.loads():

import json
import urllib.parse
import urllib.request

endpoint = 'https://maps.googleapis.com/maps/api/directions/json?'
api_key = 'YOUR_API_KEY'  # placeholder: never post a real key publicly
origin = input('Where are you? ')
destination = input('Where do you want to go? ')
# urlencode escapes spaces and other special characters in the query string
nav_request = urllib.parse.urlencode(
    {'origin': origin, 'destination': destination, 'key': api_key})
request = endpoint + nav_request
# urlopen returns bytes, so decode before handing the text to json.loads
response = urllib.request.urlopen(request).read().decode('utf-8')
directions = json.loads(response)
print(directions)
Azat Ibrakov
jayesh