Let JSON object accept bytes or let urlopen output strings

Question

With Python 3 I am requesting a json document from a URL.

response = urllib.request.urlopen(request)

The response object is a file-like object with read and readline methods. Normally a JSON object can be created with a file opened in text mode.

obj = json.load(fp)

What I would like to do is:

obj = json.load(response)

This however does not work as urlopen returns a file object in binary mode.

A workaround is, of course:

str_response = response.read().decode('utf-8')
obj = json.loads(str_response)

but this feels bad...

Is there a better way that I can transform a bytes file object to a string file object? Or am I missing any parameters for either urlopen or json.load to give an encoding?

I think you have a typo there, "readall" should be "read"? – Bob Yoplait May 17 '17 at 14:42

score 103 · Answer 1 · edited May 23 '18 at 10:29

Python’s wonderful standard library to the rescue…

import codecs

reader = codecs.getreader("utf-8")
obj = json.load(reader(response))

Works with both py2 and py3.

Docs: Python 2, Python3
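
A minimal, self-contained sketch of the same idea. It assumes an in-memory `io.BytesIO` (named `fake_response` here purely for illustration) as a stand-in for the object returned by urlopen, so it runs without a network connection:

```python
import codecs
import io
import json

# Stand-in for the urlopen() result: any binary file-like object
# with a read() method behaves the same way here.
fake_response = io.BytesIO('{"name": "caf\u00e9", "n": 1}'.encode("utf-8"))

# getreader() returns a StreamReader class; calling it wraps the byte stream
# so that read() yields str instead of bytes.
utf8_reader = codecs.getreader("utf-8")
obj = json.load(utf8_reader(fake_response))

print(obj)
```

Note that json.load() (file-like argument) is used here, not json.loads() (string argument).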

edited May 23 '18 at 10:29 by Czechnology · answered Sep 14 '14 at 01:39 by jbg

11

I got this error when trying this answer in `python 3.4.3` not sure why? The error was `TypeError: the JSON object must be str, not 'StreamReader'` – Aaron Lelevier Aug 05 '15 at 23:52
9

@AronYsidoro Did you possibly use `json.loads()` instead of `json.load()`? – SleepyCal Sep 28 '15 at 13:17
6

For bonus points, use the encoding specified in the response, instead of assuming utf-8: `response.headers.get_content_charset()`. Returns `None` if there is no encoding, and doesn't exist on python2. – Phil Frost Mar 21 '16 at 19:26
5

@PhilFrost That’s slick. In practice it might pay to be careful with that; JSON is always UTF-8, UTF-16 or UTF-32 by definition (and is overwhelmingly likely to be UTF-8), so if another encoding is returned by the web server, it’s possibly a misconfiguration of the web server software rather than genuinely non-standard JSON. – jbg Mar 22 '16 at 07:02
1

@jbg: json itself is a text format—it knows nothing about character encodings and bytes. Nothing stops you storing it on disk using any character encoding you like. Though RFCs for application/json media type say: [*"JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32."*](https://tools.ietf.org/html/rfc7159#section-8.1) i.e., a web server must use only these encodings. Also, there is [no `charset` parameter defined for `application/json`](http://stackoverflow.com/q/13096259/4279) and the recent rfc specify no way to detect the encoding. It makes utf-8 the only choice. – jfs May 03 '16 at 21:39
@PhilFrost it exists on Python 2 as `response.headers.getparam('charset')`, see [A good way to get the charset/encoding of an HTTP response in Python](http://stackoverflow.com/q/14592762/4279). Though as I said in the previous comment: It doesn't help with json. – jfs May 03 '16 at 21:40
6

When I used it in Python 3.5, the error was "AttributeError: 'bytes' object has no attribute 'read'" – Harper Koo Oct 12 '16 at 06:29
@harperkoo: Did you possibly pass a `bytes` object as the `response` variable instead of a file-like object? If you already have a `bytes` object and just want to decode it, you can simply call the `decode(encoding)` method on it. – jbg Dec 14 '16 at 13:35
1

@jfs @jbg @phil-frost RFC8259 says, "Note: No "charset" parameter is defined for this registration. Adding one really has no effect on compliant recipients." Whether it is therefore better to trust, to ignore, or to trust-but-heuristically-evaluate-and-then-work-around a `charset` that a server nonetheless elected to send is likely a problem of the deepest sort of bikeshedding variety. – BMDan Jun 15 '19 at 19:41
@BMDan follow the link in my comment above that literally says: "no charset parameter defined..." – jfs Jun 15 '19 at 19:46
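
The charset discussion in the comments above could be sketched roughly as follows. This is only an illustration: `pick_json_encoding` and `decode_json_body` are hypothetical helper names, and `email.message.Message` objects stand in for `response.headers` (which provides the same `get_content_charset()` method on Python 3). Whether a declared charset should be trusted for JSON at all is exactly what the comments debate.

```python
import json
from email.message import Message


def pick_json_encoding(headers, default="utf-8"):
    """Return the charset declared in Content-Type, or `default` if there is none."""
    # get_content_charset() returns None when no charset parameter is present.
    return headers.get_content_charset() or default


def decode_json_body(body, headers):
    """Decode raw response bytes with the chosen encoding, then parse the JSON."""
    return json.loads(body.decode(pick_json_encoding(headers)))


# Simulated headers: one with an explicit charset, one without.
with_charset = Message()
with_charset["Content-Type"] = "application/json; charset=ISO-8859-1"

without_charset = Message()
without_charset["Content-Type"] = "application/json"

print(pick_json_encoding(with_charset))
print(pick_json_encoding(without_charset))
```

Falling back to UTF-8 when no charset is sent matches the RFC quoted in the comments; ignoring the header entirely and always using UTF-8 would be an equally defensible choice.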

score 81 · Accepted Answer · edited May 03 '16 at 18:50

HTTP sends bytes. If the resource in question is text, the character encoding is normally specified, either by the Content-Type HTTP header or by another mechanism (an RFC, HTML meta http-equiv,...).

urllib should know how to decode the bytes to a string, but it's too naïve: it's a horribly underpowered and un-Pythonic library.

Dive Into Python 3 provides an overview about the situation.

Your "work-around" is fine—although it feels wrong, it's the correct way to do it.

edited May 03 '16 at 18:50 by SaidbakR · answered Jul 28 '11 at 17:13 by Humphrey Bogart

6

This may be the "correct" way to do it but if there was one thing I could undo about Python 3 it would be this bytes/strings crap. You would think the built-in library functions would at least know how to deal with other built-in library functions. Part of the reason we use python is the simple intuitive syntax. This change breaks that all over the place. – ThatAintWorking May 02 '14 at 23:22
4

Check out [the "requests" library](http://docs.python-requests.org/) -- it handles this sort of thing for you automagically. – offby1 Sep 02 '14 at 23:15
2

This isn’t a case of the built-in library functions needing to “know how” to deal with other functions. JSON is defined as a UTF-8 representation of objects, so it can’t magically decode bytes that it doesn’t know the encoding of. I do agree that `urlopen` ought to be able to decode the bytes itself since it knows the encoding. Anyway, I’ve posted the Python standard library solution as an answer — you can do streaming decoding of bytes using the `codecs` module. – jbg Sep 14 '14 at 01:41
1

@ThatAintWorking: I would disagree. While it is a pain in the neck to explicitly have to manage the difference between bytes and strings, it is a much greater pain to have the language make some implicit conversion for you. Implicit bytes <-> string conversions are a source of many bugs, and Python3 is very helpful in pointing out the pitfalls. But I agree the library has room for improvement in this area. – EvertW Jul 25 '17 at 14:24
@EvertW the failure, in my opinion, is forcing strings to be unicode in the first place. – ThatAintWorking Jul 25 '17 at 14:34
1

@ThatAintWorking: No, strings must be Unicode, if you want software that can be used in other places than the UK or USA. For decades we have suffered under the myopic worldview of the ASCII committee. Python3 finally got it right. Might have something to do with Python originating in Europe... – EvertW Aug 01 '17 at 19:13

score 67 · Answer 3 · answered Aug 27 '15 at 12:55

I have come to the opinion that the question is the best answer :)

import json
from urllib.request import urlopen

# urlopen needs a full URL, including the scheme
response = urlopen("http://site.com/api/foo/bar").read().decode('utf8')
obj = json.loads(response)

answered Aug 27 '15 at 12:55 by SergO

score 20 · Answer 4 · answered Oct 13 '16 at 18:06

For anyone else trying to solve this using the requests library:

import json
import requests

r = requests.get('http://localhost/index.json')
r.raise_for_status()
# works for Python2 and Python3
json.loads(r.content.decode('utf-8'))

answered Oct 13 '16 at 18:06 by Luke Yeager

12

This functionality is built-in to `requests`: you can simply do `r.json()` – jbg Dec 14 '16 at 13:36
1

To clarify, if you use @jbg's method, you don't need to do `json.loads`. All you have to do is `r.json()` and you've got your JSON object loaded into a dict already. – Blairg23 Jun 05 '17 at 03:20
`*** UnicodeEncodeError: 'ascii' codec can't encode characters in position 264-265: ordinal not in range(128)` – andilabs Mar 06 '18 at 13:04

score 14 · Answer 5 · edited Jun 20 '17 at 03:47

This one works for me. I used the 'requests' library and its json() method; check out the Requests ("HTTP for Humans") documentation.

import requests

url = 'here goes your url'

obj = requests.get(url).json()

edited Jun 20 '17 at 03:47 by julian salas · answered Jun 13 '17 at 04:36 by Sarthak Gupta

This is the best way. Really readable, and anyone who is doing something like this should have requests. – Baldrickk Sep 25 '19 at 15:13

score 7 · Answer 6 · answered Jul 12 '17 at 01:19

I ran into similar problems using Python 3.4.3 & 3.5.2 and Django 1.11.3. However, when I upgraded to Python 3.6.1 the problems went away.

You can read more about it here: https://docs.python.org/3/whatsnew/3.6.html#json

If you're not tied to a specific version of Python, just consider upgrading to 3.6 or later.

score 4 · Answer 7 · answered Sep 25 '19 at 14:57

As of Python 3.6, you can use json.loads() to deserialize a bytes object directly (the encoding must be UTF-8, UTF-16 or UTF-32). So, using only modules from the standard library, you can do:

import json
from urllib import request

response = request.urlopen(url).read()
data = json.loads(response)
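
A small offline sketch of that behavior, assuming Python 3.6 or later. The sample document and the variable names are made up for illustration; the bytes are produced with `str.encode()` instead of being fetched from a URL:

```python
import json

text = '{"city": "Zürich", "ok": true}'
expected = {"city": "Zürich", "ok": True}

# json.loads() detects UTF-8, UTF-16 and UTF-32 input by itself,
# so no explicit decode() call is needed for any of the three.
results = {
    encoding: json.loads(text.encode(encoding))
    for encoding in ("utf-8", "utf-16", "utf-32")
}

for encoding, parsed in results.items():
    print(encoding, parsed == expected)
```

Bytes in any other encoding (Latin-1, for example) still have to be decoded explicitly before parsing.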

score 3 · Answer 8 · answered Dec 27 '16 at 11:17

If you're experiencing this issue whilst using the flask microframework, then you can just do:

data = json.loads(response.get_data(as_text=True))

From the docs: "If as_text is set to True the return value will be a decoded unicode string"

answered Dec 27 '16 at 11:17 by cs_stackX

I got to this page because I was having an issue with Flask unit tests - thanks for posting the single line call. – sfblackl Apr 30 '17 at 18:57

score 2 · Answer 9 · edited Mar 13 '23 at 18:17

This will stream the byte data into json.

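import io
import json

# Decode explicitly as UTF-8; otherwise TextIOWrapper uses the locale's default encoding.
obj = json.load(io.TextIOWrapper(response, encoding='utf-8'))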

io.TextIOWrapper is preferred to the codecs module reader. https://www.python.org/dev/peps/pep-0400/
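
For a version that runs without a network connection, here is a sketch that assumes an `io.BytesIO` (the name `fake_response` is illustrative only) in place of the urlopen result:

```python
import io
import json

# Stand-in for the binary file-like object returned by urlopen().
fake_response = io.BytesIO(b'{"items": [1, 2, 3]}')

# TextIOWrapper turns the binary stream into a text stream.
# Leaving the with-block closes the wrapper and the wrapped stream.
with io.TextIOWrapper(fake_response, encoding="utf-8") as text_stream:
    obj = json.load(text_stream)

print(obj)
```

Because closing the wrapper also closes the underlying stream, the response cannot be read again afterwards.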

edited Mar 13 '23 at 18:17 by Eugene Yarmash · answered Feb 28 '18 at 20:30 by Collin Anderson

`*** AttributeError: 'Response' object has no attribute 'readable'` – andilabs Mar 06 '18 at 13:01
*** AttributeError: 'bytes' object has no attribute 'readable' – andilabs Mar 06 '18 at 13:01
Are you using urllib or requests? This is for urllib. If you have a bytes object, just use `json.loads(bytes_obj.decode())`. – Collin Anderson Mar 06 '18 at 16:38
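
The errors reported in these comments come from passing the wrong kind of object (bytes where a file-like object is expected). One way to sidestep that, sketched here with a hypothetical helper named `parse_json` that is not part of any library, is to normalize the input first:

```python
import io
import json


def parse_json(source, encoding="utf-8"):
    """Parse JSON from a str, a bytes object, or a file-like object.

    Hypothetical convenience helper; `encoding` is only used for bytes input.
    """
    if hasattr(source, "read"):
        # File-like object (binary or text): pull out its content first.
        source = source.read()
    if isinstance(source, (bytes, bytearray)):
        # Raw bytes: decode explicitly before parsing.
        source = source.decode(encoding)
    return json.loads(source)


print(parse_json(b'{"a": 1}'))
print(parse_json(io.BytesIO(b'{"a": 1}')))
print(parse_json(io.StringIO('{"a": 1}')))
```

This reads the whole body into memory, which is usually fine for JSON API responses.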

score 1 · Answer 10 · edited Mar 13 '23 at 07:58

Just found this simple method to return HttpResponse content as JSON:

import json
from django.test import RequestFactory

# Stand-in for your own request object; the '/' path is arbitrary.
request = RequestFactory().get('/')

response = MyView.as_view()(request)  # response is an HttpResponse object

response.render()  # call this so that response.content can be read afterwards

json_response = json.loads(response.content.decode('utf-8'))

print(json_response)  # {"your_json_key": "your json value"}

Hope that helps you.

score 1 · Answer 11 · answered Dec 09 '17 at 17:21

Your workaround actually just saved me. I was having a lot of problems processing the request using the Falcon framework. This worked for me, with req being the request from curl or httpie:

json.loads(req.stream.read().decode('utf-8'))

answered Dec 09 '17 at 17:21 by thielyrics

score -1 · Answer 12 · edited May 21 '19 at 15:57

I used the program below, which calls json.loads():

import json
import urllib.parse
import urllib.request

endpoint = 'https://maps.googleapis.com/maps/api/directions/json?'
api_key = 'YOUR_API_KEY'  # placeholder: never post a real key publicly
origin = input('Where are you? ')
destination = input('Where do you want to go? ')
# urlencode() escapes spaces and other special characters in the query string
nav_request = urllib.parse.urlencode(
    {'origin': origin, 'destination': destination, 'key': api_key})
request = endpoint + nav_request
# The response body is bytes, so decode it before parsing
response = urllib.request.urlopen(request).read().decode('utf-8')
directions = json.loads(response)
print(directions)
