Request returns bytes and I'm failing to decode them

Question

Essentially, I made a request to a website and got a bytes response back: b'[{"geonameId:"703448"}..........'. I'm confused because, although it is of type bytes, it is very human-readable and looks like a list of JSON objects. I know the response is encoded in latin1, because running r.encoding returned ISO-8859-1, and I have tried to decode it, but it just returns an empty string. Here's what I have so far:

r = response.content
string = r.decode("ISO-8859-1")
print(string)

This is where it prints a blank line. However, when I run

len(string)

I get back 31023. How can I decode these bytes without getting back an empty string?

In Python 2.x the b prefix makes the enclosed string a `str`, and you may already have some encoded characters hidden somewhere in it. In Python 3.x you get a `bytes` literal. Why do you believe you need to perform any encoding/decoding? – Mike McMahon, Jul 29 '15 at 18:44
Because I need to parse the JSON, and I just tried looping over it with `for i in range(len(contents)): print content[i]`, and it's just printing out lots of numbers. – koda gates, Jul 29 '15 at 18:50

score 37 · Accepted Answer · answered Jul 29 '15 at 19:03

Did you try to parse it with the json module?

import json
parsed = json.loads(response.content)

answered Jul 29 '15 at 19:03

mzc

Yes and I got: `JSON object must be str, not 'bytes'` – koda gates Jul 29 '15 at 19:04
And when you do `json.loads(response.content.decode('latin1'))`? – mzc Jul 29 '15 at 19:14
There should be a header in the response object telling you what encoding it has. You should decode the content with that codec, otherwise any unusual characters (emoji, accents, some quote characters, ...) will end up garbled. See the Answer from @salah – drevicko Mar 01 '17 at 10:28
@mzc Please add the `content.decode` comment directly to the answer. – Martin Thoma Oct 10 '18 at 09:41
@mzc, decode('latin1') doesn't always work; when the content type is `text/html; charset=UTF-8`, it fails. – Anu Oct 14 '19 at 19:14
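
To make the comments above concrete, here is a minimal sketch using a made-up UTF-8 payload. The `raw` bytes and the place name are illustrative assumptions standing in for `response.content`; they are not the data from the question. The sketch shows that decoding before `json.loads` avoids the bytes error, and that the codec has to match the one the body was actually written in.

```python
import json

# Made-up payload standing in for response.content (assumption: UTF-8 body).
raw = '[{"geonameId": "703448", "name": "Málaga"}]'.encode("utf-8")

# Decoding with the codec the body was written in round-trips cleanly.
parsed = json.loads(raw.decode("utf-8"))

# latin1 maps every byte to some character, so decoding never raises,
# but multi-byte UTF-8 sequences come out as the wrong characters.
wrong = json.loads(raw.decode("latin1"))

print(parsed[0]["name"])
print(wrong[0]["name"])
```

Both calls parse without error, which is why a wrong codec can go unnoticed until an accented character shows up.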

score 30 · Answer 2 · answered Dec 09 '16 at 20:12

Another solution is to use response.text, which returns the content as Unicode:

Type:        property
String form: <property object at 0x7f76f8c79db8>
Docstring:  
Content of the response, in unicode.

If Response.encoding is None, encoding will be guessed using
``chardet``.

The encoding of the response content is determined based solely on HTTP
headers, following RFC 2616 to the letter. If you can take advantage of
non-HTTP knowledge to make a better guess at the encoding, you should
set ``r.encoding`` appropriately before accessing this property.
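
As a rough illustration of the last paragraph of that docstring, the sketch below overrides the encoding before reading `.text`. `FakeResponse` is a hypothetical stand-in written for this example so that it runs offline; it is not the real requests class. With requests you would set `r.encoding` on the actual response object in the same way.

```python
class FakeResponse:
    """Hypothetical stand-in for a requests response (not the real class)."""

    def __init__(self, content, encoding):
        self.content = content      # raw bytes, like r.content
        self.encoding = encoding    # codec name, like r.encoding

    @property
    def text(self):
        # Decode with whatever encoding is set at the time of access.
        return self.content.decode(self.encoding, errors="replace")


# The body is UTF-8, but the declared encoding says ISO-8859-1.
r = FakeResponse('[{"name": "café"}]'.encode("utf-8"), "ISO-8859-1")
garbled = r.text

# Non-HTTP knowledge: we happen to know the body is UTF-8.
r.encoding = "utf-8"
fixed = r.text

print(garbled)
print(fixed)
```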

answered Dec 09 '16 at 20:12

salah

This is a much better idea than the accepted answer, as it will use the appropriate encoding. – drevicko Mar 01 '17 at 10:31
Yes, this is what is suggested in the docs: http://docs.python-requests.org/en/master/user/quickstart/#response-content – Jérôme Dec 04 '17 at 12:01

score 13 · Answer 3 · answered Oct 10 '18 at 14:29

There are r.text and r.content. The first is a string; the second is bytes.

You want:

import json

data = json.loads(r.text)
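
A small offline sketch of the difference between the two attributes, assuming a response-like object. The `SimpleNamespace` stand-in and the `load_json` helper are hypothetical names used only so the example runs without a network call.

```python
import json
from types import SimpleNamespace


def load_json(r):
    """Parse the JSON body of a response-like object from its decoded text."""
    return json.loads(r.text)


# Hypothetical stand-in exposing the two attributes discussed above.
r = SimpleNamespace(
    content=b'[{"geonameId": "703448"}]',   # bytes, like r.content
    text='[{"geonameId": "703448"}]',       # str, like r.text
)

data = load_json(r)
print(type(r.content).__name__, type(r.text).__name__)
print(data)
```

With a real requests response, `r.json()` decodes and parses in one call.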

answered Oct 10 '18 at 14:29

Martin Thoma

I have a similar issue and when I use r.text, it's empty. – j_allen_morris May 23 '19 at 05:31
Then maybe the server does not return anything. – Martin Thoma May 23 '19 at 05:32
I'm requesting the source / HTML of a webpage (http://www.dockethound.com) and when I use r.content, it shows up. – j_allen_morris May 23 '19 at 05:45

score 3 · Answer 4 · answered Aug 10 '22 at 21:01

I faced a similar issue using beautifulsoup4 and requests while scraping webpages; however, both response.text and response.content looked like bytes.

The response headers included 'Content-Type': 'text/html; charset=UTF-8' and also 'Content-Encoding': 'br'. It turned out I hadn't installed brotlipy in the environment, and running pip install brotlipy fixed my issues. I thought chardet or cchardet would be enough, but the data first needed to be correctly decompressed.

A similar issue was solved here in the same way; I'm linking to this answer since it didn't come up until I explicitly searched for brotli compression.
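
A hedged sketch of how one might detect this situation up front. The helper name `brotli_body_undecodable` is made up for this example, and it assumes the decoder would be importable as either `brotli` or `brotlicffi`; it only inspects headers and installed modules and does not decompress anything.

```python
import importlib.util


def brotli_body_undecodable(headers):
    """Return True if the body is Brotli-compressed and no decoder module is importable."""
    encoding = headers.get("Content-Encoding", "").lower()
    if "br" not in [part.strip() for part in encoding.split(",")]:
        return False
    # Assumption: a Brotli decoder is available as `brotli` or `brotlicffi`.
    return not any(
        importlib.util.find_spec(name) is not None
        for name in ("brotli", "brotlicffi")
    )


# Usage with a plain dict; note that a plain dict is case-sensitive,
# unlike the headers mapping on a requests response.
print(brotli_body_undecodable({"Content-Encoding": "gzip"}))
print(brotli_body_undecodable({"Content-Encoding": "br"}))
```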

edited Aug 11 '22 at 02:51

answered Aug 10 '22 at 21:01

KT12

I had the same issue and installing brotlipy fixed the bytes response problem. – EMartins Jul 21 '23 at 19:45
And there was no need to parse the response with the json module. Just response.text worked. – EMartins Jul 21 '23 at 20:02
