Replace \b in json

Question

I'm receiving a JSON file that begins with: 'b\'{ "key" : .... I'm trying to remove the 'b\' part of the string, as it's not valid JSON.

The JSON is read using:

```python
import urllib.request
link = "http://www...."
with urllib.request.urlopen(link) as url:
    s = str(url.read())
```

My replacement code is `replace('\'b\'', '')`, but the string 'b\'{ "key" : .... remains instead of { "key" : ....

Attempting to recreate the issue without the JSON string:

```python
mystr = ' b\'{  '
mystr.replace(' b\'{ ', '')
```

This replaces the text successfully: the output is ' '.

How did you produce this in the first place? Someone has gone `str(binary_result)` somewhere, and they **should not be doing that to begin with**. There are potentially other problems too, like double-escaping of single quotes in the JSON data. — Martijn Pieters, Sep 12 '17 at 15:28
Ah, I see, *you* did, by using `str(url.read())`. **Decode** binary data, don't produce a representation. `s = url.read().decode('utf8')`. — Martijn Pieters, Sep 12 '17 at 15:29
Or rather, get the right content type from the response headers; UTF-8 is often a good assumption but not always. See the duplicate. — Martijn Pieters, Sep 12 '17 at 15:30

score 9 · Accepted Answer · answered Sep 12 '17 at 15:03

You yourself are adding that b by calling str() on the data you get. Just don't do that.
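To see where the prefix comes from, here is a minimal sketch using a made-up payload in place of a real HTTP response (the `raw` value is an assumption for illustration). Calling `str()` on a bytes object returns its printable representation, `b'...'` wrapper included, rather than the text it contains:

```python
# Hypothetical payload standing in for what url.read() returns.
raw = b'{ "key" : "value" }'

as_repr = str(raw)              # representation of the bytes object, with the b'...' wrapper
as_text = raw.decode('utf-8')   # the actual text content

print(as_repr)  # b'{ "key" : "value" }'
print(as_text)  # { "key" : "value" }
```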

If you do actually need to convert it to a string, you should decode it instead:

```python
s = url.read().decode('utf-8')
```

but in fact you can almost certainly pass the bytestring directly to json.loads().
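As a rough sketch of both options, assuming Python 3.6 or later (where `json.loads()` accepts bytes) and a made-up payload in place of the real response body:

```python
import json

# Hypothetical stand-in for the bytes returned by url.read().
raw = b'{ "key" : "value" }'

# Option 1: pass the bytestring straight to json.loads().
data_from_bytes = json.loads(raw)

# Option 2: decode to str first, then parse.
data_from_text = json.loads(raw.decode('utf-8'))

print(data_from_bytes)  # {'key': 'value'}
```

Both calls should produce the same dictionary; the first simply skips the explicit decode step.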

Daniel Roseman

1

Instead of hard-coding `utf-8`, use `url.info().get_content_charset('utf-8')`; this takes the character set from the Content-Type header, defaulting to UTF-8 if not explicitly specified. – Martijn Pieters Sep 12 '17 at 15:31
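Putting that suggestion together with the answer might look like the sketch below. The helper name `fetch_json` is hypothetical and not part of the original thread; it assumes the server sends a JSON body and falls back to UTF-8 when the Content-Type header names no charset:

```python
import json
import urllib.request


def fetch_json(link):
    """Fetch `link` and parse the body as JSON (hypothetical helper)."""
    with urllib.request.urlopen(link) as url:
        # Charset from the Content-Type header, or 'utf-8' if none is given.
        charset = url.info().get_content_charset('utf-8')
        # Decode the bytes into text rather than calling str() on them.
        return json.loads(url.read().decode(charset))
```

It would be called as `data = fetch_json(link)`, with `link` being the URL from the question.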
