
I'm receiving a JSON file that begins with: 'b\'{ "key" : .... I'm trying to remove the 'b\' part of the string, as it's not valid JSON.

The JSON is read using:

import urllib.request
link = "http://www...."
with urllib.request.urlopen(link) as url:
    s = str(url.read())

My replacement code is replace('\'b\'', ''), but the string 'b\'{ "key" : .... remains unchanged instead of becoming { "key" : ....

Attempting to recreate the issue without the JSON string:

mystr = ' b\'{  '
mystr.replace(' b\'{ ', '') 

This replaces successfully: the output is ' '.

blue-sky
  • use `json.loads()` – N. Ivanov Sep 12 '17 at 15:03
  • How did you produce this in the first place? Someone has gone `str(binary_result)` somewhere, and they **should not be doing that to begin with**. There are potentially other problems too, like double-escaping of single quotes in the JSON data. – Martijn Pieters Sep 12 '17 at 15:28
  • Ah, I see, *you* did, by using `str(url.read())`. **Decode** binary data, don't produce a representation. `s = url.read().decode('utf8')`. – Martijn Pieters Sep 12 '17 at 15:29
  • Or rather, get the right content type from the response headers; UTF-8 is often a good assumption but not always. See the duplicate. – Martijn Pieters Sep 12 '17 at 15:30

1 Answer


You yourself are adding that b by calling str() on the data you get. Just don't do that.

If you do actually need to convert it to a string, you should decode it instead:

s = url.read().decode('utf-8')

but in fact you can almost certainly pass the bytestring directly to json.loads(), which accepts bytes as of Python 3.6.
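
To make the difference concrete, here is a minimal sketch that uses a hard-coded byte string in place of `url.read()`, so it runs without a network call. The variable `raw` and its payload are invented for illustration, and passing bytes straight to `json.loads()` assumes Python 3.6 or later.

```python
import json

# Stand-in for url.read(); this payload is made up for illustration.
raw = b'{ "key" : "value" }'

# str() on a bytes object returns its repr, including the b prefix and quotes.
as_repr = str(raw)

# decode() converts the bytes into the text they actually contain.
as_text = raw.decode('utf-8')

# json.loads() accepts str, and also bytes on Python 3.6+.
from_text = json.loads(as_text)
from_bytes = json.loads(raw)

print(as_repr)   # the repr, which is not valid JSON
print(as_text)   # the real JSON text
print(from_text == from_bytes)
```

The repr contains a plain b followed by a quote, with no backslash, which is likely why a search pattern built around '\'b\'' never matches it.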

Daniel Roseman
  • Instead of hard-coding `utf-8`, use `url.info().get_content_charset('utf-8')`; this takes the character set from the Content-Type header, defaulting to UTF-8 if not explicitly specified. – Martijn Pieters Sep 12 '17 at 15:31
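
The charset lookup from that last comment could be sketched as follows. To keep it runnable offline, an `email.message.Message` stands in for the headers object returned by `url.info()` (which is a subclass of it); the helper name `decode_body` and the sample payload are hypothetical, not part of the original answer.

```python
import json
from email.message import Message


def decode_body(body, headers, default='utf-8'):
    """Decode response bytes using the charset named in Content-Type.

    Falls back to `default` when the header or its charset is missing.
    """
    charset = headers.get_content_charset(default)
    return body.decode(charset)


# Simulated response headers and body; with a real response these would
# come from url.info() and url.read() respectively.
headers = Message()
headers['Content-Type'] = 'application/json; charset=iso-8859-1'
body = '{"name": "café"}'.encode('iso-8859-1')

data = json.loads(decode_body(body, headers))
print(data)
```

With a real response the call would look like `decode_body(url.read(), url.info())` inside the `with` block.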