
I have to parse the following JSON string:

{"JobDescription":"{\"project\": \"1322\", \"vault\": \"qa-122\"}"}

If I try to use json.loads, I get the following:

>>> import json
>>> print json.loads('{"JobDescription":"{\"project\": \"1322\", \"vault\": \"qa-122\"}"}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 381, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 1 column 22 (char 21)

I don't have any control over the string I receive, as it's generated by another system.

  • There are two issues here, fundamentally: one of them is about having correct data to parse (i.e. why the initial attempt at parsing didn't work), and the other is about what to do in order to get the "embedded" JSON after initial parsing. The first one is an extremely common problem that doesn't have a good canonical yet (this really needs to be designed top-down and I am working on something). The second is trivial: "take the string out of the parsed result, and do the same thing again". – Karl Knechtel Jan 19 '23 at 01:21
  • As such, I closed this as a duplicate of a canonical for parsing JSON, and will add an answer there to clarify the embedded-JSON case (it seems to confuse a lot of people, but is **not actually a new problem**). I will later add a duplicate link for a proper canonical about the representation of data in source code vs. in a file. – Karl Knechtel Jan 19 '23 at 01:22

1 Answer


You are not producing embedded backslashes; Python is interpreting the \" as an escaped quote and the final string just contains the quote:

>>> '{"JobDescription":"{\"project\": \"1322\", \"vault\": \"qa-122\"}"}'
'{"JobDescription":"{"project": "1322", "vault": "qa-122"}"}'
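To see where the parser gives up on that string, here is a minimal sketch. It assumes Python 3.5 or later, where json.loads raises json.JSONDecodeError (a ValueError subclass) with msg and pos attributes; the variable names are only illustrative.

```python
import json

# Without an r prefix, Python turns each \" into a plain " before json sees it.
broken = '{"JobDescription":"{\"project\": \"1322\", \"vault\": \"qa-122\"}"}'

try:
    json.loads(broken)
except json.JSONDecodeError as err:  # Python 3.5+; on 2.7 this is a plain ValueError
    error_pos = err.pos
    # Everything the decoder accepted before it failed.
    consumed = broken[:err.pos]
    print(err.msg, "at char", err.pos)
    print(consumed)
```

The text accepted before the failure ends with "{", so the decoder read the value of JobDescription as the one-character string { and then expected a comma or a closing brace. That is the same char 21 reported in the traceback in the question.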

Use a raw string or double the backslashes:

>>> r'{"JobDescription":"{\"project\": \"1322\", \"vault\": \"qa-122\"}"}'
'{"JobDescription":"{\\"project\\": \\"1322\\", \\"vault\\": \\"qa-122\\"}"}'
>>> '{"JobDescription":"{\\"project\\": \\"1322\\", \\"vault\\": \\"qa-122\\"}"}'
'{"JobDescription":"{\\"project\\": \\"1322\\", \\"vault\\": \\"qa-122\\"}"}'
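As a quick check of the three spellings, the sketch below compares them directly. The variable names are made up for illustration.

```python
# Plain literal: Python consumes the backslashes while reading the source code.
plain = '{"JobDescription":"{\"project\": \"1322\", \"vault\": \"qa-122\"}"}'
# Raw literal: backslashes are kept as written.
raw = r'{"JobDescription":"{\"project\": \"1322\", \"vault\": \"qa-122\"}"}'
# Doubled backslashes: each \\ produces one backslash in the string.
doubled = '{"JobDescription":"{\\"project\\": \\"1322\\", \\"vault\\": \\"qa-122\\"}"}'

print(raw == doubled)     # the raw and doubled forms hold identical characters
print(plain.count("\\"))  # backslashes surviving in the plain literal
print(raw.count("\\"))    # backslashes surviving in the raw literal
```

Only the raw and doubled forms still contain the backslashes that JSON needs to escape the inner quotes.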

This then loads fine:

>>> import json
>>> json.loads('{"JobDescription":"{\\"project\\": \\"1322\\", \\"vault\\": \\"qa-122\\"}"}')
{'JobDescription': '{"project": "1322", "vault": "qa-122"}'}

and you can decode the nested JSON document from there:

>>> decoded = json.loads('{"JobDescription":"{\\"project\\": \\"1322\\", \\"vault\\": \\"qa-122\\"}"}')
>>> json.loads(decoded['JobDescription'])
{'project': '1322', 'vault': 'qa-122'}
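The two decoding steps can be wrapped in a small helper. This is only a sketch: load_job_description is a hypothetical name, and the raw string stands in for text that would really arrive from the other system (a file, socket, or HTTP response), where Python's literal escaping rules do not apply.

```python
import json


def load_job_description(payload):
    """Decode the outer document, then the JSON text stored under 'JobDescription'."""
    outer = json.loads(payload)
    # The value is itself a JSON document stored as a string, so decode it again.
    return json.loads(outer["JobDescription"])


# Stand-in for data received from the other system; the r prefix keeps the backslashes.
received = r'{"JobDescription":"{\"project\": \"1322\", \"vault\": \"qa-122\"}"}'
job = load_job_description(received)
print(job["project"], job["vault"])
```

If the string is read from outside the program rather than typed as a literal, it already contains the backslashes and should need no raw prefix or doubling.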
Martijn Pieters