56

I have the following JSON string coming from an external input source:

{value: "82363549923gnyh49c9djl239pjm01223", id: 17893}

This is an incorrectly formatted JSON string (the keys "id" and "value" must be in quotes), but I need to parse it anyway. I have tried simplejson and json-py, and it seems neither can be configured to parse such strings.

I am running Python 2.5 on Google App Engine, so C-based solutions like python-cjson are not applicable.

The input format could be changed to XML or YAML instead of the JSON shown above, but I am using JSON throughout the project, and changing the format in one specific place would not be ideal.

For now I have switched to XML and am parsing the data successfully, but I am looking for any solution that would let me switch back to JSON.

Gino Mempin
Serge Tarkovski
  • I'm a little confused about how you can switch to XML, yet not be in control of the JSON data. It sounds like you have some external source of data, in either XML or JSON formats, but its JSON output is permanently broken as shown and you can't do anything about it so your only option is to select the XML version instead? Or am I missing something? – Peter Hansen Dec 19 '09 at 00:35
  • you can parse it as YAML without a change, because it is YAML too – mykhal Dec 19 '09 at 00:43
  • Peter, you're right - I have an external source of data which I can control in only one way: by saying whether I want the input in JSON, XML or YAML. Nadia, thanks - that's my mistake (I was not very familiar with Stack Overflow's interface at the time). – Serge Tarkovski Dec 19 '09 at 09:20

5 Answers

65

Since YAML (>= 1.2) is a superset of JSON and also accepts unquoted keys, you can parse the string as YAML:

>>> import yaml
>>> s = '{value: "82363549923gnyh49c9djl239pjm01223", id: 17893}'
>>> yaml.safe_load(s)  # safe_load: recent PyYAML releases require an explicit Loader for yaml.load
{'id': 17893, 'value': '82363549923gnyh49c9djl239pjm01223'}
mykhal
  • 1
    well, python-yaml (PyYAML) is not yet fully 1.2 compliant, but will handle most cases. to be prepared for problem cases, see http://en.wikipedia.org/wiki/YAML#cite_ref-6 – mykhal Dec 19 '09 at 00:56
  • mykhal, have you run it on Google App Engine? Seems PyYAML uses C modules and thus cannot be used on GAE. – Serge Tarkovski Dec 19 '09 at 09:31
  • pyyaml is much faster if it uses libyaml, but it is also written in pure Python, and you can choose between CLoader or Loader (pure py). But don't worry, yaml support is already included in App Engine; you can try this in the interactive shell http://shell.appspot.com/ – mykhal Dec 19 '09 at 15:30
  • YAML is not a strict superset of JSON as YAML requires the mapping keys to be unique while JSON only suggests to use unique keys (MUST vs. SHOULD). – Gumbo Dec 19 '09 at 17:59
  • 8
    One more problem: YAML apparently requires a space after the colon. However for the most part this works like a charm. – Adam Ernst Mar 24 '10 at 05:53
  • FYI (@SergeTarkovski and others) [YAML 3.10 is now included with Python 2.7](https://developers.google.com/appengine/docs/python/tools/libraries27) – Brian M. Hunt Feb 12 '13 at 00:32
  • I also had the colon-space issue. I tackled it with a regex replacement before deserialization, but it can fail or add spaces inside string values, so take that into consideration before using it: `yaml.load(re.sub(r':(.+?)', r': \1', s))` – Mariano Ruiz Feb 27 '18 at 19:51
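
The colon-space problem raised in the comments above can be narrowed a little by leaving double-quoted strings alone. The following is only a sketch using the standard `re` module; the helper name `space_after_colons` is made up for illustration, and it assumes that every colon outside a double-quoted string is a key/value separator (so an unquoted value such as a bare URL would still be altered).

```python
import re

# Either a complete double-quoted string (kept as is) or a colon that is
# immediately followed by a non-whitespace character (gets a space added).
_STRING_OR_TIGHT_COLON = re.compile(r'"(?:\\.|[^"\\])*"|:(?=\S)')


def space_after_colons(text):
    """Insert a space after colons that sit outside double-quoted strings."""
    def fix(match):
        token = match.group(0)
        # Quoted strings are matched first so that colons inside them are skipped.
        return token if token.startswith('"') else ': '
    return _STRING_OR_TIGHT_COLON.sub(fix, text)


fixed = space_after_colons('{value:"a:b", id:17893}')
print(fixed)
```

The result can then be handed to the YAML parser in place of the original string.
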
26

You can use demjson.

>>> import demjson
>>> demjson.decode('{foo:3}')
{u'foo': 3}
Gino Mempin
null
  • That helped me to parse JSON without quotes and with formatting that differs from yaml – varela Oct 17 '15 at 15:43
  • 1
    very helpful package for parsing broken json, thanks – Johnner Oct 27 '15 at 12:36
  • Handled nested objects as well, which I found to be an issue with YAML. On Windows: `py -m pip install demjson`, then: `import demjson`, `s = """get or define the multiline string inline"""`, `j = demjson.decode(s)`, `jsonString = demjson.encode(j)` – msanjay Nov 14 '18 at 10:31
  • Probably the best python lib for parsing json without quotes, many thanks. – lenhhoxung Nov 27 '18 at 15:30
  • The original link is dead: deron.meranda.us/python/demjson/. I edited in the package's page in pypi instead. – Gino Mempin Apr 25 '22 at 08:32
2

You could use a string parser to fix it first; a regex could do it, provided this is as complicated as the JSON will get.
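
As a rough sketch of that idea (not a general-purpose fixer), the snippet below quotes bare keys with a regex and then hands the result to the standard `json` module. The function names `quote_bare_keys` and `loads_lenient` are invented for this example, and it assumes that keys are plain identifiers and that strings use double quotes; anything more irregular will still fail to parse.

```python
import json
import re

# Either a complete double-quoted string (left untouched) or a bare
# identifier that is followed by a colon, i.e. an unquoted key.
_STRING_OR_BARE_KEY = re.compile(
    r'"(?:\\.|[^"\\])*"|([A-Za-z_][A-Za-z0-9_]*)(?=\s*:)'
)


def quote_bare_keys(text):
    """Wrap unquoted object keys in double quotes."""
    def fix(match):
        key = match.group(1)
        if key is None:
            # Matched a whole string literal: return it unchanged.
            return match.group(0)
        return '"%s"' % key
    return _STRING_OR_BARE_KEY.sub(fix, text)


def loads_lenient(text):
    """Parse JSON-like text whose object keys may be unquoted."""
    return json.loads(quote_bare_keys(text))


data = loads_lenient('{value: "82363549923gnyh49c9djl239pjm01223", id: 17893}')
print(data["id"], data["value"])
```

Matching whole string literals first keeps the regex from touching text inside values, which is the main way a simpler key-quoting pattern goes wrong.
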

davidosomething
  • This is possible, but I consider this type of solution weird, so for now I am just looking for a JSON parsing library that can process this broken JSON. – Serge Tarkovski Dec 19 '09 at 09:38
2

The dirtyjson library can handle some almost-correct JSON:

>>> import dirtyjson
>>> 
>>> s = '{value: "82363549923gnyh49c9djl239pjm01223", id: 17893}'
>>> d = dirtyjson.loads(s)
>>> d
AttributedDict([('value', '82363549923gnyh49c9djl239pjm01223'), ('id', 17893)])
>>>
>>> d = dict(d)
>>> d
{'value': '82363549923gnyh49c9djl239pjm01223', 'id': 17893}
>>> d["value"]
'82363549923gnyh49c9djl239pjm01223'
>>> d["id"]
17893
Gino Mempin
0

Pyparsing includes a JSON parser example; here is the online source. You could modify the definition of memberDef to allow a non-quoted string for the member name, and then use it to parse your not-quite-JSON source text.

The August 2008 issue of Python Magazine has much more detailed info about this parser. It shows some sample JSON and code that accesses the parsed results as if they were a deserialized object.

PaulMcG
  • Links are dead. – mcoolive Jan 31 '20 at 13:29
  • 2
    Thanks - I fixed the link to the parser to now point to that file in the GitHub repo. I had to drop the Python Magazine link, since there is no longer a public archive of the issues of this magazine. – PaulMcG Jan 31 '20 at 14:59