
I have the following string I need to deserialize:

{
  "bw": 20,
  "center_freq": 2437,
  "channel": 6,
  "essid": "DIRECT-sB47" Philips 6198",
  "freq": 2437
}

This is almost valid JSON, except for the quote in the value DIRECT-sB47" Philips 6198, which ends the string prematurely and breaks the rest of the JSON.

Is there a way to deserialize elements which have the pattern

"key": "something which includes a quote",

or should I try to first pre-process the string with a regex to remove that quote (I do not care about it, nor about any other weird characters in the keys or values)?

UPDATE: sorry for not posting the code (it is a standard deserialization via json). The code is also available at repl.it

import json

data = '''
{
  "bw": 20,
  "center_freq": 2437,
  "channel": 6,
  "essid": "DIRECT-sB47" Philips 6198",
  "freq": 2437
}
'''
trans = json.loads(data)
print(trans)

The traceback:

Traceback (most recent call last):
  File "main.py", line 12, in <module>
    trans = json.loads(data)
  File "/usr/local/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.6/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 6 column 26 (char 79)
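The position quoted in the message is also available as attributes on the exception, so it does not have to be parsed out of the text. A small sketch (the name `error` is only for illustration) that shows what the reported position points at for the string above:

```python
import json

data = '''
{
  "bw": 20,
  "center_freq": 2437,
  "channel": 6,
  "essid": "DIRECT-sB47" Philips 6198",
  "freq": 2437
}
'''

try:
    json.loads(data)
except json.JSONDecodeError as e:
    # Keep a reference: the name bound by "as e" is cleared after the except block.
    error = e

# Line, column and absolute index, the same numbers as in the message.
print(error.lineno, error.colno, error.pos)
# The parser stops at the first non-blank character after the stray quote.
print(repr(data[error.pos]))      # 'P'
# The stray quote itself is two characters earlier (quote, then a space).
print(repr(data[error.pos - 2]))
```

So the reported position is not the quote itself but the character where the parser expected a comma.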

The same code without the quote works fine:

import json

data = '''
{
  "bw": 20,
  "center_freq": 2437,
  "channel": 6,
  "essid": "DIRECT-sB47 Philips 6198",
  "freq": 2437
}
'''
trans = json.loads(data)
print(trans)

COMMENT: I realize that the provider of the JSON should fix their code (I opened a bug report with them). In the meantime, until the bug is fixed (if it is) I would like to try a workaround.

WoJ
  • Can you post your serialization code? The `json` module should take care of this for you. – Harry Cutts Dec 04 '18 at 22:40
  • Use a JSON library; don't do it yourself. – Ben Dec 04 '18 at 22:41
  • Very similar to this: https://stackoverflow.com/questions/18514910/how-do-i-automatically-fix-an-invalid-json-string – shyam padia Dec 04 '18 at 22:45
  • @HarryCutts: sorry, updated the question – WoJ Dec 04 '18 at 22:48
  • @Ben: sorry, updated the question – WoJ Dec 04 '18 at 22:48
  • 1) this is not serializing, this is **deserializing**. 2) it's invalid Json that's why you are getting an error. Where did you get this invalid json from? – Ben Dec 04 '18 at 22:54
  • @Ben: 1) thanks, corrected, 2) from an access point (Ubiquiti to be precise). I just filed a bug with them but fixing this will take time (the fix is simple, the whole deployment will probably take time) – WoJ Dec 04 '18 at 23:08
  • I'd go with a regex. First try to deserialize in the usual way, if that fails, look for known problems (probably just this field) so you can search for `"essid": "` followed by anything up to `",`, fixup any embedded quotes, and try again a second time. Using a regex to fixup known issues in the incoming string if (and only if) it fails, but use a library to do the actual deserialization. – Ben Dec 04 '18 at 23:15
  • Is the new-line separation regular between each field-value in the JSON file? – Oluwafemi Sule Dec 04 '18 at 23:16
  • @Ben: this is what I initially had in mind, and then thought about using the exception value which is similar to `Expecting ',' delimiter: line 6 column 26 (char 79)`, a `\.*\(char (\d+)` should catch the place of the character to remove. All this running in a loop to successively catch such escaping errors, until I deserialize correctly. – WoJ Dec 04 '18 at 23:19
  • @OluwafemiSule: no, the string is quite messy and I cannot assume much about it. – WoJ Dec 04 '18 at 23:19
  • There is a big risk here that anyone who controls the essid value can inject json into your system e.g. by setting essid to `test ", "extra": "Extra Value` they can generate json `"essid": "test ", "extra": "Extra Value",` so I would recommend AGAINST a fully general solution, and make a solution which ONLY works on the actual problem you face, and remove that as soon as you conceivably can. – Ben Dec 04 '18 at 23:24
  • Also log everything and automatically raise tickets to have the offending devices essid updated!!!! – Ben Dec 04 '18 at 23:27
  • @Ben: yes you are right, there is a risk of injection but in my specific case I am looking at values in the JSON which are not the ESSID (but something totally unrelated elsewhere, which I control). I just want to get rid of that breaking error to access them. – WoJ Dec 04 '18 at 23:28
  • If you control them then get them under control! Log them, read them manually and change them to something not containing a quote. – Ben Dec 04 '18 at 23:29
  • @Ben: I am not sure I understand what you are saying. These are ESSIDs which I happen to see around. I have no interest in them, they are just in the "JSON" file I get from an AP (and I need other information from that JSON). There is nothing to log, control or change. – WoJ Dec 04 '18 at 23:31
  • You said you controlled the Json. I took that to mean that you could control what values were produced, therefore you could log which devices were producing invalid values, and have them reconfigured. – Ben Dec 05 '18 at 09:40
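
The targeted fix suggested in the comments (parse normally first, and only on failure repair the one field known to be broken) could be sketched as below. This is a rough illustration, not a tested solution: `ESSID_RE`, `_strip_inner_quotes` and `loads_with_essid_fix` are hypothetical names, and the pattern assumes that "essid" is not the last key, so its value is always followed by `",`.

```python
import json
import re

# `"essid": "` followed by anything up to the first `",` on the same line.
# Assumption: the value is followed by a comma (essid is not the last key).
ESSID_RE = re.compile(r'("essid"\s*:\s*")(.*?)("\s*,)')


def _strip_inner_quotes(match):
    """Drop every double quote found inside the essid value."""
    prefix, value, suffix = match.groups()
    return prefix + value.replace('"', '') + suffix


def loads_with_essid_fix(text):
    """Parse normally; only if that fails, repair the essid field and retry."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        repaired = ESSID_RE.sub(_strip_inner_quotes, text)
        # A second failure is raised to the caller unchanged.
        return json.loads(repaired)


sample = '''
{
  "bw": 20,
  "center_freq": 2437,
  "channel": 6,
  "essid": "DIRECT-sB47" Philips 6198",
  "freq": 2437
}
'''
result = loads_with_essid_fix(sample)
print(result["essid"])
```

An ESSID that itself contains `",` still defeats the pattern, which is the injection case described in the comments; the sketch narrows the repair to one field but does not make the input trustworthy.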

1 Answer


I ended up reading the position of the faulty character from the exception, removing that character, and deserializing again, in a loop until the string parses.

Worst case the whole data string is swallowed, which in my case is better than crashing.

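import json

data = '''
{
  "bw": 20,
  "center_freq": 2437,
  "channel": 6,
  "essid": "DIRECT-sB47" Philips 6198",
  "freq": 2437
}
'''
while True:
    try:
        trans = json.loads(data)
    except json.JSONDecodeError as e:
        # e.pos is the index where parsing failed (the "P" of "Philips" here).
        # The stray quote sits two characters earlier (quote, then a space),
        # hence the -2; that offset is specific to this kind of input.
        s = e.pos - 2
        if s < 0:
            # Nothing sensible left to remove: give up instead of looping forever.
            trans = None
            break
        print(f"incorrect character at position {s}, removing")
        data = data[:s] + data[(s + 1):]
    else:
        break

print(trans)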
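
A variant that does not depend on a fixed offset from the error position: it looks backwards from the failure point for the nearest double quote and removes that one, with a cap on the number of repairs. This is only a sketch; `loads_dropping_stray_quotes` and `max_repairs` are names made up for the illustration, and the assumption that the nearest preceding quote is the faulty one will not hold for every kind of broken input.

```python
import json


def loads_dropping_stray_quotes(text, max_repairs=10):
    """Parse text as JSON, deleting the double quote nearest before each error."""
    repairs = 0
    while True:
        try:
            return json.loads(text)
        except json.JSONDecodeError as e:
            # Assumption: the closest quote before the failure point is the stray one.
            quote = text.rfind('"', 0, e.pos)
            if quote == -1 or repairs >= max_repairs:
                raise  # nothing to remove, or too many attempts already
            text = text[:quote] + text[quote + 1:]
            repairs += 1


sample = '''
{
  "bw": 20,
  "center_freq": 2437,
  "channel": 6,
  "essid": "DIRECT-sB47" Philips 6198",
  "freq": 2437
}
'''
parsed = loads_dropping_stray_quotes(sample)
print(parsed["essid"])
```

The cap keeps a hopelessly broken string from being whittled down one quote at a time, and the injection concern raised in the comments applies to this version as well.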
WoJ