There are multiple questions related to deserializing JSON containing embedded quotes but don't see a python-specific solution to this:
Given log data that is only partially valid JSON, eg:
"{"link":"<a href="mylink">http://my.com</a>"}"
The inner quotes eg around "mylink" interfere with the outer quotes around individual key-value pairs.
Unescaped, these cause json.loads
and ast.literal_eval
(see here) to throw syntax error.
On the other hand, to hunt and escape inner quotes via regex is tricky because of the variable nested JSON structure (the above is just a minimal example) and the key-values are open ended with no available schema.
Any alternatives?