I have a stream of strings where I need to analyze each one and check whether it is a valid JSON. The pythonic way (EAFP) dictates something like this:
import json

def parse_json(string):
    try:
        return json.loads(string)
    except:
        return string
The problem is that a significant number of strings are not JSONs, and the many exceptions raised by this code slow the process quite a bit.
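For illustration, a rough way to see where the time goes (the sample string is made up, and the exception type is narrowed to ValueError just for the sketch):

import json
import timeit

not_json = "just a plain log line, definitely not JSON " * 5

def parse_json(string):
    try:
        return json.loads(string)
    except ValueError:
        return string

# Every call raises and catches an exception, which dominates the runtime.
print(timeit.timeit(lambda: parse_json(not_json), number=100_000))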
I am looking for a way to try to parse the text as JSON and get back a pre-defined value (e.g. an empty tuple()) when the string is not valid JSON.
I don't mind hacking around the standard json package (overriding a function or two..) if this is the easiest solution.
Any suggestions?
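To make the desired behaviour concrete, this is roughly the interface I have in mind (the helper name and the empty-tuple sentinel are just placeholders):

import json

NOT_JSON = ()  # placeholder sentinel meaning "this string is not JSON"

def try_parse_json(string):
    """Return the parsed value, or NOT_JSON if the string is not valid JSON."""
    # Ideally this would avoid paying for an exception on every non-JSON
    # string, which is exactly what I'm asking how to do.
    try:
        return json.loads(string)
    except ValueError:
        return NOT_JSON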
Update: As I'm only interested in 'complex' JSON values (arrays and objects), I eventually decided to go with a simple if that checks the first and last characters of the string:
try:
    import ujson as json  # fall back to the stdlib json if ujson is unavailable
except ImportError:
    import json

def parse_json(string):
    text = string.strip()
    # Only attempt to parse strings that look like a JSON object or array.
    if text and ((text[0] == "{" and text[-1] == "}") or (text[0] == "[" and text[-1] == "]")):
        try:
            return json.loads(text)
        except ValueError:
            pass
    return string
ujson is a much faster implementation than the standard library's json module. Additionally, skipping all strings that are not wrapped in [] or {} reduces the number of exceptions by a large factor. It turns out that mixing LBYL and EAFP was what I needed.
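For completeness, a quick usage sketch of the function above (the input strings are just illustrative):

samples = [
    '{"id": 1, "tags": ["a", "b"]}',  # JSON object: parsed into a dict
    '[1, 2, 3]',                      # JSON array: parsed into a list
    'plain log line, not JSON',       # fails the wrapper check: returned unchanged
    '{"broken": }',                   # passes the wrapper check but fails to parse: returned unchanged
]

for s in samples:
    result = parse_json(s)
    print(type(result).__name__, result)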