Probably the "nicest" way to do this is to let a real JSON decoder do the work instead of relying on fragile regexes. Treat every open brace as a possible object start point, then try to parse from there with JSONDecoder's raw_decode method (which, on success, returns the parsed object and the number of characters consumed, making it possible to skip successfully parsed objects efficiently). For example:
    import json

    def get_all_json(teststr):
        decoder = json.JSONDecoder()
        # Find first possible JSON object start point
        sliceat = teststr.find('{')
        while sliceat != -1:
            # Slice off the non-object prefix
            teststr = teststr[sliceat:]
            try:
                # See if we can parse it as a JSON object
                obj, consumed = decoder.raw_decode(teststr)
            except ValueError:  # json.JSONDecodeError is a ValueError subclass
                # If we couldn't, find the next open brace to try again
                sliceat = teststr.find('{', 1)
            else:
                # If we could, yield the parsed object and skip the text it was
                # parsed from, resuming at the next open brace so that a trailing
                # number, string or array is never returned as an "object"
                yield obj
                sliceat = teststr.find('{', consumed)
This is a generator function, so you can either iterate the objects one by one, e.g. for obj in get_all_json(mystr):, or, if you need them all at once for indexing, iterating multiple times or the like, build a list with all_objs = list(get_all_json(mystr)).
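
As a variant of the same idea, raw_decode also accepts a start index as its second argument, so the string does not need to be re-sliced on every attempt. The sketch below is only an illustration: the name iter_json_objects and the sample string are made up here and are not part of the answer above. It searches for the next open brace after each successful parse, so only text starting at a brace is ever handed to the decoder.

```python
import json

def iter_json_objects(text):
    """Yield every top-level JSON object embedded in text (illustrative sketch)."""
    decoder = json.JSONDecoder()
    # Index of the first candidate object start
    pos = text.find('{')
    while pos != -1:
        try:
            # raw_decode(s, idx) parses starting at idx and returns
            # (object, index in s where the parsed document ended)
            obj, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one
            pos = text.find('{', pos + 1)
        else:
            yield obj
            # Resume the search after the text that was just parsed
            pos = text.find('{', end)

# Hypothetical input mixing plain text, invalid braces and real objects
sample = 'log start {"a": 1} noise {not json} {"b": {"c": [2, 3]}} end'
print(list(iter_json_objects(sample)))
# → [{'a': 1}, {'b': {'c': [2, 3]}}]
```

Nested objects are returned only as part of their parent, because the search resumes after the end of each successfully parsed object rather than inside it.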