I've got some very large JSON files (sizes range from a few MB up to dozens of GB) and need to validate them. By validating I mean that I want to know whether these files are indeed valid JSON or whether they contain any kind of syntax error.
A common method is to use json.load(file)
or json.loads(string)
and check whether a ValueError is raised. However, for files that big, parsing the JSON into a Python object takes forever and uses a significant amount of RAM. I thought about using some kind of regular expression (or a similar approach), but since JSON isn't a regular language, I doubt that would work.
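For reference, the baseline approach I mean looks roughly like the sketch below. The function name is_valid_json is just a placeholder I made up, and I'm assuming the files are UTF-8 encoded. It works fine for small files; the problem is that json.load builds the entire document in memory before returning.

```python
import json


def is_valid_json(path):
    """Return True if the file at `path` parses as JSON, else False.

    Hypothetical helper for illustration: it loads the whole document
    into memory, which is exactly what becomes too slow and too
    memory-hungry for multi-GB files.
    """
    try:
        # Assumption: the file is UTF-8 encoded.
        with open(path, "r", encoding="utf-8") as f:
            json.load(f)  # parses the complete file into Python objects
    except ValueError:
        # json.JSONDecodeError and UnicodeDecodeError are both
        # subclasses of ValueError, so syntax and encoding errors land here.
        return False
    return True
```

Calling is_valid_json("some_file.json") returns a plain boolean, which is all I need; the parsed object itself is thrown away.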
So, is there any way to efficiently validate the syntax of very large JSON files without having to load the whole file as an object?
EDIT: Similar questions are about reading such files, but I don't want to do that, at least not primarily. What I am looking for is a way just to validate their syntax.
The problem with ijson is that there's no detailed documentation except a few examples which aren't really helpful.