
I've got some VERY large JSON files (the file sizes range from a few MB up to dozens of GB) and need to validate them. By validating I mean that I want to know whether these files are indeed valid JSON or whether they contain any kind of syntax errors.

A common method is to call json.load(file) (or json.loads(string) on the file's contents) and check whether a ValueError is raised. However, for files this big it takes forever to parse the JSON into a Python object, and it also requires a significant amount of RAM. I thought about using some kind of regular expression (or a similar approach), but since JSON isn't a regular language, I bet that won't work.
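For reference, the naive approach described above looks roughly like this (a minimal sketch; the function name `validate_json_naive` is made up for illustration). It works, but it materializes the entire document in memory:

```python
import json

def validate_json_naive(path):
    """Parse the whole file into a Python object; RAM use grows with file size."""
    with open(path, "rb") as f:
        try:
            json.load(f)
            return True
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            return False
```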

So, is there any way to efficiently validate the syntax of very large JSON files without having to load the whole file as an object?

EDIT: Similar questions are about reading such files, but I don't want to do that, at least not primarily. What I am looking for is a way just to validate their syntax.
The problem with ijson is that there's no detailed documentation beyond a few examples, which aren't really helpful here.

Nic
  • Have you read: https://stackoverflow.com/questions/10382253/reading-rather-large-json-files-in-python/10382359#10382359 – Charles Landau Feb 05 '19 at 15:05
  • You cannot skip on the parsing itself - it's the only way to detect invalid json -, but you can use `ijson` (a streaming json parser) to avoid eating all your ram. – bruno desthuilliers Feb 05 '19 at 15:13

0 Answers