
I'm using ijson (https://pypi.python.org/pypi/ijson) to parse a large JSON file. It is several GB, so I can't realistically hold it all in memory. Somewhere in the middle of the file the parser raises a UnicodeDecodeError. I don't need every piece of data, so skipping that entry is fine, but I can't get the parser to continue past the point of the error.

My code looks something like this:

import ijson

parser = ijson.parse(file)  # file: the large JSON file, already opened
for prefix, event, value in parser:
    pass  # do stuff

If I try to catch the exception inside the loop, it isn't caught, because the error is raised by the parser while it produces the next item, not by the loop body. If I catch it outside the loop, I can't resume where I left off (as far as I know). How can I get around this error and keep going? Alternatively, how can I fix the file in a way that doesn't require loading it into memory all at once?
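One possible workaround, sketched under the assumption that the error comes from bytes in the file that are not valid UTF-8: wrap the binary file in a small file-like object that decodes each chunk with errors="replace" and re-encodes it, so the parser only ever sees valid UTF-8. The names Utf8Sanitizer and iter_events are hypothetical helpers, not part of ijson; the only ijson call used is ijson.parse on a file-like object. Note that this does not skip the bad entry: the offending bytes become U+FFFD replacement characters in the parsed value.

```python
import codecs


class Utf8Sanitizer:
    """Hypothetical helper: file-like wrapper that replaces invalid UTF-8 bytes."""

    def __init__(self, raw, chunk_size=64 * 1024):
        self._raw = raw  # underlying file object opened in binary mode
        self._chunk_size = chunk_size
        # The incremental decoder keeps multi-byte sequences that are split
        # across chunk boundaries intact instead of treating them as errors.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
        self._buffer = b""
        self._eof = False

    def read(self, size=-1):
        # Refill the buffer until it can satisfy the request or input ends.
        while not self._eof and (size < 0 or len(self._buffer) < size):
            chunk = self._raw.read(self._chunk_size)
            if not chunk:
                self._eof = True
            text = self._decoder.decode(chunk, final=self._eof)
            self._buffer += text.encode("utf-8")
        if size < 0:
            out, self._buffer = self._buffer, b""
        else:
            out, self._buffer = self._buffer[:size], self._buffer[size:]
        return out


def iter_events(path):
    """Yield (prefix, event, value) tuples from the file at path (assumed UTF-8)."""
    import ijson  # third-party; imported here so the wrapper works without it

    with open(path, "rb") as f:
        yield from ijson.parse(Utf8Sanitizer(f))
```

The loop from the question would then read `for prefix, event, value in iter_events(path):`, with only one chunk of the file held in memory at a time. If the file is actually in a different encoding (for example Latin-1), decoding with that encoding instead of replacing bytes would preserve the data.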

nilypp
  • It seems the problem is with ijson: the parser cannot decode the input, and I find no trace in the docs of a way to pass the encoding as an argument, as in the json module. Did you try parsing the file with json? – Tim Givois Dec 16 '16 at 07:06
  • I'm using ijson because the file is too large to store in memory. Using json doesn't work for that reason, unless there's some way to parse line-by-line with json. – nilypp Dec 16 '16 at 07:38

0 Answers