I'm trying to parse a GitHub archive file with yajl-py. I believe the file is a stream of concatenated JSON objects, so the file as a whole is not valid JSON, but each object in it is.
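One way to check this assumption, using Python's built-in json module rather than yajl-py, is to compare json.loads on a small sample with JSONDecoder.raw_decode, which parses a single value starting at a given index. The two-event sample string below is made up for illustration; real archive events are much larger.

```python
import json

# Hypothetical two-event sample, written back to back the way the archive
# file appears to store them (no enclosing array, no separating commas).
sample = '{"type":"PushEvent"}{"type":"WatchEvent"}'

# The stream as a whole is not a single JSON document...
try:
    json.loads(sample)
except json.JSONDecodeError as exc:
    print("json.loads failed:", exc.msg)  # prints "json.loads failed: Extra data"

# ...but each object in it is. raw_decode parses one value starting at the
# given index and returns it together with the index just past its end.
decoder = json.JSONDecoder()
first, end = decoder.raw_decode(sample)
second, end = decoder.raw_decode(sample, end)
print(first["type"], second["type"])  # prints "PushEvent WatchEvent"
```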
To test this out, I installed yajl-py and then used their example parser (from https://github.com/pykler/yajl-py/blob/master/examples/yajl_py_example.py) to try to parse a file:
python yajl_py_example.py < 2012-03-12-0.json
where 2012-03-12-0.json is one of the GitHub archive files, already decompressed.
Judging from their reference implementation in Ruby, this sort of thing should work. Do the Python packages not handle JSON streams?
By the way, here's the error I get:
yajl.yajl_common.YajlError: parse error: trailing garbage
9478bbc3","type":"PushEvent"}{"repository":{"url":"https://g
(right here) ------^
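The caret points at }{, the boundary where one object ends and the next begins, which fits the concatenated-stream format. As a possible workaround that sidesteps yajl-py entirely, here is a minimal sketch using only the standard json module: it reads the file in chunks and decodes one object at a time with JSONDecoder.raw_decode. The function name iter_json_stream and the default chunk size are my own choices. It assumes every top-level value is an object (as in the archive), because a bare top-level number split across two chunks could be decoded too early. On a decode error it keeps reading, assuming the object is just cut off, so malformed input is only reported at end of file.

```python
import io
import json


def iter_json_stream(fileobj, chunk_size=65536):
    """Yield each top-level JSON value from a text file of concatenated
    JSON objects, reading chunk_size characters at a time."""
    decoder = json.JSONDecoder()
    buf = ""
    eof = False
    while True:
        buf = buf.lstrip()  # raw_decode rejects leading whitespace
        if not buf:
            chunk = fileobj.read(chunk_size)
            if not chunk:
                return  # clean end of input
            buf = chunk
            continue
        try:
            obj, end = decoder.raw_decode(buf)
        except json.JSONDecodeError:
            # Most likely the current object is cut off at a chunk
            # boundary, so read more and retry; once the file is
            # exhausted, the input really is malformed.
            if eof:
                raise
            chunk = fileobj.read(chunk_size)
            if chunk:
                buf += chunk
            else:
                eof = True
            continue
        yield obj
        buf = buf[end:]  # keep whatever follows the decoded object


# Tiny in-memory demo; a small chunk_size forces objects to straddle chunks.
demo = io.StringIO('{"type":"PushEvent"}\n{"type":"WatchEvent"}')
for event in iter_json_stream(demo, chunk_size=8):
    print(event["type"])
```

For the real file, opening 2012-03-12-0.json in text mode (encoding="utf-8" is an assumption about the archive) and looping over iter_json_stream(f) should yield one dict per event.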