I'm working with Wikidata (a knowledge base that cross-references multiple data sources, including Wikipedia), and the project provides a ~50 GB JSON dump with no whitespace. I want to extract certain kinds of data from it, which I could do with grep if it were pretty-printed. I'm running on a Mac.
Some methods of reformatting, e.g.,
cat ... | python -m json.tool
./jq . filename.json
will not work on a file this large: python chokes and jq dies. There was a great thread here: How can I pretty-print JSON in a (unix) shell script? But I'm not sure how (or whether) any of those approaches can deal with files of this size.
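From skimming the jq manual, jq 1.5 apparently added a --stream mode that parses the input incrementally instead of loading the whole document into memory. If I'm reading the manual's fromstream/truncate_stream example right, something like this should reassemble each top-level array element and print it on its own compact line, which I could then grep; I haven't verified it against the full dump:

# stream-parse the giant array, rebuild each element, emit one entity per line
jq -cn --stream 'fromstream(1|truncate_stream(inputs))' filename.json

(The -c keeps each entity on a single line, so the output would effectively be the greppable one-record-per-line form I'm after.)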
This company uses Akka Streams to do this very task (they claim under 10 minutes to process all of Wikidata), but I know nothing about it: http://engineering.intenthq.com/2015/06/wikidata-akka-streams/
Wikidata has a predictable format (https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON), and I am able to accomplish most of my goal by piping through a series of sed and tr invocations, but that's clumsy and potentially error-prone, and I'd much prefer to be grepping a pretty-printed version. A sketch of the kind of extraction I mean follows.
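If the streaming idea above works, the extraction could perhaps happen in the same pass instead of going through sed and tr at all. Going by the data model page, each entity has an id and a labels object keyed by language code, so a filter like this might pull out just the IDs and English labels; the field paths (.id, .labels.en.value) are my reading of the documentation, not something I've tested:

# same streaming reassembly as above, then keep only the fields I care about
jq -cn --stream 'fromstream(1|truncate_stream(inputs)) | {id: .id, label: .labels.en.value}' filename.json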
Any suggestions?