I want to convert a standard JSON array into a format where each line contains a separate, self-contained, valid JSON object. See JSON Lines.
JSON_file =
[{u'index': 1,
u'no': 'A',
u'met': u'1043205'},
{u'index': 2,
u'no': 'B',
u'met': u'000031043206'},
{u'index': 3,
u'no': 'C',
u'met': u'0031043207'}]
To JSONL:
{u'index': 1, u'no': 'A', u'met': u'1043205'}
{u'index': 2, u'no': 'B', u'met': u'000031043206'}
{u'index': 3, u'no': 'C', u'met': u'0031043207'}
My current solution is to read the JSON file as a text file and remove the `[` from the beginning and the `]` from the end, thus leaving a valid JSON object on each line rather than a single nested object spanning all the lines.
Is there a more elegant solution? I suspect something could go wrong when using string manipulation on the file.
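For comparison, here is a minimal sketch of what a parser-based conversion might look like, using only Python's standard `json` module instead of string manipulation. It assumes the input file holds a single top-level JSON array that fits in memory. The function names `to_jsonl_lines` and `convert_file` and the file paths are hypothetical, chosen only for illustration.

```python
import json


def to_jsonl_lines(records):
    """Serialize each element of a list as one single-line JSON document."""
    # json.dumps escapes newlines inside strings, so each result is one line.
    return [json.dumps(record) for record in records]


def convert_file(src_path, dst_path):
    """Read a JSON array from src_path and write it to dst_path as JSON Lines."""
    with open(src_path) as src:
        records = json.load(src)  # assumes the top-level value is a list
    with open(dst_path, "w") as dst:
        for line in to_jsonl_lines(records):
            dst.write(line + "\n")
```

Calling `convert_file("input.json", "output.jsonl")` would then write one object per line. Because the values are parsed and re-serialized rather than edited as text, string values such as `"000031043206"` should keep their leading zeros, and the separating commas between objects are dropped automatically.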
The motivation is to read JSON files into an RDD on Spark. See the related question: Reading JSON with Apache Spark - `corrupt_record`