28

I have a large JSON file; it is newline-delimited JSON, where multiple standard JSON objects are separated by newlines, e.g.

{"name":"1","age":5}
{"name":"2","age":3}
{"name":"3","age":6}

I am now using JSONStream in Node.js to parse this large JSON file; the reason I use JSONStream is that it is stream-based.

However, neither parse syntax from the examples helps me parse this file, which has a separate JSON object on each line:

var parser = JSONStream.parse(['rows', true]);
var parser = JSONStream.parse([/./]);

Can someone help me with that?

peak
user824624

5 Answers

22

Warning: Since this answer was written, the author of the JSONStream library removed the emit root event functionality, apparently to fix a memory leak. Future users of this library, you can use the 0.x.x versions if you need the emit root functionality.
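If you do need the old root-event behaviour, pinning the dependency to the 0.x line is enough; a minimal sketch, assuming npm (the exact range spelling is up to your setup):

npm install JSONStream@0.x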

Below is the unmodified original answer:

From the readme:

JSONStream.parse(path)

path should be an array of property names, RegExps, booleans, and/or functions. Any object that matches the path will be emitted as 'data'.

A 'root' event is emitted when all data has been received. The 'root' event passes the root object & the count of matched objects.

In your case, since you want to get back the JSON objects as opposed to specific properties, you will be using the 'root' event and you don't need to specify a path.

Your code might look something like this:

var fs = require('fs'),
    JSONStream = require('JSONStream');

var stream = fs.createReadStream('data.json', {encoding: 'utf8'}),
    parser = JSONStream.parse();

stream.pipe(parser);

parser.on('root', function (obj) {
  console.log(obj); // whatever you will do with each JSON object
});
frangio
  • Note the typo at the end of the var line - the ';' should be a ','. – Erwin Wessels Jan 23 '14 at 11:43
  • @frangio Please can you clarify the usage if the requirement is to read the large file, as per the OPs question, but pass the Objectified stream to the next Transform in the pipe directly. Eg. I would like stream.pipe(parser).pipe(MyNextTransform) where MyNextTransform can work with objects it receives as param in the _transform() method. In other words, want the output of parser.on('root') to be delegating onto another readable stream for further pipeline processing. – arcseldon Jul 12 '14 at 05:32
  • Nevermind, I worked it out :) just have to return stream.pipe(parser), the next transform in the chain will automatically be given the results of the parser.on('root') call. – arcseldon Jul 12 '14 at 05:44
4

JSONStream is intended for parsing a single huge JSON object, not many separate JSON objects. You want to split the stream at newlines and then parse each line as JSON.

The NPM package split claims to do this splitting, and even has a feature to parse the JSON lines for you.
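A minimal sketch of that approach, assuming the input is a file named data.json and that blank lines between objects should be skipped (both assumptions on my part, not something the split README prescribes):

var fs = require('fs'),
    split = require('split');

fs.createReadStream('data.json', {encoding: 'utf8'})
  .pipe(split())                  // split() emits one line per 'data' event
  .on('data', function (line) {
    if (!line) return;            // skip any blank lines between objects
    var obj = JSON.parse(line);   // each non-empty line is a complete JSON object
    console.log(obj);             // whatever you will do with each object
  });

The split README also shows passing JSON.parse directly as a mapper (split(JSON.parse)), in which case parse failures surface as 'error' events on the stream instead of being thrown from your own handler.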

rjmunro
  • I found the split package very useful, and as a matter of fact, using split+JSON.parse() outperformed JSONStream. – FedFranz Apr 05 '19 at 10:52
1

If your file is not too large, here is an easy but not performant solution:

const fs = require('fs');

// read the whole file into memory
let rawdata = fs.readFileSync('fileName.json');

// turn the newline-delimited objects into a comma-separated list;
// the trailing newline leaves a trailing comma, which slice removes
let convertedData = String(rawdata)
    .replace(/\n/g, ',')
    .slice(0, -1);

// wrap in brackets so the whole thing parses as one JSON array
let jsonData = JSON.parse(`[${convertedData}]`);
Patryk Janik
1

I created a package, @jsonlines/core, which parses JSON Lines as an object stream.

Install the package with npm install @jsonlines/core, then try the following code:

const fs = require("fs");
const { parse } = require("@jsonlines/core");

// create a duplex stream which parse input as lines of json
const parseStream = parse();

// read from the file and pipe into the parseStream
fs.createReadStream(yourLargeJsonLinesFilePath).pipe(parseStream);

// consume the parsed objects by listening to data event
parseStream.on("data", (value) => {
  console.log(value);
});

Note that parseStream is a standard Node duplex stream, so you can also consume it with for await ... of or in other ways.
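For instance, a minimal sketch of the for await ... of style; the file name data.jsonl is an assumption:

const fs = require("fs");
const { parse } = require("@jsonlines/core");

async function main() {
  // pipe the file into the jsonlines parser exactly as above
  const parseStream = fs.createReadStream("data.jsonl").pipe(parse());

  // readable streams are async iterable, so each parsed object
  // arrives here as `value` in turn
  for await (const value of parseStream) {
    console.log(value);
  }
}

main().catch(console.error);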

Equal
1

Here's another solution for when the file is small enough to fit into memory. It reads the whole file in one go, converts it into an array by splitting it at the newlines (removing the blank line at the end), and then parses each line.

import fs from "fs";

const parsed = fs
  .readFileSync(`data.jsonl`, `utf8`)
  .split(`\n`)
  .slice(0, -1)      // drop the empty entry left by the trailing newline
  .map(JSON.parse);