
I have an application that writes a large array of big-but-not-huge JSON objects to an output file. The objects are created, written, and then discarded (i.e. I don't keep them all around). I am using JSONStream in an attempt to make memory usage a non-issue, but it isn't working.

Here is a simple example that shows the issue I'm having:

let fs = require('fs');
let JSONStream = require('JSONStream');

const testfile = 'testfile.json';
const entcount = 70000;
const hacount = 10*1024;

console.log(`opening ${testfile}`);
let outputTransform = JSONStream.stringify();
let outputStream = fs.createWriteStream(testfile);
outputStream.on('finish', () => console.log('finished'));
outputTransform.pipe(outputStream);

console.log(`writing ${entcount} entries, data size ${hacount*2}`);
for (let n = 0; n < entcount; ++ n) {
    let thing = {
        index: n,
        data: 'ha'.repeat(hacount)
    }
    outputTransform.write(thing);
}

console.log('finishing');
outputTransform.end();

This example uses JSONStream to stream 70,000 objects, each roughly 20 kB, to a file (this is in the ballpark of my actual application). However, it runs out of memory at around 45,000 entries (full output at the end of this post):

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0092D8BA v8::internal::Heap::PageFlagsAreConsistent+3050

Also I've noticed that as I'm calling outputTransform.write, the file size stays at 0 (it's also 0 after the above OOM). It doesn't start growing until outputTransform.end is called. So I'm assuming the output data is being buffered somewhere and is eating up the heap.

The behavior I expected was that outputTransform.write would write the output immediately, or at least buffer it in a manageably sized buffer, so that I could write as many objects as I want without worrying about an OOM.

So my question is, is there some way to get JSONStream to not hold all this data in memory?

Increasing the heap size isn't really an option, because it only raises the memory-bound upper limit rather than removing it.
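
For what it's worth, the kind of flow control I assumed I would need looks roughly like the sketch below. This is untested and assumes that outputTransform.write returns false when its internal buffer is full and that it emits 'drain' when it is safe to continue, the way a standard Node writable stream does; I don't know whether the through-based stream returned by JSONStream.stringify actually behaves that way.

let fs = require('fs');
let JSONStream = require('JSONStream');

const testfile = 'testfile.json';
const entcount = 70000;
const hacount = 10*1024;

let outputTransform = JSONStream.stringify();
let outputStream = fs.createWriteStream(testfile);
outputStream.on('finish', () => console.log('finished'));
outputTransform.pipe(outputStream);

let n = 0;

// Instead of queueing all 70000 objects in one loop, write until write()
// reports a full buffer, then wait for 'drain' before continuing.
function writeSome () {
    while (n < entcount) {
        let thing = {
            index: n,
            data: 'ha'.repeat(hacount)
        };
        ++n;
        if (!outputTransform.write(thing)) {
            outputTransform.once('drain', writeSome);
            return;
        }
    }
    outputTransform.end();
}

writeSome();

Is that the right general approach, or does JSONStream have something built in that handles this?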


Full output:

$ node index.js

opening testfile.json
writing 70000 entries, data size 20480

<--- Last few GCs --->

[22256:022DA970]     4589 ms: Mark-sweep 918.8 (924.6) -> 918.3 (921.9) MB, 30.6 / 0.0 ms  (+ 69.8 ms in 33 steps since start of marking, biggest step 9.7 ms, walltime since start of marking 104 ms) (average mu = 0.116, current mu = 0.082) finalize increm
[22256:022DA970]     4593 ms: Scavenge 920.2 (921.9) -> 918.1 (926.1) MB, 2.3 / 0.0 ms  (average mu = 0.116, current mu = 0.082) allocation failure

<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 00DDCB97]
Security context: 0x1c200469 <JSObject>
    1: /* anonymous */ [0D080429] [index.js:~1] [pc=203C4C90](this=0x0d0804c5 <Object map = 1ED0021D>,0x0d0804c5 <Object map = 1ED0021D>,0x0d0804a5 <JSFunction require (sfi = 3025687D)>,0x0d08045d <Module map = 1ED204A5>,0x3024f3c1 <String[#59]: index.js>,0x0d080449 <...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0092D8BA v8::internal::Heap::PageFlagsAreConsistent+3050
  • Probably related: https://stackoverflow.com/questions/43643467/how-to-write-incrementally-to-a-text-file-and-flush-output – Tomalak Feb 05 '22 at 08:50
  • @Tomalak Thanks; super useful. Experimenting, trying to figure out how to work sync/drain/flush/etc. into my current implementation (which has the added complexity of promises), and debating whether or not to just write the objects directly and ditch JSONStream. Also I'm not very experienced with Node, but I think I'm on the right track now, thanks. – Jason C Feb 05 '22 at 09:08
  • Yeah... I think I might not use JSONStream. The Stream returned by `JSONStream.stringify()` is created by [through](https://www.npmjs.com/package/through), and looking through this code it hides a few things (e.g. the return value of `write`), and the stream interface in general is making my head spin. So I'm going to try to work with the file interface instead, and keep everything synchronized so the object source slows down to match the output write rate (roughly the idea sketched after these comments). I can tell from the API docs there are more elegant / performant ways to do this but, yeah, I just need it to work. – Jason C Feb 05 '22 at 09:27
  • I agree, the stream interface is quite something. I would probably ditch JSONStream as well and just write strings to a text file. – Tomalak Feb 05 '22 at 09:30
  • Got it working just writing to a text file, and brute forcing everything into submission with `p-limit` and sloppy promises. Normally I would post an answer, but I don't want the world to see this abomination. Thanks for your help. – Jason C Feb 05 '22 at 10:00
  • Something good might yet come from keeping this question open. Maybe add that async complication to your initial code, and wait to see what people who are not intimidated by Node streams come up with. – Tomalak Feb 05 '22 at 10:03
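
Update: for anyone who lands here, the direct-write approach described in the comments above boils down to something like this. It's a rough sketch of the idea rather than my actual code; serializing each object with JSON.stringify, framing the array manually, and throttling on 'drain' are my assumptions about the cleanest way to express it.

let fs = require('fs');

const testfile = 'testfile.json';
const entcount = 70000;
const hacount = 10*1024;

let outputStream = fs.createWriteStream(testfile);
let n = 0;

// Serialize each object directly and frame the JSON array by hand,
// pausing whenever write() reports a full internal buffer and resuming
// on 'drain', so the producer never outruns the file.
function writeSome () {
    while (n < entcount) {
        let thing = {
            index: n,
            data: 'ha'.repeat(hacount)
        };
        let chunk = (n === 0 ? '[' : ',\n') + JSON.stringify(thing);
        ++n;
        if (!outputStream.write(chunk)) {
            outputStream.once('drain', writeSome);
            return;
        }
    }
    outputStream.end('\n]');
}

writeSome();

Either way, the key point seems to be pausing the producer whenever write() returns false instead of looping over all 70000 objects unconditionally.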
