To supplement @MikeC's excellent answer, here are some relevant details from the current docs (v8.4.0) for `writable.write()`:
> If `false` is returned, further attempts to write data to the stream should stop until the `'drain'` event is emitted.
>
> While a stream is not draining, calls to `write()` will buffer `chunk`, and return `false`. Once all currently buffered chunks are drained (accepted for delivery by the operating system), the `'drain'` event will be emitted. It is recommended that once `write()` returns `false`, no more chunks be written until the `'drain'` event is emitted. While calling `write()` on a stream that is not draining is allowed, Node.js will buffer all written chunks until maximum memory usage occurs, at which point it will abort unconditionally. Even before it aborts, high memory usage will cause poor garbage collector performance and high RSS (which is not typically released back to the system, even after the memory is no longer required).
And from the Backpressuring in Streams guide:
> In any scenario where the data buffer has exceeded the `highWaterMark` or the write queue is currently busy, `.write()` will return `false`.
>
> When a `false` value is returned, the backpressure system kicks in. Once the data buffer is emptied, a `.drain()` event will be emitted and resume the incoming data flow.
>
> Once the queue is finished, backpressure will allow data to be sent again. The space in memory that was being used will free itself up and prepare for the next batch of data.
```
+-------------------+         +=================+
|  Writable Stream  +--------->  .write(chunk)  |
+-------------------+         +=======+=========+
                                      |
                   +------------------v---------+
+-> if (!chunk)    |    Is this chunk too big?  |
|    emit .end();  |    Is the queue busy?      |
+-> else           +-------+----------------+---+
|    emit .write();        |                |
^                       +--v---+        +---v---+
^-----------------------<  No  |        |  Yes  |
                        +------+        +---v---+
                                            |
 emit .pause();     +=================+     |
 ^------------------+  return false;  <-----+-----------------+
                    +=================+                       |
                                                              |
when queue is empty      +============+                       |
^------------------------<  Buffering |                       |
|                        |============|                       |
+> emit .drain();        |  ^Buffer^  |                       |
+> emit .resume();       +------------+                       |
                         |  ^Buffer^  |                       |
                         +------------+   add chunk to queue  |
                         |            <---^------------------+
                         +============+
```
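The `highWaterMark` condition in the first branch of that diagram is easy to reproduce with a hand-rolled `Writable` (my own illustration, not from the guide; the 16-byte limit is artificially small):

```js
'use strict';
var stream = require('stream');

var writable = new stream.Writable({
    highWaterMark: 16, // bytes buffered before write() starts returning false
    write: function (chunk, encoding, next) {
        setTimeout(next, 100); // simulate a slow consumer
    }
});

console.log(writable.write('0123456789')); // true  (10 bytes buffered < 16)
console.log(writable.write('0123456789')); // false (20 bytes buffered >= 16)
writable.on('drain', function () {
    console.log('Drained, incoming data flow can resume');
});
```

Note that the second `write()` still buffers its chunk (nothing is lost); the `false` return value is purely advisory, which is exactly why ignoring it leads to the unbounded buffering described above.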
Here are some visualisations (running the script with a V8 heap memory size of 512MB by using `--max-old-space-size=512`).

This visualisation shows the heap memory usage (red) and delta time (purple) for every 10,000 steps of `i` (the X axis shows `i`):

```js
'use strict';
var fs = require('fs');
var wstream = fs.createWriteStream('myOutput.txt');
var latestTime = (new Date()).getTime();
var currentTime;
for (var i = 0; i < 10000000000; i++) {
    wstream.write(i + '\n'); // ignores the return value, so chunks pile up in memory
    if (i % 10000 === 0) {
        currentTime = (new Date()).getTime();
        console.log([ // Output CSV data for visualisation
            i,
            (currentTime - latestTime) / 5,
            process.memoryUsage().heapUsed / (1024 * 1024)
        ].join(','));
        latestTime = currentTime;
    }
}
console.log('End!');
wstream.end();
```

The script runs slower and slower as memory usage approaches the 512MB limit, until it finally crashes when the limit is reached.
This visualisation uses `v8.setFlagsFromString()` with `--trace_gc` to show the current memory usage (red) and execution time (purple) of each garbage collection (the X axis shows total elapsed time in seconds):
```js
'use strict';
var fs = require('fs');
var v8 = require('v8');
var wstream = fs.createWriteStream('myOutput.txt');
v8.setFlagsFromString('--trace_gc'); // log each garbage collection to stdout
for (var i = 0; i < 10000000000; i++) {
    wstream.write(i + '\n');
}
console.log('End!');
wstream.end();
```

Memory usage reaches 80% after about 4 seconds, and the garbage collector gives up trying to `Scavenge` and is forced to use `Mark-sweep` (more than 10 times slower) – see this article for more details.
For comparison, here are the same visualisations for @MikeC's code, which waits for `drain` when the `write` buffer becomes full:


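@MikeC's actual code isn't reproduced here, but the drain-waiting idea applied to the same loop would look something like this sketch (a paraphrase, not his exact code):

```js
'use strict';
var fs = require('fs');
var wstream = fs.createWriteStream('myOutput.txt');

var i = 0;
function writeChunks() {
    var ok = true;
    while (i < 10000000000 && ok) {
        ok = wstream.write(i + '\n'); // stop as soon as the buffer is full
        i++;
    }
    if (i < 10000000000) {
        wstream.once('drain', writeChunks); // resume when the buffer has flushed
    } else {
        console.log('End!');
        wstream.end();
    }
}
writeChunks();
```

Because writing pauses while the stream is not draining, heap usage should stay roughly flat around the `highWaterMark` instead of growing until the V8 limit is hit.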