I'm using node v0.12.7 and want to stream directly from a database to the client (for file download). However, I am noticing a large memory footprint (and possible memory leak) when using streams.
With express, I create an endpoint that simply pipes a readable stream to the response as follows:
app.post('/query/stream', function(req, res) {
  res.setHeader('Content-Type', 'application/octet-stream');
  res.setHeader('Content-Disposition', 'attachment; filename="blah.txt"');

  //...retrieve stream from somewhere...
  // stream is a readable stream in object mode

  stream
    .pipe(json_to_csv_transform_stream) // I've removed this and see the same behavior
    .pipe(res);
});
In production, the readable stream retrieves data from a database. The amount of data is quite large (1M+ rows). I swapped this readable stream out for a dummy stream (see code below) to simplify debugging, and I see the same behavior: memory usage jumps by ~200MB each time I hit the endpoint. Sometimes garbage collection kicks in and memory drops a bit, but it rises linearly until my server runs out of memory.
The whole reason I started using streams was to avoid loading large amounts of data into memory. Is this behavior expected?
I also notice that, while streaming, my CPU usage jumps to 100% and the event loop blocks, meaning other requests can't be processed.
Am I using this incorrectly?
Dummy readable stream code
// Set up a custom readable
var Readable = require('stream').Readable;

function Counter(opt) {
  Readable.call(this, opt);
  this._max = 1000000; // Maximum number of records to generate
  this._index = 1;
}
require('util').inherits(Counter, Readable);

// Override the internal read method:
// push dummy objects until max is reached
Counter.prototype._read = function() {
  var i = this._index++;
  if (i > this._max) {
    this.push(null); // Signal end of stream
  } else {
    this.push({
      foo: i,
      bar: i * 10,
      hey: 'dfjasiooas' + i,
      dude: 'd9h9adn-09asd-09nas-0da' + i
    });
  }
};

// Create the readable stream
var counter = new Counter({objectMode: true});

//...return it to calling endpoint handler...
Update
Just a small update: I never found the cause. My initial workaround was to use cluster to spawn off new processes so that other requests could still be handled.
I've since updated to node v4. While CPU/memory usage is still high during processing, it seems to have fixed the leak (meaning memory usage goes back down afterwards).