
I've run into a memory leak in a project and I've managed to recreate the problem in a simple example:

const { Readable } = require("stream");

for (let i=0; i<10000000; i++) {
    const r = new Readable();
    r.push(Buffer.from("a"));

    if (i % 10000 === 0) {
        const memory = process.memoryUsage();
        console.log(memory.heapUsed, memory.heapTotal);
    }
}

Run with `node --max-old-space-size=1024 test.js` so it doesn't eat all your RAM and demonstrates the crash consistently.

This leaks memory every iteration, but I've no idea why. Destroying the stream seemingly does nothing. I'm not storing any references to the data, so the GC should pick up on that and clean up after each iteration, but it isn't?

Node 12.18.3


Update: This is the area of my actual project that has the issue. The project is a parser for replay files from an RTS game, intended to extract meaningful data from the replays, which I then store in a database. It's not time-critical, but I would like it to be reasonably fast so it doesn't get backed up. There have been one or two problem replays from much longer games containing a ton more data; that's when I noticed my parser has this memory problem.

Using the --expose-gc flag and calling global.gc() doesn't seem to help. Making the loop async and awaiting a small delay of ~10ms every now and then completely sorts out the issue without holding it up too much, but it feels like a bad solution:

const { Readable } = require("stream");

(async () => {
    for (let i=0; i<10000000; i++) {
        const r = new Readable();
        r.push(Buffer.from("a"));
        r.push(null);
    
        if (i % 10000 === 0) {
            const memory = process.memoryUsage();
            console.log(memory.heapUsed, memory.heapTotal);

            // global.gc(); // no luck

            await delay(10);
        }
    }
})();

function delay(ms) {
    return new Promise(resolve => {
        setTimeout(() => resolve(), ms);
    });
}
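
For comparison, here is the same loop yielding back to the event loop with setImmediate instead of a fixed 10ms timer. This is only a sketch, assuming that simply returning to the event loop gives the GC a chance to run; it hasn't been verified against the real parser:

const { Readable } = require("stream");

(async () => {
    for (let i = 0; i < 10000000; i++) {
        const r = new Readable();
        r.push(Buffer.from("a"));
        r.push(null);

        if (i % 10000 === 0) {
            const memory = process.memoryUsage();
            console.log(memory.heapUsed, memory.heapTotal);

            // Yield back to the event loop without a fixed timer,
            // giving the GC a chance to run between batches.
            await new Promise(resolve => setImmediate(resolve));
        }
    }
})();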
Jazcash
  • GC generally doesn't run until your JS gets back to the event loop and has some free cycles where GC can do its job. If you're in a tight loop, creating millions of objects, those won't get GCed until the loop is done. If you create too many in the loop, kaboom, you run out of memory. There is no GC in each loop iteration. Things are eligible for GC, but won't actually be freed until the interpreter finds time to run the GC process. FYI, this isn't a leak. This is high peak memory usage caused by the way you wrote your code. – jfriend00 Feb 08 '21 at 22:38
  • @jfriend00 Then is there a way I can manually call for GC or give node breathing room to run it automatically? – Jazcash Feb 08 '21 at 22:51
  • There is. It requires a command line argument. See [How to request the garbage collector in nodejs to run](https://stackoverflow.com/questions/27321997/how-to-request-the-garbage-collector-in-node-js-to-run). Running the GC manually is generally not recommended as there are usually just better ways to design your code that lower the peak memory usage and thus don't require it. You haven't shown your real code so we can't make suggestions on that. I'd also suggest you read [this article](https://strongloop.com/strongblog/node-js-performance-garbage-collection/) to learn more about GC. – jfriend00 Feb 08 '21 at 22:55
  • Thanks, I'll do some reading and play with that flag. In short, my project is reading a binary file and splitting it into chunks, each of which I process with a Readable stream in a while loop until they've all been read and meaningful data is extracted. The actual part causing the problem is essentially exactly the same as the sample code in my question – Jazcash Feb 08 '21 at 23:05
  • Well, if you showed us the real code, then we could offer some ideas on how to reduce the peak memory usage. Or, you could write a new question on that topic such as "How to reduce peak memory usage in this loop?". There's nothing asynchronous in this loop, which is probably not how the real-world code works, and that makes this example less interesting to work on. – jfriend00 Feb 08 '21 at 23:07
  • Updated the question with some additional info, and my temporary hacky solution that confirms you were right about giving GC time to collect. – Jazcash Feb 08 '21 at 23:23
  • You might want to study your code and see if it is making multiple copies of the data you are processing. Memory-efficient string processing can be difficult in JavaScript because every single manipulation of a string creates a new string object (strings themselves are immutable), which can lead to high peak memory usage if you are doing lots of string processing. I've had this problem myself when parsing large sets of data, where the peak memory use was 10-50x the size of the data being worked on. – jfriend00 Feb 09 '21 at 00:07
  • For example, just taking a large string and calling `.split()` on it will at least double and maybe triple the memory used by that data, because not only do you make a new copy of each piece of data in the `.split()` results, but there's some overhead for each new object it creates. None of it leaks permanently (it all gets GCed); it's just higher peak memory usage. Then start parsing each result in the `.split()` array and you can double or triple the usage again really easily. – jfriend00 Feb 09 '21 at 00:10 (a small sketch illustrating this follows below)
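
To make the `.split()` peak-memory point concrete, here is a small standalone sketch (the sizes are made up for illustration) that measures how much heapUsed grows when a large string is split:

// Build a reasonably large string, then split it and compare heap usage.
const big = "a,".repeat(5000000); // ~10 million characters

const before = process.memoryUsage().heapUsed;
const parts = big.split(","); // every element is a brand new string object
const after = process.memoryUsage().heapUsed;

console.log("pieces:", parts.length);
console.log("heapUsed grew by ~" + ((after - before) / 1024 / 1024).toFixed(1) + " MB");
// None of this leaks: once `parts` goes out of scope it can all be GCed,
// but the peak usage is well above the size of the original string.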

1 Answer


I'm guessing your issue is related to this bug/behavior: https://github.com/nodejs/node/issues/38300

It has been fixed in Node v16.1.0 and v14.17.1.

joe