
I have a simple script to handle a 10 GB CSV file. The idea is pretty simple.

  1. Open the file as a stream.
  2. Parse CSV records from it.
  3. Modify the records.
  4. Stream the output to a new file.

I made the following code, but it causes a memory leak. I have tried a lot of different things, but nothing helps. The memory leak disappears if I remove the transformer from the pipe chain, so maybe that is the cause.

I run the code under Node.js.

Can you help me find where I am wrong?

'use strict';

import fs from 'node:fs';
import {parse, transform, stringify} from 'csv';
import lineByLine from 'n-readlines';


// big input file
const inputFile = './input-data.csv';

// read headers first
const linesReader = new lineByLine(inputFile);
const firstLine = linesReader.next();
linesReader.close();
const headers = firstLine.toString()
    .split(',')
    .map(header => {
        return header
            .replace(/^"/, '')      // strip surrounding quotes
            .replace(/"$/, '')
            .replace(/\s+/g, '_')   // whitespace -> underscores
            .replace(/[().]/g, '_') // global, so every '(', ')' and '.' is replaced, not just the first
            .replace(/_+$/, '');    // drop trailing underscores
    });

// file stream
const fileStream1 = fs.createReadStream(inputFile);

// parser stream; columns are supplied explicitly, so parsing must start
// from line 2, otherwise the header row is emitted as a data record
const parserStream1 = parse({delimiter: ',', cast: true, columns: headers, from_line: 2});

// transformer
const transformer = transform(function(record) {
    return Object.assign({}, record, {
        SomeField: 'BlaBlaBla',
    });
});

// stringifier stream; without header: true the output file will have no header row
const stringifier = stringify({delimiter: ','});

console.log('Loading data...');

// chain of pipes; pipe() does not forward errors downstream, hence the per-stage 'error' listeners
fileStream1.on('error', err => { console.log(err); })

    .pipe(parserStream1).on('error', err => {console.log(err); })

    .pipe(transformer).on('error', err => { console.log(err); })

    .pipe(stringifier).on('error', err => { console.log(err); })

    .pipe(fs.createWriteStream('./_data/new-data.csv')).on('error', err => { console.log(err); })

    .on('finish', () => {
        console.log('Loading data finished!');
    });
Jakeroid
  • Memory leaks are quite complex and depend on a number of factors. I'd suggest you do some analysis at runtime using the Chrome DevTools, which will help you identify the source. You can find more information in [this](https://stackoverflow.com/questions/15970525/how-to-find-js-memory-leaks) past Stack Overflow answer. – Christos Binos Jul 18 '22 at 08:33
  • @ChristosBinos Thank you, but I am running my code under Node.js. Could Chrome help me? – Jakeroid Jul 18 '22 at 08:38
  • Yes. This [website](https://sematext.com/blog/nodejs-memory-leaks/) gives you several ways to debug memory leaks on Node.js, including using Chrome DevTools (a heap-snapshot sketch follows this thread). – Christos Binos Jul 18 '22 at 08:44
  • I'm not sure this is a memory leak as such, rather a stream that is not writing and no backpressure mechanism in the transforms, so the buffers keep filling. Replace the fs write stream with process.stdout and it works (a pipeline-based sketch follows this thread). – Matt Jul 18 '22 at 23:57
  • The memory is all record objects in a BufferList. – Matt Jul 18 '22 at 23:57
  • @Matt Thanks, I will try to figure that out. If I got your point correctly, my write stream doesn't consume the records fast enough, and the transform stream keeps pushing them anyway, so the data just piles up in memory as the buffers fill without being freed. Did I understand correctly? – Jakeroid Jul 19 '22 at 07:13
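
For context on the backpressure discussion above, here is a minimal sketch of the same chain rewritten around stream.pipeline from node:stream, which wires the stages together and reports an error from any stage to a single callback. The parallel: 1 and header: true options are assumptions added for illustration, not settings from the question; the SomeField assignment is the question's own placeholder.

import fs from 'node:fs';
import {pipeline} from 'node:stream';
import {parse, transform, stringify} from 'csv';

pipeline(
    fs.createReadStream('./input-data.csv'),
    // columns: true takes field names from the first row; the sanitized
    // headers array from the question could be passed instead (with from_line: 2)
    parse({delimiter: ',', cast: true, columns: true}),
    // parallel: 1 keeps at most one record in flight inside the transformer,
    // which makes it easier to see whether memory still grows elsewhere
    transform({parallel: 1}, record => ({...record, SomeField: 'BlaBlaBla'})),
    // header: true re-emits the column names as the first output row
    stringify({delimiter: ',', header: true}),
    fs.createWriteStream('./_data/new-data.csv'),
    err => console.log(err ?? 'Loading data finished!')
);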
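
For the Chrome DevTools suggestion earlier in the thread, heap snapshots can also be captured from a plain Node.js run with the built-in v8.writeHeapSnapshot(). A minimal sketch, assuming an arbitrary 30-second interval; the resulting .heapsnapshot files open in the DevTools Memory tab, and comparing consecutive snapshots shows which objects keep accumulating.

import v8 from 'node:v8';

// write a heap snapshot every 30 seconds while the script runs
const snapshotTimer = setInterval(() => {
    console.log('heap snapshot written:', v8.writeHeapSnapshot());
}, 30_000);
snapshotTimer.unref(); // don't keep the process alive just for profiling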

0 Answers