
This answer to "Read a file one line at a time in node.js?" shows how to read a file line by line:

var lineReader = require('readline').createInterface({
  input: require('fs').createReadStream('file.in')
});

lineReader.on('line', function (line) {
  console.log('Line from file:', line);
});
lineReader.on('close', function () {
  console.log('Finished');
});

But if I make that callback an async function so that I can validate and transform each line and write it to a different file or to an API, it doesn't work: the 'close' event fires without waiting for the individual lines to finish their async functions.

Is there a way to process a file line by line asynchronously using readline or any libraries built in to Node.js?

What's the simplest way to get this to work?

I need to do it line by line because the files are very large and memory would be completely consumed otherwise.
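
Here is the failing pattern, for illustration (a minimal sketch; `validate` and `transform` stand in for whatever hypothetical async work each line needs):

lineReader.on('line', async function (line) {
  // The emitter ignores the returned promise, so nothing waits for this.
  await validate(line);
  await transform(line);
});
lineReader.on('close', function () {
  // Fires as soon as the stream ends, possibly before the handlers above settle.
  console.log('Finished');
});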

user779159

2 Answers


As explained in this answer, a stream should be promisified and transformed into an async iterator in order to be efficiently iterated with promises and async/await. This can be achieved with the third-party p-event library:

const pEvent = require('p-event'); // third-party: npm install p-event (CommonJS API shown)

try {
  const lineReader = require('readline').createInterface({
    input: require('fs').createReadStream('file.in')
  });

  // Collect 'line' events into an async iterator; 'close' resolves it.
  const asyncIterator = pEvent.iterator(lineReader, 'line', {
    resolutionEvents: ['close']
  });

  // Note: for await...of must run inside an async function
  // (or with top-level await in an ES module).
  for await (const line of asyncIterator) {
    console.log('Line from file:', line);
  }
} catch (e) {
  console.log(e);
} finally {
  console.log('Finished');
}
Estus Flask
  • Does `pEvent.iterator` support backpressure (for when you `await` something in that loop)? – Bergi Oct 21 '18 at 11:54
  • Sort of. It doesn't specifically support streams, it just eagerly collects data from listener params and creates an iterator to iterate over it. It does roughly what the code you posted does: `validate(line).then(transform)`. – Estus Flask Oct 21 '18 at 12:18
  • So it *doesn't* support backpressure, ok. – Bergi Oct 21 '18 at 13:23
  • 1
    The [current docs](https://nodejs.org/api/readline.html#readline_example_read_file_stream_line_by_line) seem to indicate that `p-event` is not necessary to support `for await (...)` syntax. – Mmmh mmh Mar 19 '21 at 15:28
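
Indeed, in modern Node.js a readline interface is itself async iterable, so no third-party library is needed. A minimal sketch along the lines of the linked docs (file name and log messages kept from the question):

const fs = require('fs');
const readline = require('readline');

async function processLineByLine() {
  const rl = readline.createInterface({
    input: fs.createReadStream('file.in'),
    crlfDelay: Infinity // recognize '\r\n' as a single line break
  });

  // Awaiting inside the loop pauses consumption of the stream,
  // so each line is fully processed before the next one is read.
  for await (const line of rl) {
    console.log('Line from file:', line);
  }
  console.log('Finished');
}

processLineByLine().catch(console.error);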

Don't use an async function as the callback; the event emitter does not care about the returned promise. Instead, build a promise chain yourself:

var lineReader = require('readline').createInterface({
  input: require('fs').createReadStream('file.in')
});
var done = Promise.resolve();

lineReader.on('line', function (line) {
  console.log('Line from file:', line);
  var promise = validate(line).then(transform); // or whatever;
  done = Promise.all([
    done,
    promise
  ]).then(() => void 0); // needed to drop result data
  // alternatively also do things that must happen sequentially
});
lineReader.on('close', function () {
  console.log('Finished reading');
  done.then(() => console.log('Finished transforming'));
});
Bergi
  • I'm trying to understand this approach. Would it work for files with millions of lines or would that be millions of the `done` promises in memory? – user779159 Oct 21 '18 at 13:11
  • @user779159 There is only a single `done` variable, and its value would reference only the not-yet-settled promises. The number of promises in memory would be proportional to the number of asynchronous operations that are running concurrently. If you need backpressure (not reading lines faster than you can process them), the code will need to be more elaborate; see the sketch below. – Bergi Oct 21 '18 at 13:22
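
To sketch what such a more elaborate version could look like (an assumption, not part of the answer: `validate` and `transform` are the same hypothetical helpers, and note that readline may still emit a few already-buffered 'line' events immediately after `pause()` is called, so this is approximate backpressure):

var lineReader = require('readline').createInterface({
  input: require('fs').createReadStream('file.in')
});

lineReader.on('line', function (line) {
  lineReader.pause(); // stop pulling input while this line is processed
  validate(line)
    .then(transform)
    .then(function () {
      lineReader.resume(); // ready for the next line
    })
    .catch(function (err) {
      console.error(err);
      lineReader.close(); // abort on error
    });
});
lineReader.on('close', function () {
  console.log('Finished');
});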