
I have a file and I want to read it line by line; for every line extracted I perform some expensive analysis and then save the results to the database. In short, I have something like this:

const fs = require('fs');
const path = require('path');
const readline = require('readline');

async function analyzeAndSave(url) {
  // Removed for brevity, but this function takes a minute or so to finish.
}

async function run() {
  try {
    const dataPath = path.join(path.dirname(require.main.filename), 'data/urls.txt');

    const rl = readline.createInterface({
      input: fs.createReadStream(dataPath),
    });

    let line_no = 0;
    rl.on('line', async (url) => {
      line_no++;
      logger.info(`Analyzing: ${url}`);
      await analyzeAndSave(url);
    });
  } catch (err) {
    // Error caught.
    logger.error(err);
  }
}

run();

The problem is that it doesn't wait for the analysis of one line to finish; it kicks off multiple analyses at the same time. I can see this because it initially prints all the lines with `logger.info('Analyzing: ' + url);`, so it is not executing sequentially. How can I make sure that one line finishes before moving on to the next?

  • Couldn't you read all the lines into an array and analyze them one by one (a sketch of this approach appears after these comments)? The reason it isn't waiting is that await only blocks that function; it's not a chain of promises that is being read. You are just awaiting inside your event handler, so how would the reader know that you want it to wait? – Icepickle Mar 24 '19 at 15:54
  • Is there a reason why you are currently using the `readline` import as opposed to just reading a file in one go? – Icepickle Mar 24 '19 at 15:59
  • @Icepickle I just found that it is easier to read line by line with it. But any solution that stores things in an array is fine with me too. – Mar 24 '19 at 16:00
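
For reference, here is a minimal sketch of the read-the-whole-file-first idea from the comments, assuming the analyzeAndSave function and the data/urls.txt path from the question (neither is defined here):

const fs = require('fs').promises;
const path = require('path');

async function run() {
  // Read the whole file in one go, then analyze each line strictly in order.
  const dataPath = path.join(__dirname, 'data/urls.txt');
  const data = await fs.readFile(dataPath, 'utf8');

  for (const url of data.split('\n').filter(Boolean)) {
    await analyzeAndSave(url); // the next line is not touched until this resolves
  }
}

run().catch((err) => console.error(err));

This reads the entire file into memory at once, which is fine for a list of URLs but less suitable for very large files, which is where a streaming reader still helps.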

2 Answers


I think this is going to be helpful to you; the approach is explained and demonstrated here:

Nodejs - read line by line from file, perform async action for each line and resume

Someone there suggested a library for big files called line-by-line.

@JavierFerrero posted a solution along these lines:

var LineByLineReader = require('line-by-line'),
    lr = new LineByLineReader('big_file.txt');

lr.on('error', function (err) {
    // 'err' contains error object
});

lr.on('line', function (line) {
    // pause emitting of lines...
    lr.pause();

    // ...do your asynchronous line processing..
    setTimeout(function () {

        // ...and continue emitting lines.
        lr.resume();
    }, 100);
});

lr.on('end', function () {
    // All lines are read, file is closed now.
});
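
Applied to the question's use case, a rough sketch of the same pause/resume pattern (assuming the analyzeAndSave and logger from the question, and a data/urls.txt path) might look like this:

const LineByLineReader = require('line-by-line');
const lr = new LineByLineReader('data/urls.txt');

lr.on('line', (url) => {
    // Stop further 'line' events until the current analysis finishes.
    lr.pause();

    analyzeAndSave(url)
        .then(() => lr.resume())      // continue with the next line
        .catch((err) => {
            logger.error(err);
            lr.resume();
        });
});

lr.on('end', () => {
    // All lines are read, file is closed now.
    logger.info('All URLs processed.');
});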

You can also pass a callback and wait for the operation to finish:

const fs = require('fs');

function run(path, cb) {
    fs.readFile(path, 'utf8', function (err, data) {
        if (err) {
            // Note: a surrounding try/catch would not catch an error thrown
            // inside this asynchronous callback, so handle it here instead.
            return console.error(err);
        }
        cb(data);
    });
}

run('./test.txt', (response) => {
    // We are done, now continue
    console.log(response);
});
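
If the goal is still to analyze the question's URLs one at a time, a small sketch building on that callback (again assuming analyzeAndSave from the question and a data/urls.txt path) could be:

run('./data/urls.txt', async (data) => {
    // Split the file contents into lines and analyze them strictly one by one.
    const urls = data.split('\n').filter(Boolean);
    for (const url of urls) {
        await analyzeAndSave(url); // the next iteration starts only after this resolves
    }
});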
ABC

The readline interface emits its 'line' events asynchronously, and awaiting inside one handler doesn't stop the others from being fired. Instead, you can buffer the lines up in an array like this:

rl.on('line', url => urls.push(url));
rl.on('close', async () => {
  for (const url of urls) {
    await analyzeAndSave(url);
  }
});

where `urls` is initialized to an empty array before the readline interface is created.
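
Put back into the question's run() function, a sketch of the whole thing (assuming logger and analyzeAndSave are defined as in the question) could look like this:

const fs = require('fs');
const path = require('path');
const readline = require('readline');

async function run() {
  const dataPath = path.join(path.dirname(require.main.filename), 'data/urls.txt');
  const urls = [];

  const rl = readline.createInterface({
    input: fs.createReadStream(dataPath),
  });

  rl.on('line', (url) => urls.push(url));

  rl.on('close', async () => {
    // The whole file has been read; now process the URLs strictly in order.
    for (const url of urls) {
      logger.info(`Analyzing: ${url}`);
      await analyzeAndSave(url); // finishes before the loop moves to the next URL
    }
  });
}

run();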

Always Learning
  • This looks like it's truly based on what the OP wants; however, I guess there needs to be a mention of where the `urls` variable should go, and maybe of whether the readline interface makes sense for reading files if you are going to collect everything into one big array in the end anyway – Icepickle Mar 24 '19 at 16:06
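
As a side note on that last point: on newer Node versions (roughly 11.14+), the readline interface is itself async-iterable, so a sketch that needs neither the buffering array nor an extra library (again assuming analyzeAndSave from the question) could be:

const fs = require('fs');
const readline = require('readline');

async function run() {
  const rl = readline.createInterface({
    input: fs.createReadStream('data/urls.txt'),
    crlfDelay: Infinity,
  });

  // Each iteration waits for the analysis before the next line is consumed.
  for await (const url of rl) {
    await analyzeAndSave(url);
  }
}

run().catch((err) => console.error(err));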