
I'm using Node.js to read and parse a file of number pairs. I have a file like this:

1561 0506
1204 900
6060 44

And I want to read it as an array, like this:

[[1561,0506],[1204,900],[6060,44]]

For that, I am using a readStream, reading the file as chunks and using native string functions to do the parsing:

fileStream.on("data", function (chunk) {
    var newLineIndex;
    file = file + chunk;
    while ((newLineIndex = file.indexOf("\n")) !== -1) {
        var spaceIndex = file.indexOf(" ");
        edges.push([
            Number(file.slice(0, spaceIndex)),
            Number(file.slice(spaceIndex + 1, newLineIndex))]);
        file = file.slice(newLineIndex + 1);
    }
});

That took way too much time, though (4 s for the file I need, on my machine). I see some possible reasons:

  1. use of strings;
  2. use of "Number";
  3. dynamic array of arrays.

I've rewritten the algorithm without the built-in string functions, using loops instead, and, to my surprise, it became much slower! Is there any way to make it faster?

MaiaVictor

1 Answer


Caveat: I have not tested the performance of this solution, but it's complete so should be easy to try.

How about using this "liner" implementation, based on the notes in this question?

Using the liner:

var fs = require('fs');
var liner = require('./liner');

var edges = [];
var source = fs.createReadStream('mypathhere');
source.pipe(liner);
liner.on('readable', function () {
    var line;
    while ((line = liner.read()) !== null) {
        var parts = line.split(" ");
        edges.push([Number(parts[0]), Number(parts[1])]);
    }
});

As you can see, I also kept the edges array separate from the per-line split parts, which I'm guessing speeds up allocation. You could even try using indexOf(" ") instead of split(" ").
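A sketch of that indexOf variant (the `parseLine` helper and `edges` array here are illustrative names, not from the original code):

```javascript
// Hypothetical per-line parser using indexOf instead of split,
// avoiding the intermediate array that split(" ") allocates.
var edges = [];

function parseLine(line) {
    var spaceIndex = line.indexOf(" ");
    edges.push([
        Number(line.slice(0, spaceIndex)),   // first field
        Number(line.slice(spaceIndex + 1))   // rest of the line
    ]);
}

parseLine("1561 0506"); // Number("0506") parses as decimal 506
```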

Beyond this, you could instrument the code to identify any remaining bottlenecks.
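For instance (a hypothetical micro-benchmark, not from the answer), console.time can bracket the parse loop in isolation, to check whether parsing or I/O dominates the 4 seconds:

```javascript
// Hypothetical instrumentation sketch: time only the parsing work,
// on synthetic in-memory lines, so stream/file I/O is excluded.
var lines = [];
for (var i = 0; i < 1000000; i++) lines.push("1561 506");

console.time("parse");
var edges = [];
for (var j = 0; j < lines.length; j++) {
    var s = lines[j].indexOf(" ");
    edges.push([Number(lines[j].slice(0, s)), Number(lines[j].slice(s + 1))]);
}
console.timeEnd("parse"); // prints the elapsed time for the loop
```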

Will