DB and sort-stream are fine solutions, but a DB might be overkill, and I think sort-stream eventually just sorts the entire file in an in-memory array (in through's end callback), so I think performance will be roughly the same as with the original solution
(but I haven't run any benchmarks, so I might be wrong).
So, just for the heck of it, I'll throw in another solution :)
EDIT:
I was curious to see how big a difference this will be, so I ran some benchmarks.
The results surprised even me: the sort -k3,3 solution is better by far, about 10 times faster than the original solution (a simple array sort), while the nedb and sort-stream solutions are at least 18 times slower than the original solution (i.e. at least 180 times slower than sort -k3,3).
(See benchmark results below)
If you're on a *nix machine (Unix, Linux, Mac, ...), you can simply use

    sort -k 3,3 yourInputFile > op_rev.txt

and let the OS do the sorting for you.
You'll probably get better performance, since sorting is done natively.
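If you'd rather launch that same sort from a Node script instead of a terminal, something like the sketch below should work. The function name and file names are made up for illustration, and the -t option is my addition: it assumes the fields are separated by tabs (without it, sort splits fields on runs of blanks). The -o option tells sort to write to a file instead of stdout.

```javascript
const { execFileSync } = require('child_process');

// Sort a tab-separated file by its third column using the system `sort`,
// writing the result to outputPath. Blocks until `sort` has finished and
// throws if `sort` exits with a non-zero code.
function sortFileByThirdColumn(inputPath, outputPath) {
    execFileSync('sort', [
        '-t', '\t',        // assumption: fields are separated by tabs
        '-k3,3',           // the sort key starts and ends at field 3
        '-o', outputPath,  // write the result to a file instead of stdout
        inputPath
    ]);
}

// Usage (hypothetical file names):
// sortFileByThirdColumn('./test.tsv', './op_rev.txt');
```

Since the sorted data never passes through Node, this should stay close to the plain terminal command in terms of speed, though I haven't measured it.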
Or, if you want to process the sorted output in Node:
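    var spawn = require('child_process').spawn,
        sort = spawn('sort', ['-k3,3', './test.tsv']),
        leftover = ''; // partial line carried over from the previous chunk

    sort.stdout.setEncoding('utf8');
    sort.stdout.on('data', function (data) {
        // a chunk can end mid-line, so hold back the last (possibly incomplete) piece
        var lines = (leftover + data).split('\n');
        leftover = lines.pop();
        lines
            .map(line => line.split('\t'))
            .forEach(record => console.info(`Record: ${record}`));
    });

    // 'close' fires after the process has exited and stdout has been fully read
    sort.on('close', function (code) {
        if (leftover) {
            // the last line had no trailing newline
            console.info(`Record: ${leftover.split('\t')}`);
        }
        if (code) {
            // handle error
        }
        console.log('Done');
    });

    // optional
    sort.stderr.on('data', function (data) {
        // handle error...
        console.log('stderr: ' + data);
    });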
Hope this helps :)
EDIT: Adding some benchmark details.
Here are the results (running on a MacBook Pro):
sort1 uses a straightforward approach, sorting the records in an in-memory array.
Avg time: 35.6s (baseline)
sort2 uses sort-stream, as suggested by Joe Krill.
Avg time: 11.1m (about 18.7 times slower)
(I wonder why. I didn't dig in.)
sort3 uses nedb, as suggested by Tamas Hegedus.
Time: about 16m (about 27 times slower)
sort4 only sorts, by executing sort -k 3,3 input.txt > out4.txt in a terminal.
Avg time: 1.2s (about 30 times faster)
sort5 uses sort -k3,3 and processes the output sent to stdout.
Avg time: 3.65s (about 9.7 times faster)
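The benchmark scripts themselves aren't included here. For reference, a baseline along the lines of sort1 (read everything, split into records, sort the array) might look roughly like this. It's only a sketch: the function name, the file name, and the timing label are made up for illustration, and it compares the column values as plain strings.

```javascript
// Parse tab-separated text, sort the records by one column, and return them.
// columnIndex is zero-based, so 2 means the third column.
function sortRecordsByColumn(text, columnIndex) {
    const records = text
        .split('\n')
        .filter(line => line.length > 0)   // skip the empty piece after a trailing newline
        .map(line => line.split('\t'));

    // plain string comparison (by UTF-16 code units) on the chosen column
    records.sort((a, b) => {
        if (a[columnIndex] < b[columnIndex]) return -1;
        if (a[columnIndex] > b[columnIndex]) return 1;
        return 0;
    });
    return records;
}

// Usage with timing (hypothetical file name):
// const fs = require('fs');
// console.time('sort1');
// const sorted = sortRecordsByColumn(fs.readFileSync('./test.tsv', 'utf8'), 2);
// console.timeEnd('sort1');
```

Note that this holds the whole file in memory as one string and again as an array of records, which is part of why handing the work to sort can pay off on large inputs.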