
I have a tab-separated file with several columns (usually 9). The file can be several hundred megabytes in size, typically just under 1 GB, which is thousands to millions of lines. A variable number of lines describe one particular thing. Each line carries some bits of information, and I want to collate the lines that belong together into a single object, because that will be much easier to work with. Here is my initial attempt:

const fs = require('fs');
const events = require('events');
const readline = require('readline');

// first command-line argument after the script name
const myFile = process.argv[2];

const rl = readline.createInterface({
        input: fs.createReadStream(myFile)
});

const myObject = {};

rl.on('line', (line) => {
        const items = line.split('\t');
        // store the fourth column the first time an id (first column) is seen
        if (!(items[0] in myObject)) {
                myObject[items[0]] = items[3];
        }
});

I learned how to read a large file, and I'm sort of getting to understand the node.js events thing, but my problem is that a variable number of lines belong together, and they don't have to be sequential lines in the file. So this seems to need some sort of look-ahead functionality, but the look-ahead might have to scan the whole file, which I believe isn't efficient.
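
One way to handle lines that belong together but are scattered through the file, without any look-ahead, might be to keep an object keyed by the identifier and append each line to its group as it arrives. A minimal sketch, assuming the first column is the grouping identifier; `addLine` and `groups` are hypothetical names, and the three input lines are made up:

```javascript
// Hypothetical helper: append one tab-separated line to its group.
// Assumes the first column (items[0]) identifies the thing the line describes.
function addLine(groups, line) {
        const items = line.split('\t');
        const id = items[0];
        if (!(id in groups)) {
                groups[id] = [];        // first line seen for this id
        }
        groups[id].push(items);         // later lines may arrive in any order
        return groups;
}

// Object.create(null) has no inherited keys, so the `in` check only
// matches ids that were actually added.
const groups = Object.create(null);

// Made-up sample lines; the two "a" lines are not adjacent.
['a\t1', 'b\t2', 'a\t3'].forEach((line) => addLine(groups, line));
console.log(groups.a.length);   // prints 2
```

Each line is handled once, in file order, so nothing has to be read ahead; the trade-off is that every group stays in memory until the end of the file.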

After reading a post on node.js events that was very similar to my problem, I came up with this:

const fs = require('fs');
const events = require('events');
const readline = require('readline');

const myFile = process.argv[2];

const myEvent = new events.EventEmitter();

const rl = readline.createInterface({
        input: fs.createReadStream(myFile)
});

rl.on('line', function(line) {
        const items = line.split('\t');
        // build one object per line from columns 1, 3 and 8
        const myObject = {
                id: items[0],
                name: items[2],
                other: items[7]
        };
        myEvent.emit('data', myObject);
});

myEvent.on('data', function(myObject) {
        console.log(myObject);
});

I think I'm beginning to understand how the rl instance of the readline class emits events, and that with .on('line', ...) you can get every line from a file. You can then emit the newly made object for further processing. But I can't figure out how to manipulate several lines, i.e. how to store everything in a single global object.
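
Since `emit` calls its listeners synchronously, the `'data'` listener is one place where the per-line objects could be collected into a single shared object. A rough sketch, assuming `id` is the grouping key; `store` is a hypothetical name, and the two emitted records are made-up sample data standing in for parsed lines:

```javascript
const events = require('events');

const myEvent = new events.EventEmitter();
const store = {};       // hypothetical shared object: id -> array of records

// Collect every emitted record under its id.
myEvent.on('data', function(record) {
        if (!Object.prototype.hasOwnProperty.call(store, record.id)) {
                store[record.id] = [];
        }
        store[record.id].push(record);
});

// Made-up records; in the real script these would come from the 'line' handler.
myEvent.emit('data', { id: 'a', name: 'first', other: 'x' });
myEvent.emit('data', { id: 'a', name: 'second', other: 'y' });
console.log(store.a.length);    // prints 2
```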

P.S. I'm a newbie at node.js and JS in general, but really keen to take it up. Any general advice, links or other help will be much appreciated.

serine
  • What exactly do you mean by "*my problem is that random number of lines are grouped together*"? Are you saying that `rl` does not emit lines in the same order they occur in the file? – Bergi Mar 16 '16 at 03:58
  • @Bergi Say I have 10 lines in a file. The first three describe one particular thing. I want to put those three lines into a single array, e.g. `{UniqueId1: [line1, line2, line3]}`. But the lines don't have to be in order, e.g. `{UniqueId2: [line1, line9, line10]}`. I found [this blog on streams](http://nicolashery.com/parse-data-files-using-nodejs-streams/); I think this is exactly what I need to do, but I'm still going through it and struggling a bit – serine Mar 16 '16 at 04:03
  • Your initial attempt should work. Apart from the `itmes` typo and the lack of an array being built, that is. But those don't have anything to do with streaming. – Bergi Mar 16 '16 at 04:09
  • @Bergi but it isn't working..? If I add `console.log(myObject)` at the very end of the file to get the `map`, I get an empty `map`. I can add `console.log(line)` inside the `.on` block and I will get the line, but not in the `myObject` – serine Mar 16 '16 at 04:14
  • You have to put the final `console.log(myObject)` in a callback for the end of the stream, so that it will be called after all lines have been processed by your `line` handler. Check the readline docs, it should have an event for that. – Bergi Mar 16 '16 at 04:18
  • @Bergi I have been through the Readline, File System and Events docs a few times now; would you be able to point me to the exact location in the docs? Thanks – serine Mar 16 '16 at 04:22
  • Weird how this is not documented, but `rl` [will emit `end` and `close` events](https://github.com/maleck13/readline/blob/master/readline.js#L50) like most streams. – Bergi Mar 16 '16 at 04:26
  • @Bergi sorry I'm not getting it.. I added this to the end of my file `rl.on('end', function(myObject) {console.log(myObject); });` , but still get nothing. Can you maybe just add what you mean as an answer..? p.s I'm still getting my head around callbacks.. cheers – serine Mar 16 '16 at 04:40
  • Don't make `myObject` a parameter of that function, it shadows the one you declared. – Bergi Mar 16 '16 at 05:31
  • @Bergi Okay, great, got it! For others: the `'end'` event didn't work for some reason. This line at the end of the file did it: `rl.on('close', function() { console.log(myObject); });` Cheers – serine Mar 16 '16 at 23:56
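
The points from the comments above can be put together in one sketch: build an array per identifier in the `line` handler, and read the result only in the `close` handler, whose callback takes no parameter so that it does not shadow the outer variable. This assumes the first column is the identifier; `collate` and `onDone` are hypothetical names, not part of the readline API.

```javascript
const fs = require('fs');
const readline = require('readline');

// Hypothetical helper: group the lines of a tab-separated file by their
// first column and pass the finished object to onDone after the last line.
function collate(file, onDone) {
        const myObject = {};
        const rl = readline.createInterface({
                input: fs.createReadStream(file)
        });

        rl.on('line', (line) => {
                const items = line.split('\t');
                if (!Object.prototype.hasOwnProperty.call(myObject, items[0])) {
                        myObject[items[0]] = [];
                }
                myObject[items[0]].push(items);
        });

        // 'close' is emitted once the input stream has ended. The callback
        // takes no parameter, so myObject is the object declared above.
        rl.on('close', () => onDone(myObject));
}

// Run only when a file name is given on the command line.
if (process.argv[2]) {
        collate(process.argv[2], (result) => console.log(result));
}
```

Wrapping the stream in a function with a completion callback keeps the "print after everything is read" step in one obvious place, instead of a `console.log` at the bottom of the script that runs before any line has been processed.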
