
First off: I'm an absolute beginner in JavaScript and started learning a couple of weeks ago, many hours a day. I'm running a Node.js server on GNU/Linux and have tried a lot of variations to achieve my goal. Unfortunately I'm stuck and don't know how to continue.

I have a text file with white-spaces and line feeds, a bit over 2000 lines in total. I want to read this text file into my JavaScript program so I can use it later as a lookup table. I'm not sure whether I need to JSON.stringify it for later use; maybe it's simpler to keep it as an object/array that my lookup function can work with. I only want to pull out the lines containing the character "#" and use that character as the field delimiter. All other lines can be ignored. Each such line represents one data set, element, object or whatever it's called correctly. The final goal: the user asks for "Apple" and should get "-9.99" and "BTW" (for example) as the answer. Here's an example of the raw text file:

 Sugar#    1051#      331#     BAD#     1.23#    -4.56#    -5.0#  WWF#
 N3T;
 Apple#     551#     3815#     F3W#     5.55#    -9.99#    -1.0#  BTW#
 BBC;
 Berry#      19#       22#      FF#     19.5#   -12.34#     5.0#  CYA#
 T1K;

It should represent 3 elements, each of them containing 8 key/value pairs:

 name: 'Sugar'
 sec: 1051
 ter: 331
 wrd: 'BAD'
 a: 1.23
 b: -4.56
 c: -5.0
 spon: 'WWF'

 name: 'Apple'
 sec: 551
 ter: 3815
 wrd: 'F3W'
 a: 5.55
 b: -9.99
 c: -1.0
 spon: 'BTW'

 name: 'Berry'
 sec: 19
 ter: 22
 wrd: 'FF'
 a: 19.5
 b: -12.34
 c: 5.0
 spon: 'CYA'

At the beginning I tried using fs.readFileSync to read the whole text file as a string, but without success. Disappointed, I tried another approach with readline to read my text file line by line and do the filtering, because I got the impression from the net that this method is more memory-friendly and allows reading even very large files. Although I'm pretty sure 3000 lines are a joke figure :)

This was my code when approaching with readline:

const fs = require('fs');
const readline = require('readline');

function readAndFilter (source, data) {
  var fields;
  var obj = new Object;
  var arr = new Array;

  const readAndFilter = readline.createInterface({
    input: fs.createReadStream('test.in'),
    crlfDelay: Infinity
  });

  readAndFilter.on('line', (line) => {
    if ( line.match( /#/ ) ) {
      fields        = line.split( '#' ).slice();
      obj.name      = fields[0].trim();
      obj.sec       = fields[1].trim();
      obj.ter       = fields[2].trim();
      obj.wrd       = fields[3].trim();
      obj.a         = fields[4].trim();
      obj.b         = fields[5].trim();
      obj.c         = fields[6].trim();
      obj.spon      = fields[7].trim();

      console.log(obj);
      // let jsonView = JSON.stringify(obj);
      // arr.push(obj);
    }
  });

  readAndFilter.on('close', function() {
    return arr;
  });
}

readAndFilter();

This is what the code outputs (note that I customized my console log by adding a timestamp to each output line):

 2019-06-16 14:40:10 { name: 'Sugar',
 sec: '1051',
 ter: '331',
 wrd: 'BAD',
 a: '1.23',
 b: '-4.56',
 c: '-5.0',
 spon: 'WWF' }
 2019-06-16 14:40:10 { name: 'Apple',
 sec: '551',
 ter: '3815',
 wrd: 'F3W',
 a: '5.55',
 b: '-9.99',
 c: '-1.0',
 spon: 'BTW' }
 2019-06-16 14:40:10 { name: 'Berry',
 sec: '19',
 ter: '22',
 wrd: 'FF',
 a: '19.5',
 b: '-12.34',
 c: '5.0',
 spon: 'CYA' }

The data fields look fine and the file was processed correctly so far, but the object "obj" will only hold the last data set (name: 'Berry'), because it gets overwritten on every line. I double-checked by cutting the line

console.log(obj);

from the readAndFilter.on('line', ...) block and inserting it into the 'close' block:

[...]
      readAndFilter.on('line', (line) => {
            if ( line.match( /#/ ) ) {
              fields        = line.split( '#' ).slice();
              obj.name      = fields[0].trim();
              obj.sec       = fields[1].trim();
              obj.ter       = fields[2].trim();
              obj.wrd       = fields[3].trim();
              obj.a = fields[4].trim();
              obj.b = fields[5].trim();
              obj.c = fields[6].trim();
              obj.spon      = fields[7].trim();

            // let jsonView = JSON.stringify(obj);
            // arr.push(obj);
            }
      });

      readAndFilter.on('close', function() {
       console.log(obj);
      return arr;
      });
    [...]

the output produced is:

 { name: 'Berry',
 sec: '19',
 ter: '22',
 wrd: 'FF',
 a: '19.5',
 b: '-12.34',
 c: '5.0',
 spon: 'CYA' }

That won't work as a lookup table; I need all the lines in an array so I can access them later in the lookup routine. So I tried to add each object to one array with the following code:

    [...]
      readAndFilter.on('line', (line) => {
            if ( line.match( /#/ ) ) {
              fields        = line.split( '#' ).slice();
              obj.name      = fields[0].trim();
              obj.sec       = fields[1].trim();
              obj.ter       = fields[2].trim();
              obj.wrd       = fields[3].trim();
              obj.a = fields[4].trim();
              obj.b = fields[5].trim();
              obj.c = fields[6].trim();
              obj.spon      = fields[7].trim();

            // let jsonView = JSON.stringify(obj);
            arr.push(obj);
            }
      });

      readAndFilter.on('close', function() {
       console.log(arr);
      return arr;
      });
    [...]

Now I get an array with three objects, but every one of them again contains the last data set (name: 'Berry'):

 [ { name: 'Berry',
 sec: '19',
 ter: '22',
 wrd: 'FF',
 a: '19.5',
 b: '-12.34',
 c: '5.0',
 spon: 'CYA' },
 { name: 'Berry',
 sec: '19',
 ter: '22',
 wrd: 'FF',
 a: '19.5',
 b: '-12.34',
 c: '5.0',
 spon: 'CYA' },
 { name: 'Berry',
 sec: '19',
 ter: '22',
 wrd: 'FF',
 a: '19.5',
 b: '-12.34',
 c: '5.0',
 spon: 'CYA' } ]

I even tried concat and many other variations. What the hell am I doing wrong? Is my approach using the readline/line-by-line technique completely wrong; should I use fs.readFileSync instead? I tried that too, here's my approach with fs.readFileSync:

            function readAndFilter () {
                var fields;
                var obj = new Object;
                var arr = new Array;
                var data = fs.readFileSync('test.in', 'utf8').replace(/\r\n/g,'\n').split('\n').filter(/./.test, /\#/)
    /*
            if ( data.match( /#/ ) ) {
                fields      = data.split( '#' ).slice();
                obj.name    = fields[0].trim();
                obj.cqz     = fields[1].trim();
                obj.itu     = fields[2].trim();
                obj.cont    = fields[3].trim();
                obj.lng     = fields[4].trim();
                obj.lat     = fields[5].trim();
                obj.tz      = fields[6].trim();
                obj.pfx     = fields[7].trim();
            };
    */
    console.log(typeof data + "\n" + data);
    }

The variable data is of type object as soon as I use .split('\n'), because split returns an array, and so I cannot use my following if-clause: it fails because match only works on a string. Maybe I'm pointing in completely the wrong direction and it's much simpler? The final goal is: I want to check a search string like "Apple" against this lookup table and retrieve the appropriate values (name, sec, ter, b, or any of them).

I'm really thankful for any helpful answer or hint. Please be patient with me and, honestly: I really tried a lot! Thanks to all.

user882786

2 Answers


First off, welcome to SO, and compliments on your focused and elaborate question. Good job!

The reason your stream solution doesn't work as intended is that it's asynchronous, so you're trying to access the result before it's actually there. Check out our classic thread to learn more about this.
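
In other words, the 'close' handler fires long after readAndFilter() has already returned. If you do want to keep the streaming variant, the usual pattern is to wrap it in a Promise and consume the result via .then() or await; a rough sketch along the lines of your code (not tested against your exact file):

const fs = require('fs');
const readline = require('readline');

function readAndFilter(path) {
    return new Promise((resolve, reject) => {
        const arr = [];
        const input = fs.createReadStream(path);
        input.on('error', reject);                    // e.g. file not found

        const rl = readline.createInterface({ input, crlfDelay: Infinity });

        rl.on('line', (line) => {
            if (!line.includes('#'))
                return;
            const fields = line.split('#').map(f => f.trim());
            arr.push({
                name: fields[0], sec: fields[1], ter: fields[2], wrd: fields[3],
                a: fields[4], b: fields[5], c: fields[6], spon: fields[7],
            });
        });

        rl.on('close', () => resolve(arr));           // the result only exists now
    });
}

// The array is only usable once the promise has settled:
readAndFilter('test.in').then(arr => console.log(arr));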

For the sake of simplicity, however, I'd suggest sticking with the readFileSync solution. Generally speaking, sync functions are not recommended in Node.js for performance reasons, but given that the file is tiny (3000 lines), it shouldn't hurt much.

Once you've read the file, the parsing could be done like this:

let text = fs.readFileSync('test.in', 'utf8');

let result = [];

for (let line of text.trim().split('\n')) {

    // skip lines that don't contain the '#' delimiter
    if (!line.includes('#'))
        continue;

    // split on '#' together with any surrounding whitespace
    let s = line.trim().split(/[#\s]+/g);

    result.push({
        name: s[0],
        sec: s[1],
        ter: s[2],
        wrd: s[3],
        a: s[4],
        b: s[5],
        c: s[6],
        spon: s[7],
    });
}


console.log(result)
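
To cover the final goal (looking entries up by name), you could then turn the array into a Map keyed by the name field; a minimal sketch, assuming each name occurs only once:

// Build a lookup table keyed by name (assumes names are unique).
let byName = new Map(result.map(entry => [entry.name, entry]));

// Example query for "Apple"; the values are still strings at this point.
let apple = byName.get('Apple');
console.log(apple.b, apple.spon);    // -9.99 BTW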
georg
  • I think OP needs to end up with a `result` object which maps names to entries, not just an array of entries. – rici Jun 16 '19 at 16:22

Hello Georg and many thanks so far. I have only cross-read the link you posted but will dive into it later. Without meaning to anticipate, I don't think my code failed because I was trying to access the result before it's there, as you said. In the readline variant I posted you can see that I tried the push function to add the new objects to the array which I defined at the beginning.

I was curious after reading your code and tried it. I'm not interested in ready-to-use code that I have no clue about; I really want to understand what's going on behind the scenes and how everything works. That's why I'm still asking, my goal is to understand. So in my humble opinion you did pretty much the same stuff I had already tried before, the only difference being that your array push command looks different from mine. I used

arr.push(obj);

which obviously failed. As explained before, I used the following code for the readline variant:

 [...]
      readAndFilter.on('line', (line) => {
            if ( line.match( /#/ ) ) {
              fields        = line.split( '#' ).slice();
              obj.name      = fields[0].trim();
              obj.sec       = fields[1].trim();
              obj.ter       = fields[2].trim();
              obj.wrd       = fields[3].trim();
              obj.a = fields[4].trim();
              obj.b = fields[5].trim();
              obj.c = fields[6].trim();
              obj.spon      = fields[7].trim();

            arr.push(obj);
            }
      });

      readAndFilter.on('close', function() {
       console.log(arr);
      return arr;
      });
    [...]

so I just changed the code as mentioned, removing the line "arr.push(obj)" and making the push call look equivalent to yours:

 [...]
      readAndFilter.on('line', (line) => {
            if ( line.match( /#/ ) ) {
              fields        = line.split( '#' ).slice();

            arr.push({
              name: fields[0].trim(),
              sec: fields[1].trim(),
              ter: fields[2].trim(),
              wrd: fields[3].trim(),
              a: fields[4].trim(),
              b: fields[5].trim(),
              c: fields[6].trim(),
              spon: fields[7].trim(),
            });
            }
      });

      readAndFilter.on('close', function() {
       console.log(arr);
      return arr;
      });
    [...]

This way it outputs the same result as your code, it WORKS!!! As I'm using readline and thus processing the file line by line, it doesn't need a for-loop. Was it really this single line that drove me crazy and caused the trouble? On the other hand I'm asking myself how it's possible to "beautify" the code and make it simpler, so I don't need to write out each name, sec, ter, wrd, a, b, c, spon column. Imagine one has 150 properties per object, that would be a pain in the ass to write down. That's why I initially tried a simple arr.push(obj), which sadly didn't work as I expected.
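
One idea that comes to my mind (just a sketch, assuming the column order is fixed and known up front) would be to keep the field names in an array and loop over them instead of writing every property by hand, but I don't know if that's the idiomatic way:

const FIELDS = ['name', 'sec', 'ter', 'wrd', 'a', 'b', 'c', 'spon'];

readAndFilter.on('line', (line) => {
    if ( line.match( /#/ ) ) {
        const values = line.split('#').map(f => f.trim());
        const entry = {};
        // pair every field name with the value at the same position
        FIELDS.forEach((key, i) => { entry[key] = values[i]; });
        arr.push(entry);
    }
});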

Any helpful explanation is appreciated. Thank you again! Now I need to find a way to read/search through the lookup table that is held in memory, so I can display/output the appropriate key/value pair.

user882786
  • Hi, yes, this is a problem in your original code: you were overwriting the same `obj` over and over again, while `arr.push({...})` adds a new object to the result, as expected (see the short snippet after these comments). However, your readline solution still doesn't work as you seem to expect, particularly, `return array` in the `close` handler does nothing (return to whom?) I guess the sync option would be easier to manage for the time being. – georg Jun 16 '19 at 20:23
  • That said, this is not how this site works. SO is not a forum, it would be more comfortable to stick to the Q&A format: you ask a question - someone else answers - you post a short comment asking for clarifications - if you have more questions, you post a new question and so on. – georg Jun 16 '19 at 20:29
  • Hi Georg. Yes, I understand how it works. For that reason I got into the point and asked back about the **arr.push(obj);** part because I wanted to have it clarified and understood. Unfortunately it's still unanswered how one can achieve the explained goal in case he has much much more properties inside the object. It would be awful to have them written down in the push-section. Is there any shortcut or better/simpler code for getting the same result? about the "return" part: ok, I can remove the line from the close section but the code runs fine on my side with the expected results. Hmm? – user882786 Jun 17 '19 at 06:30
  • yes, the streams code runs and logs the desired array, however, when you add more code to actually process it, you'll encounter issues. For example, where exactly to put that processing code? Regarding the "big objects" problem, I'd post a new question about it specifically. – georg Jun 17 '19 at 07:42
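
To illustrate the overwriting point from the first comment above, a minimal stand-alone snippet: pushing the same object three times stores three references to one object, so all three array slots show whatever was assigned last.

const arr = [];
const obj = {};

for (const name of ['Sugar', 'Apple', 'Berry']) {
    obj.name = name;     // mutates the one and only object
    arr.push(obj);       // pushes a reference, not a copy
}

console.log(arr);        // [ { name: 'Berry' }, { name: 'Berry' }, { name: 'Berry' } ]

// arr.push({ name: name }); inside the loop would create a fresh object each time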