0

I have this file and I don't know how to parse its text into the following format:

File: https://gist.githubusercontent.com/zach-karat/119d690176f324e3f99c0e312f0a6620/raw/82e14d739e966216536ae9806282a20343e0e2f8/google-books-common-words.txt

[1]
Word => 'THE' 
USED => 53097401461

[2]
Word => 'OF'
USED => 30966074232

Then I need to find the top X most-used words (X is a parameter).

This is my JavaScript:

    $.get("https://gist.githubusercontent.com/zach-karat/119d690176f324e3f99c0e312f0a6620/raw/82e14d739e966216536ae9806282a20343e0e2f8/google-books-common-words.txt", function(data, status){
      // This works only once at a time, and with letters but not with numbers!
      //var hasString = data.includes("HELLO");
      var content = data;
      $('#content').html(data.replace('\n','<br>'));
    }, 'html');

EDIT:

The words in the file are already sorted, so I changed my code to the version below. Now, is it possible to get the top 10 most-used words that are 3 characters long?

    $.get("https://gist.githubusercontent.com/zach-karat/119d690176f324e3f99c0e312f0a6620/raw/82e14d739e966216536ae9806282a20343e0e2f8/google-books-common-words.txt", function(data, status){
      var lines = data.split("\n");
      var x = 0;
      $.each(lines, function(n, elem) {
        // append if length > 10
        $('#content').append('<div>' + elem + '</div>');
        x++;
        if(x == 10){ // x => parameter
          return false; // returning false stops $.each
        }
      });
    }, 'html');
Rick

4 Answers

1

Use a regex to split each line.

Regex: /^([A-Z]+)\s*(\d+)$/gm

Explanation:

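^ - Start of a line (because of the multiline flag)

([A-Z]+) - Capture one or more characters A-Z (the word).

\s* - Zero or more whitespace characters (spaces or tabs)

(\d+) - Capture one or more digits 0-9 (the count).

$ - End of a line (because of the multiline flag)

gm - global and multiline flags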

Example: Regex101

$.get("https://gist.githubusercontent.com/zach-karat/119d690176f324e3f99c0e312f0a6620/raw/82e14d739e966216536ae9806282a20343e0e2f8/google-books-common-words.txt", function(data, status){           
       var regexp = /^([A-Z]+)\s*(\d+)$/gm;
       var html = "";
       var content = regexp.exec(data);
       while(content)
       {
          html += "WORD : "+content[1]+"<br>USED : "+content[2]+"<br><br>";
          content = regexp.exec(data);
       }
       $('#content').html(html);
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

<div id="content"></div>
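
The regex explained above can also be checked without jQuery or a network request. The following is only a minimal sketch: the `sample` string and the `parseWordCounts` helper are hypothetical names introduced for illustration, and the sample assumes the file holds one upper-case word followed by whitespace and a count per line.

```javascript
// Hypothetical sample in the assumed "WORD<whitespace>COUNT" layout.
const sample = "THE\t53097401461\nOF\t30966074232\nAND\t22632024504";

// Hypothetical helper: returns one { word, used } object per matching line.
function parseWordCounts(text) {
  const pattern = /^([A-Z]+)\s*(\d+)$/gm;
  const entries = [];
  // matchAll requires the global flag and yields every match in order.
  for (const match of text.matchAll(pattern)) {
    entries.push({ word: match[1], used: Number(match[2]) });
  }
  return entries;
}

const parsed = parseWordCounts(sample);
console.log(parsed.map((entry) => entry.word).join(", ")); // THE, OF, AND
```

The counts in the sample fit safely in a JavaScript `Number`, so converting them with `Number()` makes them usable for numeric comparison or sorting.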
Vignesh Raja
1

Does this do what you need?

$.get("https://gist.githubusercontent.com/zach-karat/119d690176f324e3f99c0e312f0a6620/raw/82e14d739e966216536ae9806282a20343e0e2f8/google-books-common-words.txt", function (data, status) {
    var content = data.split('\n').map(function(row){
        return row.split('\t')
    })

    var x = 10; //from input parameter
    var topResults = content.slice(0, x);
    var html = topResults.map(function(result){
        return result[0] + '\t' + result[1] + '<br>'
    }).join(''); // join so .html() receives a single string
    $('#content').html(html);
}, 'text')

No jQuery needed for the actual work.
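
To illustrate that point, and the follow-up question about the top words of a given length, here is a rough sketch of the parsing as a plain function. `topWords`, `sampleText`, and the `wordLength` parameter are hypothetical names; the sample assumes one `WORD<TAB>COUNT` pair per line, already sorted by count as the question states.

```javascript
// Hypothetical sample; assumed to mirror the first lines of the real file.
const sampleText = [
  "THE\t53097401461",
  "OF\t30966074232",
  "AND\t22632024504",
  "TO\t19347398077",
  "IN\t16891065263",
  "A\t15310087895",
  "IS\t8384246685",
  "THAT\t8000768228",
  "FOR\t6545282031",
].join("\n");

// Returns the first x rows, optionally keeping only words of wordLength characters.
function topWords(text, x, wordLength) {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "") // skip blank lines such as a trailing newline
    .map((line) => {
      const [word, used] = line.split(/\s+/); // tolerate tabs or spaces
      return { word, used: Number(used) };
    })
    .filter((row) => wordLength === undefined || row.word.length === wordLength)
    .slice(0, x); // input is assumed sorted, so the first x rows are the top x
}

console.log(topWords(sampleText, 2).map((row) => row.word));    // [ 'THE', 'OF' ]
console.log(topWords(sampleText, 2, 3).map((row) => row.word)); // [ 'THE', 'AND' ]
```

Filtering before slicing matters: slicing first would return the 3-letter words among the top X, not the top X of the 3-letter words.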

Partik
  • @PabloMalynovytch Yes, `.replace()` only replaces the first occurrence if you give it a string, but using regex with a global flag replaces all. Your comment was a reference to my answer before my major edit, but with the new edit, you don't even need regex. – Partik Jul 16 '18 at 19:34
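
The difference described in this comment can be seen in a small sketch. The `text` string is a made-up sample with shortened counts, used only to show the behavior of `String.prototype.replace`.

```javascript
// Made-up sample with three lines.
const text = "THE\t1\nOF\t2\nAND\t3";

// A string pattern replaces only the first occurrence.
const firstOnly = text.replace("\n", "<br>");

// A regex with the global flag replaces every occurrence.
const allReplaced = text.replace(/\n/g, "<br>");

// split/join gives the same result without a regex.
const viaSplitJoin = text.split("\n").join("<br>");

console.log(firstOnly);   // second newline is still present
console.log(allReplaced); // THE	1<br>OF	2<br>AND	3
```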
0

A concise solution with Array.prototype.reduce:

$.get("https://gist.githubusercontent.com/zach-karat/119d690176f324e3f99c0e312f0a6620/raw/82e14d739e966216536ae9806282a20343e0e2f8/google-books-common-words.txt", function(data) {
  var html = data.split('\n').slice(0,10).reduce((all, item) => {
    var [word, count] = item.split('\t');
    return `${all}<div>Word:${word}, Used: ${count}</div>`;
  }, '');
  $('#content').html(html);
});
Leonid Pyrlia
0
$.get('https://gist.githubusercontent.com/zach-karat/119d690176f324e3f99c0e312f0a6620/raw/82e14d739e966216536ae9806282a20343e0e2f8/google-books-common-words.txt', function (data) {
  const result = data.split('\n').reduce((res, curr) => {
    // Split on any whitespace so tab- and space-separated lines both work
    const tmp = curr.trim().split(/\s+/);
    return {...res, ...{[tmp[0]]: tmp[1]}};
  }, {});
})

The result will look like this:

const result = {
      THE: "53097401461",
      OF: "30966074232",
      AND: "22632024504",
      TO: "19347398077",
      IN: "16891065263",
      A: "15310087895",
      IS: "8384246685",
      THAT: "8000768228",
      FOR: "6545282031",
      IT: "5740085369",
      AS: "5700645258",
      WAS: "5502713968",
      WITH: "5182797249",
      BE: "4818864785",
      BY: "4703106084",
      ON: "4594521081",
      NOT: "4522732626",
      HE: "4110457083",
      I: "3884828634",
      THIS: "3826060334"
    };

result['THE']; // "53097401461" (a string, since the counts are not converted)
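
If numeric counts are wanted in the lookup object, one possible variant is sketched below. `toLookup` and `sampleText` are hypothetical names, and the whitespace-separated layout of the file is an assumption. It uses `Object.fromEntries` in place of spreading inside `reduce`, which avoids copying the accumulated object on every line.

```javascript
// Hypothetical sample in the assumed "WORD<whitespace>COUNT" layout.
const sampleText = "THE\t53097401461\nOF\t30966074232\nAND\t22632024504\n";

// Builds { WORD: count } with counts converted to numbers.
function toLookup(text) {
  const pairs = text
    .split("\n")
    .filter((line) => line.trim() !== "") // ignore blank lines
    .map((line) => {
      const [word, used] = line.trim().split(/\s+/);
      return [word, Number(used)];
    });
  return Object.fromEntries(pairs);
}

const lookup = toLookup(sampleText);
console.log(lookup.THE); // 53097401461
```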

Hope it will help.