
I have to get information out of an HTML table on a website. I want to make an HTTP request from a Node.js server to that website and parse the HTML table. Are there any libraries or techniques in JS, other than regular expressions, for parsing the data in the table cells?

Sorry, I'm very new to programming.
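The download half of this can be sketched with the `fetch()` function that is built into Node.js 18 and later (an assumption about your Node version; older versions need the `https` module or a package). `downloadHtml` is a hypothetical name, and the function only returns the raw HTML string; a parser still has to read the table.

```javascript
// Hypothetical helper: download a page's HTML as a string from a Node.js server.
// Assumes Node.js 18+, where fetch() is available globally without any package.
async function downloadHtml(url) {
  const response = await fetch(url);
  if (!response.ok) {
    // Fail loudly on 404/500 instead of handing an error page to the parser
    throw new Error(`Request failed: ${response.status} ${response.statusText}`);
  }
  // The response body as text; this string is what an HTML parser takes as input
  return response.text();
}

// Usage (placeholder URL):
// downloadHtml("https://example.com/page-with-table").then((html) => console.log(html.length));
```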

user2535056

4 Answers


Look at the excellent Cheerio library:

https://github.com/MatthewMueller/cheerio

Examples are in the GitHub repository.

Deathspike
var doc = document.implementation.createHTMLDocument("");
doc.documentElement.innerHTML = your_downloaded_html_page_as_string;

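You can then use normal DOM methods and properties such as getElementsByTagName and firstChild to get the actual data out of the HTML page you downloaded. Note that document only exists in a browser; on a Node.js server you need a DOM implementation such as jsdom to provide it.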

See the question "Parse a HTML String with JS" for more methods.
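Once you have a parsed document, the standard table properties `rows` (on a table element) and `cells` (on a row element) save you from walking `firstChild` by hand. A sketch; `tableToArray` and the `#prices` id are hypothetical, and it works with any DOM that implements these properties (a browser, or jsdom in Node.js).

```javascript
// Hypothetical helper: convert an HTML <table> element into an array of rows,
// each row an array of trimmed cell strings. Uses only standard DOM properties:
// HTMLTableElement.rows and HTMLTableRowElement.cells (both array-like collections).
function tableToArray(table) {
  return Array.from(table.rows, (row) =>
    Array.from(row.cells, (cell) => cell.textContent.trim())
  );
}

// Browser usage, where `html` is the downloaded page as a string:
// const doc = new DOMParser().parseFromString(html, "text/html");
// const rows = tableToArray(doc.querySelector("#prices"));
```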

Arunprasad Rajkumar

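jsdom is a great module for this:

// Count all of the links on the Node.js dist page
// (current jsdom API; the old jsdom.env() function is not available in recent jsdom releases)
var JSDOM = require("jsdom").JSDOM;

JSDOM.fromURL("http://nodejs.org/dist/")
  .then(function (dom) {
    var links = dom.window.document.querySelectorAll("a");
    console.log("there have been", links.length, "nodejs releases!");
  })
  .catch(function (error) {
    console.error("could not load the page:", error);
  });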
ehsangh

I would use jQuery. You could iterate through all table data cells like so (this alerts the HTML inside every table cell):

$('td').each(function () { alert($(this).html()); });

or, for a specific table:

$('#specific_table_id td').each(function () { alert($(this).html()); });
Thumbz
There are methods for loading jQuery on Node.js, but most rely on DOM emulation and are not always compatible with all jQuery plugins. – Timothy Meade Jul 07 '13 at 03:26