I am developing a page that reads the source of another page, and I need to extract certain information from it. The project already fetches the live source containing the data, but I cannot for the life of me figure out how to convert this string into a document.

My rationale for using a document is that I need getElementById and similar methods to get the values of these items.

What have I tried?

  1. Assigning the HTML to an invisible div on my page. This kind of works, though it doesn't render the entire HTML string and provides a "shorter" rendition of the page.

  2. Manually finding the substrings. As you can imagine, this is a crappy way to do things and gives very unreliable results.

  3. Using a DOM parser to convert the string into a document and then querying it, but that fails miserably.
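
For attempt 3, here is a minimal sketch of the usual `DOMParser` pattern, assuming a browser where `DOMParser` is a global and `res` holds the fetched markup. `parseHtmlDocument` and `textOfId` are hypothetical helper names, not part of the original code. Parsing as `text/html` produces a detached document that keeps `<html>`, `<head>`, and `<body>`. Markup assigned to a div loses those wrappers, which is likely why the hidden div looked "shorter".

```javascript
// Sketch only: assumes a browser environment with a global DOMParser.
function parseHtmlDocument(htmlString) {
  // "text/html" selects the HTML parser and returns a full Document
  // that is not attached to the visible page.
  return new DOMParser().parseFromString(htmlString, "text/html");
}

// Hypothetical helper: text content of the element with the given id,
// or null when no such element exists in the parsed document.
function textOfId(doc, id) {
  var el = doc.getElementById(id);
  return el ? el.textContent : null;
}
```

Inside the `success` callback this could be used as `textOfId(parseHtmlDocument(res), "myitem")`.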

Any assistance at all would be seriously appreciated.

Pertinent code:

$.ajax({
  method: "GET",
  dataType: '',
  crossDomain: true,
  xhrFields: {
    withCredentials: true
  },
  success: function(res) {
    //shows the entire source just fine.
    console.log("Value of RES: " + res);
    bootbox.hideAll();
    //shows a "truncated" copy of the source
    alert(res);
    $("#hiddendiv").html(x);
    var name = document.findElementById("myitem");
    alert(name);
  },
basic
  • Have you tried parsing your HTML string with Cheerio? – Patrick Hund Mar 31 '17 at 16:16
  • @PatrickHund I have not; however, the issue here is that this page is a standalone page that will be distributed locally. – basic Mar 31 '17 at 16:25
  • It was my understanding that you have a string of HTML code that you want to parse and then run queries on to extract data. That is exactly what Cheerio is for – Patrick Hund Mar 31 '17 at 16:27
  • That is exactly what I need. I will look into Cheerio further to see if I can make that work. – basic Mar 31 '17 at 16:28
  • Well, the code you're showing wouldn't work because you never define `x`. You're also using `findElementById` which is not a function of `document`. – Heretic Monkey Mar 31 '17 at 17:49
  • @MikeMcCaughan I apologize, `x` is intended to be `res`; that was a typo. Same with `getElementById`. Those were merely typos. Any additional information to offer? – basic Mar 31 '17 at 17:56

2 Answers

Create a hidden IFRAME on your document. Then set the contents of that IFRAME to the HTML that you want to query. Target that IFRAME with your JavaScript when you do your querying. See "How can I access iframe elements with Javascript?" to understand how.
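
A minimal sketch of this approach, assuming a browser that supports the iframe `srcdoc` attribute. `loadIntoHiddenIframe`, `ownerDoc`, and `onReady` are hypothetical names chosen for illustration. Unlike in-memory parsing, the iframe actually loads the markup, so resources and scripts referenced by the fetched page may run.

```javascript
// Sketch only: assumes a browser; ownerDoc is normally the page's `document`.
function loadIntoHiddenIframe(ownerDoc, htmlString, onReady) {
  var iframe = ownerDoc.createElement("iframe");
  iframe.style.display = "none"; // keep the frame out of the visible page

  // Attach the handler before the frame is inserted so the load event
  // is not missed; it hands the frame's own document to the caller.
  iframe.onload = function () {
    onReady(iframe.contentDocument);
  };

  iframe.srcdoc = htmlString; // the string becomes the frame's document
  ownerDoc.body.appendChild(iframe);
  return iframe;
}
```

A call might look like `loadIntoHiddenIframe(document, res, function (doc) { alert(doc.getElementById("myitem").textContent); });`.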

Trevor

Another (probably better) option is to use jQuery. jQuery lets you create HTML, manipulate it, and query against it in memory. Querying DOM elements with jQuery is even easier than in plain JavaScript. See: http://jquery.com/.

//Get a jQuery object representing your HTML
var $html = $( "<div><span id='label'></span></div>" );

//Query against it
var $label = $html.find( "#label" ); //finds the span with an id of 'label'
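
For a whole page held in a string, one caveat is that `.find()` only searches descendants, so elements that end up at the top level of the parsed set would be missed. A minimal sketch that avoids this, assuming jQuery 1.8 or later (for `$.parseHTML`), wraps the parsed nodes in a detached container first. `findInHtmlString` is a hypothetical helper name, and jQuery is passed in as a parameter only to keep the sketch self-contained.

```javascript
// Sketch only: `$` is the jQuery function, passed in explicitly.
function findInHtmlString($, htmlString, selector) {
  // $.parseHTML returns an array of DOM nodes; by default it leaves
  // out any <script> elements found in the string.
  var nodes = $.parseHTML(htmlString);

  // Wrap the nodes in a detached <div> so that .find() can also reach
  // elements that were at the top level of the parsed markup.
  return $("<div>").append(nodes).find(selector);
}
```

With the question's code this might be called as `findInHtmlString($, res, "#myitem").text()`.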
Trevor
  • The question is tagged `jquery`, they're using `$.ajax` and `.html()`. Pretty sure they know about jQuery. – Heretic Monkey Mar 31 '17 at 17:48
  • @MikeMcCaughan Right, I get that they were already using jQuery. What I'm pointing out is that they can use it further to solve their problem. Rather than trying to put the HTML into the browser's document just so it can be queried, you can take the HTML string, convert it to a jQuery object, and then query it directly in memory. That is better for performance, and it introduces no issues into the page while it is being viewed in the browser. – Trevor Mar 31 '17 at 19:54