
I am having trouble converting a string that contains an entire HTML document into a DOM object that I can search with jQuery find(). The input starts with the DOCTYPE declaration and is stored in the variable data.

This works and finds a dozen tr elements:

var dummy0 = $(data).find('tr');

This does not work, although there definitely is an h1 element:

var dummy1 = $(data).find('h1');

If I do this to inspect the data object:

var dummy2 = $(data);

in the Firefox F12 Debugger it appears that dummy2 is an array of the following objects:

#text
title
#text
h1
#text
table
#text

So find('tr') worked because the tr elements are descendants of the table entry in the array, but find('h1') finds nothing because the h1 is not inside any of the entries: it is itself one of the top-level array elements of $(data), and find() only searches descendants.

I tried the trick from https://stackoverflow.com/a/11047751/1845672 but that results in exactly the same array instead of a single DOM tree.

I also tried $.parseHTML(data), with the same result.

Can anyone help me understand how this all works? The input is a string with exactly ONE html element, but it is parsed into an array of several nodes. Where are the head and body elements?

Also, since I need the content of the h1, how do I get a DOM object that find() can search for all elements, including h1?

Or am I forced to forget about DOM trees and just inspect the array element for h1?


Update:

I created a small stand-alone test case:

<!DOCTYPE html>
<html>
<head>
    <title>Test</title>
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.2.1/jquery.min.js"></script>
</head>
<body>
    <h1>Test</h1>
    <script>
        var sHtml = "<!DOCTYPE html>\n<html><head><title>test</title></head>\n<body><h1>Test</h1><table><tr><td>Abc</td></tr></table></body></html>";
        var dHtml = $(sHtml);
        var h1 = dHtml.find('h1');
        var td = dHtml.find('td');
        alert('h1: ' + (h1.length == 0 ? 'not found' : h1.text()) + //output: h1: not found
            ' td: ' + (td.length == 0 ? 'not found' : td.text()) +  //output: td: Abc
            ' dHtml.length: ' + dHtml.length);                      //output: dHtml.length: 5
    </script>
</body>
</html>

It appears that the two #text entries in dHtml correspond to the two newlines in the string, one after the DOCTYPE and the other after the closing head tag. I am still wondering why dHtml contains not one but three separate elements (title, h1 and table).

Roland
  • jQuery has problems parsing such complete documents. Can you strip the input HTML down to only the body contents first, and then parse only that? Otherwise, I would recommend that you use a proper parser to begin with, instead of letting jQuery do this part; https://stackoverflow.com/q/10585029/1427878 – CBroe Apr 03 '18 at 11:03
  • @CBroe Your link made me find the solution. – Roland Apr 03 '18 at 12:14

1 Answer


Apparently, I had not tried hard enough with the trick from the link mentioned in the question. The input is not too complex for jQuery's small parser after all, because everything works with this modified line:

var dHtml = $('<div></div>').html(sHtml);

This puts the contents of the head and body parts inside a single top-level div, so every element, including the h1, is now a descendant of that div. The DOCTYPE line in the input string causes no problem, and jQuery find() finds everything.
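Why the wrapper helps can be sketched with a toy model in plain JavaScript. This is not jQuery and not a real DOM: node, findInSet and filterSet are hypothetical helpers that only mimic the descendant-only search of find() and the set-only check of filter(), and the node list is assumed to match the five entries seen in the test case.

```javascript
// Toy stand-in for a DOM node: a tag name plus child nodes (hypothetical helper).
function node(tag, children = []) {
  return { tag, children };
}

// Mimics find(): searches only the descendants of the nodes in the set.
function findInSet(set, tag) {
  const hits = [];
  const visit = (n) => {
    for (const child of n.children) {
      if (child.tag === tag) hits.push(child);
      visit(child);
    }
  };
  set.forEach(visit);
  return hits;
}

// Mimics filter(): checks only the nodes in the set themselves.
function filterSet(set, tag) {
  return set.filter((n) => n.tag === tag);
}

// Assumed shape of the five top-level entries produced by $(sHtml).
const parsed = [
  node('#text'),
  node('title'),
  node('#text'),
  node('h1'),
  node('table', [node('tbody', [node('tr', [node('td')])])]),
];

// Wrapping everything in one div turns the top-level entries into descendants.
const wrapped = [node('div', parsed)];

console.log(findInSet(parsed, 'h1').length);  // 0: h1 is top-level, not a descendant
console.log(findInSet(parsed, 'td').length);  // 1: td sits below the table entry
console.log(filterSet(parsed, 'h1').length);  // 1: checking the set itself finds h1
console.log(findInSet(wrapped, 'h1').length); // 1: inside the div, h1 is a descendant
```

In jQuery terms, the third line corresponds to $(sHtml).filter('h1') and the fourth to the $('<div></div>').html(sHtml) approach. The wrapper is the more convenient choice when it is not known in advance whether an element ends up at the top level or deeper in the tree.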

Roland