37

I am trying to parse this HTML with jQuery to get data1, data2, and data3. While I do get data2 and data3, I am unable to get data1 with my approach. I am fairly new to jQuery, so please pardon my ignorance.

<html>
<body>
   <div class="class0">
    <h4>data1</h4>
    <p class="class1">data2</p>
    <div id="mydivid"><p>data3</p></div>    
   </div>
</body>
</html>

Here is how I am calling this in my jquery.

var datahtml = "<html><body><div class=\"class0\"><h4>data1</h4><p class=\"class1\">data2</p><div id=\"mydivid\"><p>data3</p></div></div></body></html>";

alert($(datahtml).find(".class0").text()); // Doesn't Work

alert($(datahtml).find(".class1").text()); // work 

alert($(datahtml).find("#mydivid").text()); // work

Only alert($(datahtml).find(".class0").text()); fails; the rest work as expected. Is it because class0 has multiple tags inside it? How do I get data1 in this scenario?

Fabrício Matté
lazyguy

7 Answers

60

None of the current answers addressed the real issue, so I'll give it a go.

var datahtml = "<html><body><div class=\"class0\"><h4>data1</h4><p class=\"class1\">data2</p><div id=\"mydivid\"><p>data3</p></div></div></body></html>";

console.log($(datahtml));

$(datahtml) is a jQuery object containing only the div.class0 element, thus when you call .find on it, you're actually looking for descendants of div.class0 instead of the whole HTML document that you'd expect.

A quick solution is to wrap the parsed data in an element so the .find will work as intended:

var parsed = $('<div/>').append(datahtml);
console.log(parsed.find(".class0").text());
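
To see why wrapping helps, here is a toy model in plain JavaScript (no jQuery or DOM involved). The names node, textOf and findByClass are made up for this illustration; the only assumption carried over from jQuery is that .find() searches descendants and never tests the starting element itself.

```javascript
// Hypothetical helper: builds a plain-object stand-in for a DOM element.
function node(tag, className, children) {
  return { tag: tag, className: className || "", children: children || [] };
}

// Text of a node: the text of all string children, collected depth-first.
function textOf(n) {
  return n.children
    .map(function (child) {
      return typeof child === "string" ? child : textOf(child);
    })
    .join("");
}

// Descendant-only search, modelled on how .find() behaves:
// the starting node itself is never tested against the class name.
function findByClass(root, className) {
  var matches = [];
  root.children.forEach(function (child) {
    if (typeof child === "string") return;
    if (child.className === className) matches.push(child);
    matches = matches.concat(findByClass(child, className));
  });
  return matches;
}

// What is left once <html> and <body> are dropped: div.class0 at the top level.
var class0 = node("div", "class0", [
  node("h4", "", ["data1"]),
  node("p", "class1", ["data2"]),
  node("div", "", [node("p", "", ["data3"])])
]);

// Searching from div.class0 itself: it is not its own descendant.
var direct = findByClass(class0, "class0");

// Searching from a wrapper: div.class0 is now a child of the search root.
var wrapped = findByClass(node("div", "", [class0]), "class0");

console.log(direct.length);                // 0
console.log(wrapped.map(textOf).join("")); // "data1data2data3"
```

The second result also shows that the text of div.class0 includes the text of all its descendants. To get only data1 from the wrapped markup, target the h4 instead, e.g. parsed.find(".class0 h4").text().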


The reason for this isn't very simple, but my assumption is as follows: jQuery "parses" more complex HTML strings by dropping the string into a separate DOM fragment created on the fly and then retrieving the parsed elements. This most likely makes the DOM parser ignore the html and body tags, as they would be illegal in that context.

Here is a very small test suite which demonstrates that this behavior is consistent through jQuery 1.8.2 all the way down to 1.6.4.

Edit: quoting this post:

Problem is that jQuery creates a DIV and sets innerHTML and then takes DIV children, but since BODY and HEAD elements are not valid DIV childs, then those are not created by browser.

Makes me more confident that my theory is correct. I'll share it here, hopefully it makes some sense for you. Have the jQuery 1.8.2's uncompressed source side by side with this. The # indicates line numbers.

All document fragments made through jQuery.buildFragment (defined @#6122) will go through jQuery.clean (#6151) (even if it is a cached fragment, it already went through the jQuery.clean when it was created), and as the quoted text above implies, jQuery.clean (defined @#6275) creates a fresh div inside the safe fragment to serve as container for the parsed data - div element created at #6301-6303, childNodes retrieved at #6344, div removed at #6347 for cleaning up (plus #6359-6361 as bug fix), childNodes merged into the return array at #6351-6355 and returned at #6406.

Therefore, every method that invokes jQuery.buildFragment is affected. These include jQuery.parseHTML and jQuery.fn.domManip (the jQuery object method invoked by .append(), .after(), and .before(), among others), as well as $(html), which is handled in jQuery.fn.init (defined @#97; handling of complex [more than a single tag] HTML strings @#125; invokes jQuery.parseHTML @#131).

It makes sense that virtually all jQuery HTML strings parsing (besides single tag html strings) is done using a div element as container, and html/body tags are not valid descendants of a div element so they are stripped out.
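
As a rough picture of the net effect only (the browser drops these tags inside its HTML parser, not by string replacement), the hypothetical helper below removes the html and body tags from the string, leaving roughly what jQuery ends up working with:

```javascript
// Hypothetical helper for illustration; browsers drop these tags while parsing,
// they do not run a regular expression over the string.
function stripDocumentWrappers(html) {
  // Matches <html>, </html>, <body>, </body>, with or without attributes.
  return html.replace(/<\/?(?:html|body)(?:\s[^>]*)?>/gi, "");
}

var datahtml = "<html><body><div class=\"class0\"><h4>data1</h4><p class=\"class1\">data2</p><div id=\"mydivid\"><p>data3</p></div></div></body></html>";

var effective = stripDocumentWrappers(datahtml);
console.log(effective);
// <div class="class0"><h4>data1</h4><p class="class1">data2</p><div id="mydivid"><p>data3</p></div></div>
```

With the wrappers gone, div.class0 is the top-level element, which is why a descendant search for .class0 starting from it comes back empty.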


Addendum: Newer versions of jQuery (1.9+) have refactored the HTML parsing logic (for instance, the internal jQuery.clean method no longer exists), but the overall parsing logic remains the same.

Fabrício Matté
  • +1 indeed! `:)` I have recreated the issue and solution for the OP below. – Tats_innit Oct 09 '12 at 22:01
  • I just tried this approach but it is actually giving me "data1data2data3" as output. Try adding an alert like this alert(parsed.find(".class0").text()) How to output only data1 as the result? – lazyguy Oct 10 '12 at 03:59
  • Awesome Fabrício Matté. You finally made my day. I am so glad to see all the responses on this forum and I thank you everyone who contributed here. Really appreciate that! – lazyguy Oct 10 '12 at 06:21
  • You made my day, my friend! Work on legacy stuff is killing me. – Giacomo Cerquone Nov 20 '17 at 16:42
  • In my case find was stripping divs, took me a second to figure out what was going on as I wouldn't expect a find function to do any kind of manipulation by itself... but the append solution worked, thanks! – RandomUs1r Nov 21 '17 at 16:58
26

The behaviour looks odd because jQuery ignores the html and body tags and starts from the first div, the one with class="class0". The HTML is parsed into DOM elements but not added to the DOM. For elements that are in the DOM, selectors do not ignore the body tag and are applied to the whole document. You need to add the HTML to the DOM, as shown below.

$('#div1').append($(datahtml)); //Add in DOM before applying jquery methods.

alert($('#div1').find(".class0").text()); // Now it Works too

alert($('#div1').find(".class1").text()); // work   

alert($('#div1').find("#mydivid").text()); // work

Alternatively, if you wrap your HTML in another element, so that it becomes the starting point instead of the div with class="class0", your selector will work as expected.

var datahtml = "<html><body><div><div class=\"class0\"><h4>data1</h4><p class=\"class1\">data2</p><div id=\"mydivid\"><p>data3</p></div></div></div></body></html>";

alert($(datahtml).find(".class0").text()); // Now it Works too

alert($(datahtml).find(".class1").text()); // work   

alert($(datahtml).find("#mydivid").text()); // work

Here is what the jQuery docs say about the jQuery parsing function jQuery(), i.e. $():

When passing in complex HTML, some browsers may not generate a DOM that exactly replicates the HTML source provided. As mentioned, jQuery uses the browser's .innerHTML property to parse the passed HTML and insert it into the current document. During this process, some browsers filter out certain elements such as <html>, <title>, or <head> elements. As a result, the elements inserted may not be representative of the original string passed.

Fabrício Matté
Adil
  • I liked your way of explaining that "jquery is applied on DOM and you are trying to apply on string containing html, It would be more straight forward if you add the html in some DOM element and the apply jquery selectors" – lazyguy Oct 12 '12 at 15:57
  • -1 This is misleading in many ways. Read up the [`jQuery( html [, ownerDocument] )` documentation](http://api.jquery.com/jquery/): "Description: Creates DOM elements on the fly from the provided string of raw HTML." Hence `$(html)` does return a jQuery object with the parsed data and is ready to have jQuery methods applied on it. – Fabrício Matté Oct 13 '12 at 06:00
  • Not to be harsh but, you didn't only put an extremely similar solution than mine, you also gave it a completely improper and misleading description. This does not address the real issue(s) at any point. – Fabrício Matté Oct 13 '12 at 06:07
  • No doubt you explained it very nicely and mine was bit ambiguous. Thanks for pointing that. One thing I want to tell is that this is bit harsh for me "you didn't only put an extremely similar solution than mine". I found this in the link you gave a reference, "During this process, some browsers filter out certain elements such as <html>, <title>, or <head> elements. As a result, the elements inserted may not be representative of the original string passed." This is the reason the same selector code works for given html if it is not in string but in page. – Adil Oct 13 '12 at 10:38
3

I think I have an even better way:

let's say you've got your html:

var htmlText = '<html><body><div class="class0"><h4>data1</h4><p class="class1">data2</p><div id="mydivid"><p>data3</p></div></div></body></html>'

Here's the thing you've been hoping to do:

var dataHtml = $($.parseXML(htmlText)).children('html');

dataHtml now works exactly like the ordinary jquery objects you're familiar with!!

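The wonderful thing about this solution is that it will not strip body, head, or script tags! Note that $.parseXML only accepts well-formed XML, so every tag in the string must be properly closed.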

Gershom Maes
2

Try this

alert($(datahtml).find(".class0 h4").text());

The reason is that the text you are referring to is inside the h4 element of class0, so your selector will not work. Alternatively, access the contents directly:

alert($(".class0 h4").text()); 

alert($(".class1").text()); 

alert($("#mydivid").text()); 

EDIT

var datahtml = "<html><body><div class=\"class0\"><h4>data1</h4><p class=\"class1\">data2</p><div id=\"mydivid\"><p>data3</p></div></div></body></html>";

$('body').html(datahtml);

alert($(".class0 h4").text());

alert($(".class1").text());

alert($("#mydivid").text());

Sushanth --
  • Look's like you need to append the data to the html or body in the first place .. Otherwise it can't find the contents.. Check edited code – Sushanth -- Oct 09 '12 at 22:03
  • Sushanth: This approach did work and thanks for the solution but it is now appending the html to my page's body tag so my page is altered with the html. How to remove that side effect?? Please suggest... – lazyguy Oct 10 '12 at 05:34
  • If you already have the existing HTML on page .. Just comment the first two line before alert and it should be fine.. – Sushanth -- Oct 10 '12 at 05:36
  • $.ajax({ type: "GET", url: "index.html", error: function(xhRequest, errorText, thrownError) {alert(errorText)}, success: function(datahtml){ alert( "Success: "); alert($(".class0 h4").text()); alert($(".class1").text()); alert($("#mydivid").text()); }}); Here index.html has the same content as described in the problem. But when I do this the alert shows no data. – lazyguy Oct 10 '12 at 06:06
  • Thanks Sushanth for all your help. – lazyguy Oct 10 '12 at 06:38
  • I see that you are sending the ajax request to index.Html .. Ajax Request has to be sent to the server and not client side.. So obviously it will throw an error – Sushanth -- Oct 10 '12 at 06:40
1

I don't know any other way than placing the HTML in a temporary invisible container.

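$(document).ready(function(){
  // Keep the markup as a string; concatenating a jQuery object would insert "[object Object]".
  var datahtml = "<html><body><div class=\"class0\"><h4>data1</h4><p class=\"class1\">data2</p><div id=\"mydivid\"><p>data3</p></div></div></body></html>";
  // Wrap the markup in a hidden container so .find() can reach div.class0 and its descendants.
  var tempContainer = $('<div style="display:none;">' + datahtml + '</div>');
  $('body').append(tempContainer);
  alert(tempContainer.find('.class1').text());
  // Remove the temporary container again so the page is left unchanged.
  tempContainer.remove();
});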

Sem
0

It doesn't work because the <div> with the class class0 doesn't have any text nodes as direct children. Add the class to the <h4> and it will work

danwellman
0

I think the main problem is that you cannot pass an html tag to jQuery. What happens in your case is that jQuery takes the first HTML element it finds, which is the div with class class0.

Test this to see that I am right:

if($(datahtml).hasClass('class0'))
    alert('Yes you are right :-)');

This means that you cannot use the html or body tag as the root to query within.

If you want to make it work, wrap your markup in an outer div:

<div>
    <div class="class0">
        <h4>data1</h4>
        <p class="class1">data2</p>
        <div id="mydivid"><p>data3</p></div>    
   </div>
</div>

So try this:

var datahtml = "<div><div class=\"class0\"><h4>data1</h4><p class=\"class1\">data2</p><div id=\"mydivid\"><p>data3</p></div></div></div>";

alert($(datahtml).find(".class0").text()); // work

alert($(datahtml).find(".class1").text()); // work 

alert($(datahtml).find("#mydivid").text()); // work
John Skoumbourdis