24

I'm making an ajax call to fetch content and append this content like this:

$(function(){
    var site = $('input').val();
    $.get('file.php', { site:site }, function(data){
        mas = $(data).find('a');
        mas.map(function(elem, index) {
            divs = $(this).html();
            $('#result').append('' + divs + '');
        })
    }, 'html');
});

The problem is that when I change 'a' to 'body' I get nothing (no error, just no HTML). I'm assuming body is a tag just like 'a' is. What am I doing wrong?

So this works for me:

 mas = $(data).find('a');

But this doesn't:

 mas = $(data).find('body');
Brian Tompsett - 汤莱恩
Youss
  • Please add a sample response you're getting from querying file.php – Rafael Jan 20 '13 at 09:36
  • @Rafael You mean my console log? – Youss Jan 20 '13 at 09:37
  • It can be `console.log(data)` or anything that shows the complete string you received with the ajax call. – Rafael Jan 20 '13 at 09:39
  • I just checked, with simplified code, and different pages, and can confirm I am experiencing the same issue. It works to select elements within the `body` but not to select the `body` itself. – Billy Moon Jan 20 '13 at 09:42
  • @Rafael I'm not sure, but I think it has to be a URL (from input.val). This could be any URL. – Youss Jan 20 '13 at 09:46
  • yes, so please add the sample response to your post. Boaz's tip might help in your case (when `body` is the first/root tag in your response), but to be sure, we need to know what the response looks like. – Rafael Jan 20 '13 at 09:47
  • @Rafael maybe this: http://mywebsite.com/file.php?site=http%3A%2F%2Fnu.nl I'm not sure how this can help. – Youss Jan 20 '13 at 09:51

5 Answers

13

I ended up with this simple solution:

var body = data.substring(data.indexOf("<body>")+6,data.indexOf("</body>"));
$('body').html(body);

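This also works with head or any other tag, as long as the opening tag has no attributes and the offset is adjusted (6 is the length of "<body>").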

(A solution with xml parsing would be nicer but with an invalid XML response you have to do some "string parsing".)
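The fixed offset in the snippet above assumes the opening tag is written exactly as `<body>`. Here is a minimal sketch that also tolerates attributes such as `<body class="home">`, still using only string handling. The helper name `extractTagContent` is made up for illustration, and it assumes there is no `>` inside an attribute value and that the first closing tag found is the right one:

```javascript
// Hypothetical helper: returns the markup between <tag ...> and </tag>,
// or null if either tag cannot be found.
function extractTagContent(html, tag) {
    // opening tag, with or without attributes: <body> or <body class="x">
    var open = new RegExp('<' + tag + '(\\s[^>]*)?>', 'i').exec(html);
    if (!open) return null;
    var start = open.index + open[0].length;
    // first closing tag that follows the opening tag
    var close = new RegExp('</' + tag + '\\s*>', 'i').exec(html.slice(start));
    if (!close) return null;
    return html.slice(start, start + close.index);
}
```

With the response from the question this could be used as `$('#result').html(extractTagContent(data, 'body'))`.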

Yush0
12

Parsing the returned HTML through a jQuery object (i.e. $(data)) in order to get the body tag is doomed to fail, I'm afraid.

The reason is that the returned data is a string (try console.log(typeof(data))). Now, according to the jQuery documentation, when creating a jQuery object from a string containing complex HTML markup, tags such as body are likely to get stripped. This happens since in order to create the object, the HTML markup is actually inserted into the DOM which cannot allow such additional tags.

Relevant quote from the documentation:

If a string is passed as the parameter to $(), jQuery examines the string to see if it looks like HTML.

[...] If the HTML is more complex than a single tag without attributes, as it is in the above example, the actual creation of the elements is handled by the browser's innerHTML mechanism. In most cases, jQuery creates a new element and sets the innerHTML property of the element to the HTML snippet that was passed in. When the parameter has a single tag (with optional closing tag or quick-closing), such as $( "<img />" ) or $( "<img>" ), $( "<a></a>" ) or $( "<a>" ), jQuery creates the element using the native JavaScript createElement() function.

When passing in complex HTML, some browsers may not generate a DOM that exactly replicates the HTML source provided. As mentioned, jQuery uses the browser's .innerHTML property to parse the passed HTML and insert it into the current document. During this process, some browsers filter out certain elements such as <html>, <title>, or <head> elements. As a result, the elements inserted may not be representative of the original string passed.
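One way to sidestep the innerHTML behaviour described in the quote is to parse the string into a separate document rather than into the current one. A minimal sketch, assuming a browser whose `DOMParser` supports the `text/html` type (which older browsers do not). The function name `getBodyHtml` and its optional `parser` argument are made up for illustration, and this is not something the jQuery documentation prescribes:

```javascript
// Hypothetical helper: parses an HTML string into its own document
// and returns the inner HTML of that document's <body>.
function getBodyHtml(html, parser) {
    // fall back to the browser's DOMParser when no parser is supplied
    parser = parser || new DOMParser();
    // a standalone document keeps its own <html>, <head> and <body>
    var doc = parser.parseFromString(html, 'text/html');
    return doc.body ? doc.body.innerHTML : null;
}
```

In the question's callback this could be used as `$('#result').html(getBodyHtml(data))`.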

Boaz
  • If you find a relevant workaround, post it as an answer as well. – Boaz Jan 21 '13 at 11:58
  • I disagree that it's doomed to fail! The solution that I've posted to this answer works perfectly and is as convenient as anything else in jquery. – Gershom Maes Oct 23 '14 at 23:46
  • @GershomMaes The issue raised by the OP is about *directly* parsing the returned HTML string. Your solution, while being a neat trick, works around this issue by *indirectly* parsing the HTML string as an XML document first. This does not negate the fact that directly parsing the HTML strips the `body` tag. – Boaz Oct 24 '14 at 20:45
6

I experimented a little and have identified the cause up to a point. Pending a real answer, which I would be interested in, here is a hack to help understand the issue:

$.get('/',function(d){
    // replace the `HTML` tags with `NOTHTML` tags
    // and the `BODY` tags with `NOTBODY` tags
    d = d.replace(/(<\/?)html( .+?)?>/gi,'$1NOTHTML$2>')
    d = d.replace(/(<\/?)body( .+?)?>/gi,'$1NOTBODY$2>')
    // select the `notbody` tag and log for testing
    console.log($(d).find('notbody').html())
})

Edit: further experimentation

It seems it is possible if you load the content into an iframe, then you can access the frame content through some dom object hierarchy...

// get a page using AJAX
$.get('/',function(d){

    // create a temporary `iframe`, make it hidden, and attach to the DOM
    var frame = $('<iframe id="frame" src="/" style="display: none;"></iframe>').appendTo('body')

    // check that the frame has loaded content
    // (.on('load', ...) replaces the .load(handler) shorthand, which was removed in jQuery 3.0)
    frame.on('load', function(){

        // grab the HTML from the body, using the raw DOM node (frame[0])
        // and more specifically, its `contentDocument` property
        var html = $('body',frame[0].contentDocument).html()

        // check the HTML
        console.log(html)

        // remove the temporary iframe
        $("#frame").remove()

    })
})

Edit: more research

It seems that contentDocument is the standards-compliant way to get hold of the document object of an iframe, but older versions of IE do not support it, so this is how to get a reference to the iframe's document.body object in a cross-browser way (here iframe is the raw DOM node, not a jQuery object):

var iframeDoc = iframe.contentDocument || iframe.contentWindow.document;
var iframeBody = iframeDoc.body;
// or for extra caution, to support even more obsolete browsers
// var iframeBody = iframeDoc.getElementsByTagName("body")[0]

See: contentDocument for an iframe

Billy Moon
  • additionally, it does not seem to make any difference what syntax you use for the selector, as it seems to be a restriction in the jQuery core, so `$('body',d)` has the same results as `$(d).find('body')`. – Billy Moon Jan 20 '13 at 10:15
  • Hi, thanks for sticking around. However, I want to use my code for any given website, and as we know, some websites do not support iframes. – Youss Jan 20 '13 at 10:43
  • Maybe it doesn't work in a 'jquery environment' and I would have to resort to plain javascript. I have been trying variations with `document.getElementsByTagName("body")[0];` with no luck so far – Youss Jan 20 '13 at 10:44
  • I think the problem is, that you can't add another `HTML`, `HEAD` or `BODY` to the DOM. If you try to set the `.innerHTML` of a `DIV` tag to include any of these forbidden elements, it simply won't add them - which is why I expect jQuery is not able to then select them. – Billy Moon Jan 20 '13 at 10:55
  • @Youss could you explain to me what websites don't support iframes? I had thought they were pretty much universally supported these days. – Billy Moon Jan 20 '13 at 11:48
  • I mean the embedding of website in iframe(google), but maybe I misunderstood your answer.. – Youss Jan 20 '13 at 12:14
  • @Youss my answer uses a hidden iframe, only to temporarily load content, so it can be accessed and manipulated as a cohesive document. It allows you to create the HTML, HEAD and BODY DOM elements. None of it is shown on the screen, and it is destroyed as soon as the DOM access and manipulation is finished, and the result stored in a variable. The existing page does not need to be changed, so the only issue with iFrames is if they are not supported at all. I am genuinely interested in a good solution for this problem, as it seems like a very basic and useful function. Keep this thread going! – Billy Moon Jan 20 '13 at 19:34
  • Hi, I found an answer at: http://stackoverflow.com/questions/7839889/trying-to-select-a-body-tag-from-html-that-is-returned-by-get-request But I can't seem to implement in my code...I would appreciate if you could help me out – Youss Jan 21 '13 at 11:08
  • @Youss it seems to be pretty much the same solution as my first proposal, using regex to strip out the offending tags, which is not ideal, because regular expressions are not able to handle fringe cases, and malformed HTML - a parser would be required for that. Most front-end parsers are based on using the browser's built in parser, which is the problem, because it won't let you add already existing BODY tags. This is what led me to my second proposal, which is to load the HTML into an iFrame, allowing the BODY tag to be added. This might not be ideal, because it depends on iFrames. – Billy Moon Jan 21 '13 at 17:30
4

I FIGURED OUT SOMETHING WONDERFUL (I think!)

Got your html as a string?

var results = //probably an ajax response

Here's a jquery object that will work exactly like the elements currently attached to the DOM:

var superConvenient = $($.parseXML(results)).children('html');

Nothing will be stripped from superConvenient! You can do stuff like superConvenient.find('body') or even

superConvenient.find('head > script');

superConvenient works exactly like the jquery elements everyone is used to!!!!

NOTE

In this case the string results needs to be valid XML because it is fed to jQuery's parseXML method. A common feature of an HTML response is a <!DOCTYPE> declaration, which may invalidate the document in this sense. <!DOCTYPE> declarations may need to be stripped before using this approach! Also watch out for features such as <!--[if IE 8]>...<![endif]-->, and tags without closing tags, e.g.:

<ul>
    <li>content...
    <li>content...
    <li>content...
</ul>

... and any other features of HTML that will be interpreted leniently by browsers, but will crash the XML parser.
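The DOCTYPE and conditional-comment cases mentioned in the note can be removed with plain string replacement before the string reaches `$.parseXML`. A rough sketch: `stripForXml` is a hypothetical name, it simply drops all HTML comments, and it does nothing about unclosed tags or any other markup that is not well-formed XML:

```javascript
// Hypothetical helper: removes the DOCTYPE declaration and HTML comments
// so that otherwise well-formed markup has a better chance with an XML parser.
function stripForXml(html) {
    return html
        // drop a <!DOCTYPE ...> declaration
        .replace(/<!DOCTYPE[^>]*>/i, '')
        // drop comments, including <!--[if IE 8]>...<![endif]--> blocks
        .replace(/<!--[\s\S]*?-->/g, '')
        // drop leading and trailing whitespace left behind
        .trim();
}
```

The result could then be passed on as `$($.parseXML(stripForXml(results))).children('html')`, which will still throw if the remaining markup is not valid XML.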

Gershom Maes
  • Great! I'm glad that anyone's getting some use out of this since I was personally browbeaten by the time I stumbled across this solution :) – Gershom Maes Oct 20 '14 at 03:44
  • +1 Though there's an obvious overhead, since the HTML string is being parsed twice, instead of once. With large HTML documents this might be costly. – Boaz Oct 24 '14 at 20:51
  • The jQuery XML parser says the html starting with ' – Ziad Feb 20 '15 at 16:12
  • I hadn't thought of that, and it certainly makes sense to me that the DOCTYPE could break the parser - although I would imagine that if you only take the component of the results including and beyond the " – Gershom Maes Feb 25 '15 at 20:44
  • I used the same code but got an error Uncaught Error: Invalid XML: – Jnanaranjan Sep 08 '15 at 11:27
  • Are you sure that the response you are parsing is syntactically valid? JQuery's xml parser won't be able to handle malformed html (or xml) – Gershom Maes Sep 08 '15 at 14:48
  • I also got this error at the tag, I don't understand why. However if there were an error for a
    I would understand...
    – Jérôme MEVEL Nov 10 '16 at 01:02
  • Hmm I would print out the html string you're working with. Then copy/paste it into an online xml validator - that should give good feedback as to where the xml syntax error is! – Gershom Maes Nov 10 '16 at 19:21
2

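Regex solution that worked for me (the s flag, which lets . match newlines, needs a JavaScript engine that supports the ES2018 dotAll flag; match returns null when the tag is not found):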

var head = res.match(/<head.*?>.*?<\/head.*?>/s);
var body = res.match(/<body.*?>.*?<\/body.*?>/s);

Detailed explanation: https://regex101.com/r/kFkNeI/1

Noel Schenk