If you step into the jQuery source, the jQuery object constructor, once it determines that you've passed in an HTML string, calls `jQuery.parseHTML()` on that string. Inside `parseHTML()`, if the HTML is not just a single tag, it calls `buildFragment()` on the same HTML string, and if you follow that code you will find that it discards the `<body>` tag. I don't know why it does that, but that's the way it is coded to behave.
So, the code flow looks like this:

1. jQuery object constructor
2. determine if the argument is an HTML string
3. call `jQuery.parseHTML()` on the HTML string
4. if the string is not a single tag by itself, call `jQuery.buildFragment()` on the string
5. `jQuery.buildFragment()` seems to ignore the outer tag container
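
To make the "single tag by itself" branch concrete, here is a minimal sketch of that decision. The regular expression is modeled on the single-tag check in the jQuery 1.x/2.x source and may differ in other versions, and `parsePath` is a hypothetical helper written for illustration, not a jQuery function.

```javascript
// Approximation of the single-tag test applied before buildFragment() is
// considered. Assumption: pattern modeled on jQuery 1.x/2.x; other versions
// may use a different expression.
const rsingleTag = /^<(\w+)\s*\/?>(?:<\/\1>|)$/;

// Hypothetical helper (not part of jQuery): reports which path an HTML
// string would take according to the flow described above.
function parsePath(html) {
  return rsingleTag.test(html) ? "createElement" : "buildFragment";
}

console.log(parsePath("<div>"));                  // → "createElement"
console.log(parsePath("<div></div>"));            // → "createElement"
console.log(parsePath("<body><p>hi</p></body>")); // → "buildFragment"
```

Anything with content inside the outer tag, including a full `<body>...</body>` string, falls through to the `buildFragment()` path, which is where the outer container gets lost.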
I have not been able to figure out why `buildFragment()` ignores the outer `<body>other content here</body>`, but it does.
On further study, `buildFragment()` does correctly parse the outer tag as `<body>`, but unless that tag is a type that needs special treatment (such as elements that can only exist inside tables), it completely ignores what type the outer tag was and forces it to be a `<div>`. That outer container is then ignored later, when the content is retrieved from the jQuery object. Again, I'm not sure why it does that, but that is what it does.
As for your particular problem, I think the conclusion is that you can't use jQuery's constructor to handle an entire HTML document. It just isn't built to do that.
You could search the HTML document you were given and extract just the part between `<body>` and `</body>`, give that to the jQuery object constructor, do your manipulations on it, then put the manipulated HTML back into the original document between the original `<body>` and `</body>` tags. That preserves everything you didn't want to manipulate while still using jQuery for the content inside the `<body>` tag.
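
The extract-and-reinsert idea can be sketched with plain string handling. `splitBody` and `joinBody` are hypothetical helper names, and the regular expression is deliberately naive: it assumes a single `<body>` element and no `>` characters inside the body tag's attribute values.

```javascript
// Hypothetical helper: split a full HTML document into everything up to and
// including the opening <body ...> tag, the body content, and everything
// from the closing </body> tag onward. Returns null if no pair is found.
function splitBody(html) {
  const match = /^([\s\S]*?<body\b[^>]*>)([\s\S]*)(<\/body\s*>[\s\S]*)$/i.exec(html);
  if (!match) {
    return null;
  }
  return { before: match[1], body: match[2], after: match[3] };
}

// Hypothetical helper: put (possibly modified) body content back between
// the original opening and closing tags.
function joinBody(parts, newBody) {
  return parts.before + newBody + parts.after;
}

const page =
  '<!DOCTYPE html><html><head><title>t</title></head>' +
  '<body class="x"><p>old</p></body></html>';

const parts = splitBody(page);
// Stand-in for the jQuery manipulation step: in a browser, parts.body is
// the string you would hand to jQuery and then serialize again afterwards.
const updated = joinBody(parts, parts.body.replace("old", "new"));
console.log(updated);
```

Because only the body content passes through jQuery, the doctype, `<head>`, and the attributes on `<body>` itself come back out untouched.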
You should probably also be wary of `<script>` elements in the `<body>` tag, as they probably aren't preserved perfectly either.
… `div=document.createElement('div'); div.innerHTML = text; console.log(div.innerHTML);` and all of them return "test" (doctype, html, head and body stripped) – some Aug 03 '14 at 00:21

… `html=document.createElement('html'); html.innerHTML = text; console.log(html.outerHTML);` :) You have to handle the doctype separately. You could also test for head and body and use innerHTML to populate them. – some Aug 03 '14 at 00:32