
I would like to sanitize an HTML document (created in Google Docs) so I can publish it on my CMS.

I have the source document in a string, from <html> to </html>, with head, style, body, etc. I would like to extract the body content and replace or eliminate a few tags. If I could do this with jQuery, I think it would be easier than with more sophisticated HTML parsers.

But when I try to get the body of the document, I don't get usable results. I tried:

var gdoc = "<html>...google document...</html>";
$(gdoc);         // a list of text nodes; I can't rebuild the document or find the body
$("body", gdoc); // an empty list

Is this doable, or am I going about this completely wrong? Any tips or references you could share?

Julio Faerman

2 Answers


Try this:

var gdoc = '<html><body><div id="foo">Bar</div></body></html>';
// Parse the markup inside a throwaway <div>, then search it with .find()
var data = $('<div/>').html(gdoc).find('#foo').html();
alert(data); // "Bar"


Darin Dimitrov
This is what I am trying to do, but there seems to be something special about the body tag. Using your answer I can get elements from the inner HTML, but if I want the whole body content, I get null when using "$('<div/>').html(gdoc).find('body').html();"
– Julio Faerman Jul 19 '11 at 17:36
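The null in the comment above most likely comes from how the string is parsed: jQuery's .html() ends up setting it as the innerHTML of a <div>, and in that context the browser's HTML parser drops the <html>, <head> and <body> tags themselves, keeping only their contents. There is therefore no body element left for .find('body') to match. One workaround is to cut the body markup out of the string before handing it to jQuery. The sketch below uses a hypothetical helper, extractBodyHtml, that does this with regular expressions rather than a real parser, so it assumes a single <body> element and no "<body" text hidden in comments, scripts or attribute values:

```javascript
// Hypothetical helper: return the markup between <body ...> and </body>
// in a full HTML document string. Plain pattern search, not an HTML parser.
function extractBodyHtml(docHtml) {
  // Opening tag, case-insensitive, possibly with attributes (class, style, ...).
  const openMatch = /<body\b[^>]*>/i.exec(docHtml);
  if (openMatch === null) {
    return docHtml; // no <body> tag: treat the whole string as body content
  }
  const contentStart = openMatch.index + openMatch[0].length;

  // First closing tag after the opening one; the "g" flag makes exec honor lastIndex.
  const closeRe = /<\/body\s*>/gi;
  closeRe.lastIndex = contentStart;
  const closeMatch = closeRe.exec(docHtml);
  const contentEnd = closeMatch === null ? docHtml.length : closeMatch.index;

  return docHtml.slice(contentStart, contentEnd);
}

// Example with a trimmed-down stand-in for a Google Docs export.
const gdoc =
  '<html><head><style>.c1{color:red}</style></head>' +
  '<body class="c2"><p class="c1">Hello</p></body></html>';
console.log(extractBodyHtml(gdoc)); // <p class="c1">Hello</p>
```

In the browser, "$('<div/>').html(extractBodyHtml(gdoc))" then gives a wrapper whose children are the body's contents, ready for .find() and the other jQuery methods. Browsers that support it can also parse the whole document with "new DOMParser().parseFromString(gdoc, 'text/html').body.innerHTML", which keeps the body element intact instead of relying on string slicing.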

I believe you can do what you're trying to do, but you're describing it imprecisely. You can fetch the HTML of another document and manipulate it, but you can't manipulate the external document per se. You can fetch it using:

$.get("url", function (data) {
  // data holds the response, e.g. the page's HTML as a string; modify it here
});
switz
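For the question's goal of replacing or eliminating a few tags, the modification step inside the $.get callback could look something like the string-based sketch below. unwrapTags and renameTag are hypothetical helpers written for illustration, not part of jQuery, and the example tags (span removed, b renamed to strong) are assumptions about what a Google Docs export might need cleaning. Regex rewriting like this assumes simple markup and is no substitute for a real sanitizer if the input is untrusted:

```javascript
// Hypothetical helpers working on the HTML as a plain string. They assume
// well-formed markup (no ">" inside attribute values) and tag names without
// regex metacharacters. They are NOT a security sanitizer.

// Remove the opening and closing tags of the listed elements but keep
// whatever they wrapped, e.g. <span style="...">Hi</span> becomes Hi.
function unwrapTags(html, tagNames) {
  // \b stops a name like "b" from also matching <br> or <body>.
  const pattern = new RegExp("</?(?:" + tagNames.join("|") + ")\\b[^>]*>", "gi");
  return html.replace(pattern, "");
}

// Rename one element type, keeping its attributes, e.g. <b> -> <strong>.
function renameTag(html, fromName, toName) {
  // Capture an optional "/" so closing tags are renamed too.
  const pattern = new RegExp("<(/?)" + fromName + "\\b", "gi");
  return html.replace(pattern, "<$1" + toName);
}

const sample = '<p><span style="font-weight:bold">Hi</span> <b>there</b></p>';
const cleaned = renameTag(unwrapTags(sample, ["span"]), "b", "strong");
console.log(cleaned); // <p>Hi <strong>there</strong></p>
```

In the browser, the same edits could instead be made on a parsed fragment, for example "$frag.find('span').contents().unwrap()" where $frag is a hypothetical jQuery-wrapped container holding the body content; that avoids the regex caveats above.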