I would like to sanitize an HTML document (created in Google Docs) so I can publish it on my CMS.
I have the whole source document in a string, from <html> to </html>, including the head, styles, body, etc. I would like to extract the body content and replace or remove a few tags. I think this would be easier with jQuery than with a more sophisticated HTML parser.
But when I try to get the body of the document, I don't get usable results. I tried:
var gdoc = "<html>...google document...</html>";
$(gdoc); // list of text nodes; cannot rebuild the document or find the body
$("body", gdoc); // empty list
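As far as I can tell, the results above are expected: jQuery builds elements from an HTML string inside a temporary container, and the browser drops the html, head and body tags while doing so, which would explain why there is no body left to select. A rough sketch of the outcome I am after, done on the string itself before jQuery ever sees it, is below. The helper names extractBody and unwrapTags and the sample markup are made up for illustration, and the regular expressions assume a single body element and no ">" inside attribute values, so this is not a general HTML parser.

```javascript
// Hypothetical helper: return the markup between <body ...> and </body>.
// Assumes one body element; returns null when no body is found.
function extractBody(html) {
  var match = /<body\b[^>]*>([\s\S]*?)<\/body\s*>/i.exec(html);
  return match ? match[1] : null;
}

// Hypothetical helper: remove the opening and closing tags of the given
// element names while keeping whatever is between them.
function unwrapTags(html, tagNames) {
  var pattern = new RegExp("<\\/?(?:" + tagNames.join("|") + ")\\b[^>]*>", "gi");
  return html.replace(pattern, "");
}

// Made-up sample standing in for the real Google Docs export.
var sample = '<html><head><style>p{color:red}</style></head>' +
             '<body class="c1"><p><span class="c2">Hello</span> world</p></body></html>';

var body = extractBody(sample);          // markup inside the body element
var cleaned = unwrapTags(body, ["span"]); // same markup with span tags removed
```

The cleaned string has no html, head or body tags left, so something like $("<div>").html(cleaned) should then give a normal jQuery object to work on.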
Is this doable, or am I going about this completely wrong? Any tips or references you could share?