How to filter content from DOMDocument output?

Question

The following piece of code outputs the parsed HTML:

$domd = new DOMDocument('1.0', 'utf-8'); // first argument is the XML version, not an HTML version
libxml_use_internal_errors(true);
$domd->loadHTML(mb_convert_encoding(($postDetails['content']), 'HTML-ENTITIES', 'UTF-8'));
libxml_clear_errors();
echo $domd->saveHTML();

However, it outputs extra tags such as <html> and <head>. I only want the content inside the <body> tag. How do I achieve that?

For example, if the <body> tag contains <p> or other tags with content, I need to output those as is.

@Jack The answer in that question is much cleaner. Thanks for pointing that out. However it doesn't show a way to get rid of the body tags. — maxxon15, Dec 04 '14 at 13:58
The first answer in there mentioned the body tag in the commented part of their code. — Ja͢ck, Dec 04 '14 at 14:02
@Jack Yeah, I saw. But it doesn't apply in my case here. :/ I do have multiple nodes and it doesn't show anything to handle something like that. — maxxon15, Dec 04 '14 at 14:06

score 5 · Answer 1 · answered Dec 04 '14 at 13:41

saveHTML() accepts an optional parameter, $node. It lets you specify a single node in the document to export instead of the whole document. If you want to export only the <body>, use:

echo $domd->saveHTML($domd->getElementsByTagName('body')->item(0));
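
Note that the call above still includes the <body> and </body> tags themselves. If only the markup inside <body> is wanted, one possible approach is to serialize each child node of <body> separately and concatenate the results. The following is a minimal sketch under stated assumptions: bodyInnerHtml() is a hypothetical helper name (not part of DOMDocument), and the sample markup stands in for $postDetails['content'].

```php
<?php
/**
 * Return the markup inside <body>, without the <body> tags themselves.
 * bodyInnerHtml() is a hypothetical helper, not a DOMDocument method.
 */
function bodyInnerHtml(DOMDocument $doc): string
{
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return ''; // no <body> element in the document
    }

    $html = '';
    // Serialize every direct child of <body>, including text nodes.
    foreach ($body->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

// Sample input (assumption): stands in for $postDetails['content'].
$domd = new DOMDocument('1.0', 'utf-8');
libxml_use_internal_errors(true);
$domd->loadHTML('<p>First</p><div>Second <b>block</b></div>');
libxml_clear_errors();

echo bodyInnerHtml($domd);
```

Because every child is serialized in turn, this handles a <body> with multiple top-level nodes. Whitespace between the serialized nodes may vary with the libxml version.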

answered Dec 04 '14 at 13:41

hek2mgl

Just found this in another answer, but that one doesn't have a clean way to remove the body tag. – maxxon15 Dec 04 '14 at 13:50
@maxxon15 Oh, I thought you wanted the body tag in the results. This should help you: http://stackoverflow.com/a/2087136/171318 – hek2mgl Dec 04 '14 at 13:52
