DOMDocument and php html problems

Question

Alright. So I'm using DOMDocument to read html files. One thing I've noticed is that when I do this

$doc = new DOMDocument();
$doc->loadHTML($htmlstring);
echo $doc->saveHTML();

it will add on a doctype header, and html and body tags.

I've gotten around this by doing this

$doc = new DOMDocument();
$doc->loadXML($htmlstring,LIBXML_NOXMLDECL);
echo $doc->saveXML();

The problem with this, however, is that now all my tags are case sensitive, and it gets mad if I have more than one document root.

Is there an alternative so that I can load up partial html files, grab tags and such, replace them, and get the string without having to parse the files manually?

Basically I want the functionality of DOMDocument->loadHTML, without the added tags and header.

Any ideas?

score 2 · Answer 1 · edited May 23 '17 at 12:03

In theory you could tell libxml not to add the implied markup. In practice, PHP's libxml bindings do not provide any means to do that. If you are on PHP 5.3.6+, pass the root node of your partial document to saveHTML(), which will then give you the outerHTML of that element, e.g.

$dom->saveHTML($dom->getElementsByTagName('body')->item(0));

would only return the <body> element with children. See

How to return outer html of DOMDocument?
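As a rough sketch of the saveHTML() approach above: the sample markup and the variable names ($htmlstring, $p, $outer) are made up for illustration. It assumes PHP 5.3.6 or later, where saveHTML() accepts a node argument.

```php
<?php
// Hypothetical partial markup; any single-root fragment would do.
$htmlstring = '<p class="intro">Hello <b>world</b></p>';

$dom = new DOMDocument();
$dom->loadHTML($htmlstring);

// loadHTML() has wrapped the fragment in <html><body>, so pick out
// the element we actually care about.
$p = $dom->getElementsByTagName('p')->item(0);

// Passing a node serializes only that element and its children,
// without the doctype or the implied <html>/<body> wrappers.
$outer = $dom->saveHTML($p);
echo $outer, "\n";
```

The same call works for any node in the tree, so you can swap the <p> lookup for whatever element acts as the root of your partial.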

Also note that your partial document with multiple root elements only works because loadHTML adds the implied elements. If you want a partial with multiple roots (or rather no root at all) back, you can add a fake root yourself:

$dom->loadHTML('<div id="partialroot">' . $partialDoc . '</div>');

Then process the document as needed and fetch the innerHTML of that fake root. See

How to get innerHTML of DOMNode?
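Putting the fake-root idea together might look something like this. It is only a sketch: the sample markup, the heading edit, and the names $partialDoc, $root and $inner are illustrative assumptions, and it relies on saveHTML() accepting a node (PHP 5.3.6+).

```php
<?php
// Hypothetical partial with two top-level elements (no single root).
$partialDoc = '<h1>Title</h1><p>First paragraph</p>';

$dom = new DOMDocument();
$dom->loadHTML('<div id="partialroot">' . $partialDoc . '</div>');

// The fake root is the only <div> in this sample, so item(0) finds it.
$root = $dom->getElementsByTagName('div')->item(0);

// Example edit: replace the heading text before serializing.
$root->getElementsByTagName('h1')->item(0)->nodeValue = 'New title';

// innerHTML of the fake root: serialize each child node and concatenate,
// which leaves out the wrapper <div> itself.
$inner = '';
foreach ($root->childNodes as $child) {
    $inner .= $dom->saveHTML($child);
}
echo $inner, "\n";
```

libxml may insert line breaks between block-level siblings when serializing, so compare the result loosely if exact whitespace matters to you.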

Also see How do you parse and process HTML/XML in PHP? for additional parsers you might want to try.

score 0 · Answer 2 · answered Sep 26 '11 at 06:44


You can wrap the parts you care about in divs with specific ids, and then extract each div from the document object by its id.
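One way this could look, as a sketch only: the ids "header" and "content" and the sample markup are invented here, and the lookup uses an XPath attribute test rather than getElementById() so it does not depend on how the parser registers ids.

```php
<?php
// Hypothetical markup: two sections wrapped in divs with known ids.
$htmlstring = '<div id="header"><h1>Title</h1></div>'
            . '<div id="content"><p>Body text</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($htmlstring);

// Find the div by its id attribute.
$xpath = new DOMXPath($dom);
$content = $xpath->query('//div[@id="content"]')->item(0);

// Serialize only that div (PHP 5.3.6+), leaving out the rest of the document.
$fragment = $dom->saveHTML($content);
echo $fragment, "\n";
```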


Kris
