9

I'm adding a #b hash to each link via the DOMDocument class.

        $dom = new DOMDocument();
        $dom->loadHTML($output);

        $a_tags = $dom->getElementsByTagName('a');

        foreach($a_tags as $a)
        {
            $value = $a->getAttribute('href');
            $a->setAttribute('href', $value . '#b');
        }

        return $dom->saveHTML();

That works fine; however, the returned output includes a DOCTYPE declaration plus <html> and <body> tags. Any idea why that happens, or how I can prevent it?

matt
  • possible duplicate of [PHP + DOMDocument: outerHTML for element?](http://stackoverflow.com/questions/5404941/php-domdocument-outerhtml-for-element) – hakre Jul 03 '13 at 05:00
  • Possible duplicate of [How to saveHTML of DOMDocument without HTML wrapper?](https://stackoverflow.com/questions/4879946/how-to-savehtml-of-domdocument-without-html-wrapper) – miken32 Nov 02 '18 at 02:05

5 Answers

6

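The real problem is the way the DOM is loaded: loadHTML() adds the DOCTYPE and the implied <html>/<body> elements while parsing. Pass these libxml flags instead (available since PHP 5.4 with libxml 2.7.8 or later):

$dom->loadHTML($output, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);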
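
Applied to the code in the question, this could look roughly like the sketch below. The function name appendHashToLinks and the sample markup are made up for illustration, and the scalar type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: appends '#b' to every link in an HTML fragment.
// Assumes the fragment has a single root element (see the note below).
function appendHashToLinks(string $fragment): string
{
    $dom = new DOMDocument();
    // NOIMPLIED: do not add implied <html>/<body> elements.
    // NODEFDTD: do not add a default DOCTYPE.
    $dom->loadHTML($fragment, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    foreach ($dom->getElementsByTagName('a') as $a) {
        $a->setAttribute('href', $a->getAttribute('href') . '#b');
    }

    // saveHTML() ends its output with a newline; trim it for fragment use.
    return trim($dom->saveHTML());
}

echo appendHashToLinks('<div><a href="/page">Link</a></div>');
```

The single wrapping <div> is deliberate: with LIBXML_HTML_NOIMPLIED, input that has several top-level elements can come back with later siblings nested inside the first one, so wrapping the fragment in one container is the safer choice.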

Please upvote the original answer here.

codewario
Tiago A.
5

Yes, that's what DOMDocument::saveHTML() generally does: it generates a full HTML document, with the DOCTYPE declaration, the <html> and <body> tags, and so on.

Two possible solutions :

  • If you are working with PHP >= 5.3.6, saveHTML() accepts an optional node parameter that lets you serialize only part of the document.
  • If you need your code to work with PHP < 5.3.6, you'll have to strip the unwanted portions of the HTML yourself, using str_replace(), a regex, or an equivalent.
    • For an example, see this note in the manual's user notes.
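
As a rough sketch of the first option: the optional parameter is a DOMNode, so one way to use it is to serialize each child of <body> in turn. The helper name bodyInnerHtml and the sample markup are assumptions for illustration only.

```php
<?php
// Hypothetical helper: returns the markup inside <body>, without the
// DOCTYPE or the <html>/<body> wrapper. Needs PHP >= 5.3.6.
function bodyInnerHtml(DOMDocument $dom)
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $html = '';
    foreach ($body->childNodes as $child) {
        // Passing a node makes saveHTML() serialize only that subtree.
        $html .= $dom->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML('<p>See <a href="/page">this</a></p><p>Second</p>');
echo bodyInnerHtml($dom);
```
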
Pascal MARTIN
  • the second link works fine for me - preg_replace solution is the key! thank you! – matt Mar 26 '11 at 19:20
  • 2
    You're welcome :-) *(and the guys who post users notes on manual pages are more to be thanked than me, in this case ;-) )* – Pascal MARTIN Mar 26 '11 at 19:21
  • I used the first option as I am using PHP >= 5.3 and it worked great. `$doc->saveHTML(false);` – Ben Sinclair Oct 21 '13 at 07:29
  • @BenSinclair I am also using PHP >= 5.3 and `$doc->saveHTML(false)` throws the error `Warning: DOMDocument::saveHTML() expects parameter 1 to be DOMNode, boolean given` – Timo Huovinen Mar 16 '14 at 15:07
2

Calling $doc->saveHTML(false); will not work: it raises a warning because the method expects a DOMNode, not a boolean.

The solution I used:

return preg_replace('/^<!DOCTYPE.+?>/', '', str_replace( array('<html>', '</html>', '<body>', '</body>'), array('', '', '', ''), $doc->saveHTML()));

I'm using PHP > 5.4.

CGeorges
0

I solved this problem by creating a new DOMDocument and copying the child nodes from the original document into the new one.

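function removeDocType($oldDom) {
  // <html> is the document element; its first child is assumed to be <body>
  $node = $oldDom->documentElement->firstChild;
  $dom = new DOMDocument();
  foreach ($node->childNodes as $child) {
    // Import each node into the new document before appending it
    $dom->appendChild($dom->importNode($child, true));
  }
  return $dom->saveHTML();
}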

So instead of using:

return $dom->saveHTML();

I use:

return removeDocType($dom);
Sigismund
0

In my case I wanted to keep the HTML wrapper but drop the DOCTYPE. The solution is in line with Tiago A.'s answer:

// Avoid adding the DOCTYPE header    
$dom->loadHTML($bodyContent, LIBXML_HTML_NODEFDTD);

// Avoid adding the DOCTYPE header AND html/body wrapper
$dom->loadHTML($bodyContent, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
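
To see the difference between the flag combinations, a small comparison like the following may help. The variable names and sample fragment are made up, and the flags need PHP 5.4 or later with a sufficiently recent libxml.

```php
<?php
$fragment = '<p>Hello</p>';

// No flags: the parser adds a DOCTYPE and the <html>/<body> wrapper.
$default = new DOMDocument();
$default->loadHTML($fragment);

// NODEFDTD only: no DOCTYPE, but the <html>/<body> wrapper is kept.
$noDoctype = new DOMDocument();
$noDoctype->loadHTML($fragment, LIBXML_HTML_NODEFDTD);

// Both flags: neither the DOCTYPE nor the wrapper is added.
$bare = new DOMDocument();
$bare->loadHTML($fragment, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$variants = [
    'no flags' => $default,
    'NODEFDTD' => $noDoctype,
    'NOIMPLIED | NODEFDTD' => $bare,
];
foreach ($variants as $label => $doc) {
    echo $label, ":\n", $doc->saveHTML(), "\n";
}
```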