
I'm trying to edit HTML tags with DOMDocument::loadHTML in PHP. The input is an HTML fragment, not a whole page. I followed the approach described on this page (PHP - DOMDocument - need to change/replace an existing HTML tag w/ a new one).

The code below should convert the pre tags into div tags, but it throws "Fatal error: Uncaught exception 'DOMException' with message 'Not Found Error'."

<?php
$contents = <<<STR
<pre>hi</pre>
<pre>hello</pre>
<pre>bye</pre>
STR;

$dom = new DOMDocument;
@$dom->loadHTML($contents);

foreach( $dom->getElementsByTagName("pre") as $nodePre ) {
    $nodeDiv = $dom->createElement("div", $nodePre->nodeValue);
    $dom->replaceChild($nodeDiv, $nodePre);
}

echo $dom->saveHTML();
?>
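
For readers hitting the same exception, here is a minimal sketch of why `$dom->replaceChild()` reports "Not Found Error": `loadHTML()` wraps a fragment in `html` and `body` elements, so the `pre` nodes are children of `body`, not of the document itself. The single-element input string is an assumption chosen only for illustration.

```php
<?php
// Load a one-element fragment; libxml adds the <html> and <body> wrappers.
$dom = new DOMDocument;
$dom->loadHTML('<pre>hi</pre>');

$pre = $dom->getElementsByTagName('pre')->item(0);

// The parent of <pre> is <body>, not the document node.
$parentName = $pre->parentNode->nodeName;
$isChildOfDocument = $pre->parentNode->isSameNode($dom);

echo $parentName, "\n";
var_dump($isChildOfDocument);
```

Because `replaceChild()` only accepts a direct child of the node it is called on, the call has to go through `$pre->parentNode`.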

[Edit] When I try to iterate over the node list backwards, I get this error: 'Notice: Trying to get property of non-object...'

<?php
$contents = <<<STR
<pre>hi</pre>
<pre>hello</pre>
<pre>bye</pre>
STR;

$dom = new DOMDocument;
@$dom->loadHTML($contents);
$domPre = $dom->getElementsByTagName('pre');
$length = $domPre->length;

    For ($i = $length; $i > -1 ; $i--) {
        $nodePre = $domPre->item($i);
        echo $nodePre->nodeValue . '<br />';
//      $nodeDiv = $dom->createElement("div", $nodePre->nodeValue);
//      $dom->replaceChild($nodeDiv, $nodePre);
    }

    // echo $dom->saveHTML();
?>
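
A short sketch of where that notice comes from, assuming the same three-element input: valid indices run from 0 to `length - 1`, so `item($length)` returns `null`, and reading `nodeValue` from `null` triggers the notice on the first pass of the loop.

```php
<?php
$dom = new DOMDocument;
$dom->loadHTML('<pre>hi</pre><pre>hello</pre><pre>bye</pre>');
$list = $dom->getElementsByTagName('pre');

// One past the end: there is no such node, so item() returns null.
$pastEnd = $list->item($list->length);

// The last valid index is length - 1.
$last = $list->item($list->length - 1);

var_dump($pastEnd);
echo $last->nodeValue, "\n";
```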

[Edit] Okay, solved. Since the code in the answer had an error, I'm posting the solution here. Thanks, all.

Solution:

<?php
$contents = <<<STR
<pre>hi</pre>
<pre>hello</pre>
<pre>bye</pre>
STR;

$dom = new DOMDocument;
@$dom->loadHTML($contents);
$domPre = $dom->getElementsByTagName('pre');
$length = $domPre->length;

for ($i = $length - 1; $i > -1; $i--) {
    $nodePre = $domPre->item($i);
    $nodeDiv = $dom->createElement("div", $nodePre->nodeValue);
    $nodePre->parentNode->replaceChild($nodeDiv, $nodePre);
}

echo $dom->saveHTML();
?>
Teno
  • See [this answer](http://stackoverflow.com/a/5284835/1233508). – DCoder Aug 18 '12 at 13:13
  • I see, so it is a problem with PHP. What about cloning the node and editing the clone? Is that slow compared to regex solutions? – Teno Aug 18 '12 at 14:20
  • It's not a problem of PHP. If you iterate over the NodeList backwards, you should be able to replace all the `pre` tags. If that doesn't work, change the logic to a less efficient version, replace first match, call `getElementsByTagName` again, replace first match... – DCoder Aug 18 '12 at 14:27
  • Iterating backwards is a nice idea. I'll give it a try. – Teno Aug 18 '12 at 14:30
  • I updated the initial post. I got another error while trying to do your suggestion. – Teno Aug 18 '12 at 15:15
  • [`$length`: The number of nodes in the list. The range of valid child node indices is 0 to ***length - 1*** inclusive.](http://us.php.net/domNodeList) – DCoder Aug 18 '12 at 15:41
  • I'm trying to get what you mean. I think I've already understood it but does that explain why the second example code in the updated initial post causes the error? – Teno Aug 18 '12 at 23:26
  • Ah, `For ($i = $length; $i > -1 ; $i--)` had to be `For ($i = $length - 1; $i > -1 ; $i--)` – Teno Aug 18 '12 at 23:38
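
The less efficient fallback mentioned in the comments (replace the first match, call `getElementsByTagName` again, repeat) could look roughly like the sketch below. It assumes the same three-element input as the question and re-queries the document on every pass, so it does not depend on the iteration order.

```php
<?php
$dom = new DOMDocument;
$dom->loadHTML('<pre>hi</pre><pre>hello</pre><pre>bye</pre>');

// Re-query on each pass and always replace the first remaining <pre>.
// The loop ends when no <pre> is left and item(0) returns null.
while (($nodePre = $dom->getElementsByTagName('pre')->item(0)) !== null) {
    $nodeDiv = $dom->createElement('div', $nodePre->nodeValue);
    $nodePre->parentNode->replaceChild($nodeDiv, $nodePre);
}

echo $dom->saveHTML();
```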

2 Answers

The problem is the call to replaceChild(). Rather than

$dom->replaceChild($nodeDiv, $nodePre);

use

$nodePre->parentNode->replaceChild($nodeDiv, $nodePre);

Update

Here is working code. Replacing multiple nodes in a forward loop skips elements, because the list returned by getElementsByTagName() is live and shrinks as nodes are replaced (more info here: http://php.net/manual/en/domnode.replacechild.php), so you'll have to iterate in reverse to replace the elements.

$contents = <<<STR
<pre>hi</pre>
<pre>hello</pre>
<pre>bye</pre>
STR;

$dom = new DOMDocument;
@$dom->loadHTML($contents);

$elements = $dom->getElementsByTagName("pre");
for ($i = $elements->length - 1; $i >= 0; $i --) {
    $nodePre = $elements->item($i);
    $nodeDiv = $dom->createElement("div", $nodePre->nodeValue);
    $nodePre->parentNode->replaceChild($nodeDiv, $nodePre);
}
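
An alternative to the reverse loop, offered only as a sketch: copy the live list into a plain array with `iterator_to_array()` first, then iterate over that snapshot in the usual forward direction. The variable name `$snapshot` is just illustrative.

```php
<?php
$dom = new DOMDocument;
$dom->loadHTML('<pre>hi</pre><pre>hello</pre><pre>bye</pre>');

// Snapshot the live DOMNodeList so replacements cannot shift the iteration.
$snapshot = iterator_to_array($dom->getElementsByTagName('pre'));

foreach ($snapshot as $nodePre) {
    $nodeDiv = $dom->createElement('div', $nodePre->nodeValue);
    $nodePre->parentNode->replaceChild($nodeDiv, $nodePre);
}

echo $dom->saveHTML();
```

The array holds references to the original nodes, so each one can still be replaced through its own `parentNode` even after the live list has changed.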
Czar Pino
  • `hi hello bye` This is what I get by trying your suggestion. The second tag remains. – Teno Aug 18 '12 at 14:19
  • @Teno This problem seems to arise from trying to replace multiple nodes. There is good advice in the PHP manual on how to get around this. Check my updated answer above. – Czar Pino Aug 18 '12 at 15:44
  • The line `$nodePre = $elements->item($i);` causes the error "Notice: Undefined variable: i". After I fix that by defining $i, it says "Notice: Trying to get property of non-object." – Teno Aug 18 '12 at 23:22
  • @Teno Kindly check again. That was supposed to be a for loop. A well-meaning editor changed it but did not do it thoroughly. – Czar Pino Aug 19 '12 at 00:15

Another way, using paquettg/php-html-parser. I didn't find a way to change the tag name directly, so I had to use a hack that re-binds $this:

use PHPHtmlParser\Dom;
use PHPHtmlParser\Dom\HtmlNode;

$dom = new Dom;
$dom->load($text);
/** @var HtmlNode[] $tags */
foreach($dom->find('pre') as $tag) {
    $changeTag = function() {
        $this->name = 'div';
    };
    $changeTag->call($tag->tag);
}
echo (string)$dom;
Slava V