I am trying to remove certain links depending on their id attribute, but keep the content of the link. For example, I want to turn

Some text goes <a href="http://www.domain.tdl/" id="remove">here</a>

to

Some text goes here

I have tried using the below.

$dom = new DOMDocument;
$dom->loadHtml(mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8"));
$xp = new DOMXPath($dom);

foreach($xp->query('//a[contains(@id="remove")]') as $oldNode) {
$revised = strip_tags($oldNode);
}

$revised = mb_substr($dom->saveXML($xp->query('//body')->item(0)), 6, -7, "UTF-8");
echo $revised;

roughly taken from here but it just spits back the same content of $html.

Any ideas on how I would achieve this?

Jack
  • You are not modifying your document here, that's why it spits back the same content. The example you provided calls `replaceChild` on the DOM object, whereas you are just creating a variable that you later overwrite with the output of `saveXML` – German Rumm Jan 13 '11 at 00:16
  • Good question, +1. See my answer of a single XPath expression solution that selects exactly the wanted nodes. :) – Dimitre Novatchev Jan 13 '11 at 13:54

3 Answers

That's my function for that:

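function DOMRemove(DOMNode $from) {
    // Move every child in front of $from, then drop the now-empty $from.
    $sibling = $from->firstChild;
    // A while loop (not do/while) so that elements without children are safe.
    while ($sibling !== null) {
        // Remember the next sibling before the move detaches $sibling.
        $next = $sibling->nextSibling;
        $from->parentNode->insertBefore($sibling, $from);
        $sibling = $next;
    }
    $from->parentNode->removeChild($from);
}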

So this:

$dom = new DOMDocument;
$dom->loadHTML('Hello <a href="foo"><span>World</span></a>');
$a = $dom->getElementsByTagName('a')->item(0); // get first
DOMRemove($a);

Should give you:

Hello <span>World</span>

To get nodes with a specific ID, use XPath:

$xpath = new DOMXPath($dom);
$node = $xpath->query('//a[@id="something"]')->item(0); // get first
if ($node !== null) { // item(0) returns null when nothing matches
    DOMRemove($node);
}
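
Putting the pieces together, here is a minimal sketch of how every matching link could be unwrapped and the result printed. The helper name unwrapLinksById is hypothetical and not part of the answer above. It assumes the id contains no double quotes, that the input has some body content, and that DOMDocument::saveHTML accepts a node argument (PHP 5.3.6 and later).

```php
<?php
// Hypothetical helper (an illustration, not the answer's own code): unwrap
// every <a> with the given id and return the inner HTML of <body>.
function unwrapLinksById(string $html, string $id): string
{
    $dom = new DOMDocument();
    // Suppress the warnings libxml raises for loosely structured HTML.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Assumption: $id contains no double quotes, so it can be embedded as-is.
    // Copy the matches into an array so that removing nodes cannot disturb
    // the iteration.
    $links = iterator_to_array($xpath->query('//a[@id="' . $id . '"]'));
    foreach ($links as $link) {
        // Move each child in front of the link, then drop the empty link.
        while ($link->firstChild !== null) {
            $link->parentNode->insertBefore($link->firstChild, $link);
        }
        $link->parentNode->removeChild($link);
    }

    // Serialize only the children of <body>, without the wrapper tags.
    $out = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo unwrapLinksById(
    '<p>Some text goes <a href="http://www.domain.tdl/" id="remove">here</a></p>',
    'remove'
), "\n";
```

Serializing the children of body avoids the mb_substr offset trick from the question, which depends on the exact length of the wrapper tags.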
netcoder
  • I had a look at this code on another post of yours, but a) I am getting the error `Fatal error: Call to a member function insertBefore() on a non-object` and b) how would I adapt this to only remove the a elements with a specific ID? – Jack Jan 13 '11 at 01:24
  • @Jack: Sorry my bad, the function argument was meant to be `$from` and not `$node`. Fixed. Thanks for pointing that out. Also added an example for fetching a node with a specific `id`. – netcoder Jan 13 '11 at 03:13
  • Two questions; How would I output the revised data? And when I use the example that you gave for specific IDs I get the same error as earlier. – Jack Jan 13 '11 at 05:50
  • @Jack: Use [DOMDocument::saveHTML](http://php.net/domdocument.savehtml) for output. For the error, did you update the code? This works okay for me now. – netcoder Jan 13 '11 at 13:44
  • You were right. The new code was working. I was requesting something that did not exist. After a little bit of tweaking it does the required job. Thank you so much! – Jack Jan 14 '11 at 11:36
  • Please help in this question http://stackoverflow.com/questions/24713728/domdocument-and-delete-parent-tag?noredirect=1#comment38329661_24713728 – user1954544 Jul 12 '14 at 14:17
  • You need to check whether the sibling exists, or it will produce an error on elements with no content like: (which are sometimes created by html editors) `function DOMRemove(DOMNode $from) { $sibling = $from->firstChild; if ($sibling) { do { $next = $sibling->nextSibling; $from->parentNode->insertBefore($sibling, $from); } while ($sibling = $next); } $from->parentNode->removeChild($from); }` – Peter Apr 02 '19 at 09:54

An approach similar to @netcoder's answer but using a different loop structure and DOMElement methods.

$html = '<html><body>This <a href="http://www.domain.tdl/" id="remove">link</a> was removed.</body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//a[@id="remove"]') as $link) {
  // Move all link tag content to its parent node just before it.
  while($link->hasChildNodes()) {
    $child = $link->removeChild($link->firstChild);
    $link->parentNode->insertBefore($child, $link);
  }
  // Remove the link tag.
  $link->parentNode->removeChild($link);
}
$html = $dom->saveXML();
recidive
  • Can `$child = $link->removeChild($link->firstChild);` simply be written as `$child = $link->firstChild;`? – myol Nov 14 '16 at 12:41
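
On the question in the comment above: a DOM node can have only one parent, so insertBefore() detaches a node from its current position before inserting it. The explicit removeChild() call therefore looks optional. A minimal sketch of the shorter loop follows; the sample markup is made up for illustration.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<p>This <a href="#" id="remove"><b>link</b> text</a> was removed.</p>');
$link = $dom->getElementsByTagName('a')->item(0);

while ($link->hasChildNodes()) {
    // No removeChild() here: insertBefore() relocates the node, so it
    // leaves $link's child list as a side effect and the loop terminates.
    $link->parentNode->insertBefore($link->firstChild, $link);
}
// Remove the now-empty link tag.
$link->parentNode->removeChild($link);

$p = $dom->getElementsByTagName('p')->item(0);
echo $dom->saveHTML($p), "\n";
```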

Use:

 //a[@id='remove']/node() 
| 
 //*[a[@id='remove']]/node()[not(self::a[@id='remove'])]

This selects all children of any a element whose id attribute has the value "remove", together with all preceding and following siblings of that a element that are not themselves an a element with id "remove".
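
As a rough illustration of how this expression might be evaluated from PHP, the sketch below runs it through DOMXPath and concatenates the serialized nodes. It relies on the assumption that the node list comes back in document order, and it only makes sense when the link's parent is the element whose content you want to rebuild. The sample markup is taken from the question.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<p>Some text goes <a href="http://www.domain.tdl/" id="remove">here</a></p>');
$xpath = new DOMXPath($dom);

// The union from the answer: the link's children, plus the link's
// siblings that are not themselves links marked for removal.
$expr = "//a[@id='remove']/node()"
      . " | "
      . "//*[a[@id='remove']]/node()[not(self::a[@id='remove'])]";

$nodes = $xpath->query($expr);

// Assumption: the nodes arrive in document order, so concatenating
// them rebuilds the parent's content without the link tag.
$result = '';
foreach ($nodes as $node) {
    $result .= $dom->saveHTML($node);
}
echo $result, "\n";
```

Unlike the other answers, this does not modify the document; it only selects the nodes that should remain.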

Dimitre Novatchev