
I need to process a DOM and remove all hyperlinks to a particular site while retaining the underlying text. Thus, something like <a href="abc.com">text</a> changes into text. Taking a cue from this thread, I wrote this:

$as = $dom->getElementsByTagName('a');
for ($i = 0; $i < $as->length; $i++) {
    $node = $as->item($i);
    $link_href = $node->getAttribute('href');
    if (strpos($link_href,'offendinglink.com') !== false) {
        $cl = $node->getAttribute('class');
        $text = new DomText($node->nodeValue);
        $node->parentNode->insertBefore($text, $node);
        $node->parentNode->removeChild($node);
        $i--;
    }
}

This works fine, except that I also need to retain the class attribute of the offending <a> tag and maybe turn the tag into a <div> or a <span>. Thus, I need this:

<a href="www.offendinglink.com" target="_blank" class="nice" id="nicer">text</a>

to turn into this:

<div class="nice">text</div>

How do I access the new element after it's been added (like in my code snippet)?

Community
TheLearner

2 Answers


Tested solution:

<?php
$str = "<b>Dummy</b> <a href='http://google.com' target='_blank' class='nice' id='nicer'>Google.com</a> <a href='http://yandex.ru' target='_blank' class='nice' id='nicer'>Yandex.ru</a>";
$doc = new DOMDocument();
$doc->loadHTML($str);
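// getElementsByTagName() returns a live list: each replaced <a> drops out of it,
// so the next anchor to process is always at index 0.
$anchors = $doc->getElementsByTagName('a');
$l = $anchors->length;
for ($i = 0; $i < $l; $i++) {
    $anchor = $anchors->item(0);
    // Note: this sample converts every <a>; it does not filter by href.
    $link = $doc->createElement('div');
    // A text node escapes characters such as "&"; createElement()'s value argument does not.
    $link->appendChild($doc->createTextNode($anchor->nodeValue));
    $link->setAttribute('class', $anchor->getAttribute('class'));
    $anchor->parentNode->replaceChild($link, $anchor);
}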
echo preg_replace(['/^\<\!DOCTYPE.*?<html><body>/si', '!</body></html>$!si'], '', $doc->saveHTML());

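The sample above converts every anchor, while the question only wants links to one site replaced. A minimal sketch of that variant follows. The function name replaceSiteLinks and its parameters are hypothetical (they are not from the answer), and the substring check on href mirrors the strpos() test in the question. Because the loop skips some anchors, it iterates over a snapshot of the node list instead of always taking item(0).

```php
<?php
// Hypothetical helper (name and signature are assumptions, not from the thread).
// Replaces each <a> whose href contains $site with a <$tag> that keeps only the
// class attribute and the text content. Returns the number of replaced links.
function replaceSiteLinks(DOMDocument $doc, string $site, string $tag = 'div'): int
{
    $replaced = 0;
    // Snapshot the live node list so replacing nodes does not disturb iteration.
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $anchor) {
        if (strpos($anchor->getAttribute('href'), $site) === false) {
            continue; // leave links to other sites untouched
        }
        $new = $doc->createElement($tag);
        // createTextNode() escapes special characters such as "&".
        $new->appendChild($doc->createTextNode($anchor->textContent));
        if ($anchor->hasAttribute('class')) {
            $new->setAttribute('class', $anchor->getAttribute('class'));
        }
        // $new stays a valid reference to the element that is now in the tree.
        $anchor->parentNode->replaceChild($new, $anchor);
        $replaced++;
    }
    return $replaced;
}

$doc = new DOMDocument();
$doc->loadHTML("<p><a href='http://www.offendinglink.com' class='nice' id='nicer'>text</a> <a href='http://example.org' class='ok'>keep</a></p>");
$count = replaceSiteLinks($doc, 'offendinglink.com');
$html = $doc->saveHTML();
echo $html;
```

Passing 'span' as the third argument would produce a <span> instead of a <div>, which may be the better choice when the link sits inside inline content.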

userlond

Quoting the question: "How do I access the new element after it's been added (like in my code snippet)?" Your new node is still in $text, I think. In any case, this should work if you need to keep the class and the text content but nothing else:

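// Copy the live node list first; removing nodes while iterating it directly skips elements.
foreach (iterator_to_array($dom->getElementsByTagName('a')) as $url) {
    // parse_url() reports a host only when the href has a scheme or a leading "//".
    if (parse_url($url->getAttribute("href"), PHP_URL_HOST) !== 'badsite.com') {
        continue;
    }
    $ele = $dom->createElement("div");
    $ele->textContent = $url->textContent;
    $ele->setAttribute("class", $url->getAttribute("class"));
    // $ele is the new element and remains accessible after it is inserted.
    $url->parentNode->insertBefore($ele, $url);
    $url->parentNode->removeChild($url);
}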
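One caveat with comparing parse_url()'s host strictly against 'badsite.com': an href without a scheme, such as the "www.offendinglink.com" in the question, yields no host at all, and a "www." prefix would not match either. A rough sketch of a more tolerant check is below. The helper name hrefIsOnSite is hypothetical, and it does not try to cover every URL form (for example, mailto: links are not handled specially).

```php
<?php
// Hypothetical helper: true when $href points at $site or one of its subdomains.
function hrefIsOnSite(string $href, string $site): bool
{
    // parse_url() only reports a host when the URL has a scheme or starts with "//",
    // so prepend "//" to scheme-less values such as "www.offendinglink.com".
    if (!preg_match('~^([a-z][a-z0-9+.-]*:)?//~i', $href)) {
        $href = '//' . $href;
    }
    $host = parse_url($href, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host could be extracted
    }
    $host = strtolower($host);
    $site = strtolower($site);
    // Accept an exact match or any subdomain (e.g. "www.").
    return $host === $site || substr($host, -strlen($site) - 1) === '.' . $site;
}

var_dump(hrefIsOnSite('www.offendinglink.com', 'offendinglink.com'));
var_dump(hrefIsOnSite('http://example.org/?ref=offendinglink.com', 'offendinglink.com'));
```

In the loop above, the condition could then read `if (!hrefIsOnSite($url->getAttribute("href"), 'badsite.com')) { continue; }`. Unlike a plain strpos() test, this does not match a site name that merely appears in the path or query string.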
Matthew Wilcoxson
hanshenrik