Remove links where href is disallowed

Question

I have some links like this:

<a href="http://illegallink.com"><img src="something.jpg" /><a href="http://legallink.com">legal</a></a>

I want to remove all links that do not have "legallink.com" in their href, but still keep their content. So the above input would output:

<img src="something.jpg" /><a href="http://legallink.com">legal</a>

It should work recursively through the links.

I found this regex that removes all links: /<\\/?a(\\s+.*?>|>)/, but I want it to keep links where href is legallink.com.

Can this be done with regex? Or should I use a DOM parser?

A DOM parser is needed, especially for nested tags as in your example. — Bergi, Apr 18 '12 at 18:36
Can anyone give an example how I can achieve what I want? I have looked a lot, but can't find a solution. — Trolley, Apr 18 '12 at 19:08
Maybe here: http://stackoverflow.com/questions/4330545/php-html-dom-parser? — Taz, Apr 18 '12 at 19:49
This related question might be useful: [How can I change the name of an element in DOM?](http://stackoverflow.com/q/775904/367456) — hakre, May 13 '12 at 16:11

score 1 · Answer 1 · edited May 13 '12 at 16:12

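error_reporting(~0); ini_set('display_errors', 1);

$code = '<a href="http://illegallink.com"><img src="something.jpg" /><a href="http://legallink.com">legal</a></a>';

$document = new DOMDocument();
$document->loadHTML($code);
$parser = new DOMXPath($document);

// The XPath result is a static list, so nodes can be removed while iterating.
foreach ($parser->query("//a") as $node)
{
  // Anchor the match at the start: "illegallink.com" contains "legallink.com".
  if (!preg_match("/^http:\/\/legallink\.com/i", $node->getAttribute("href")))
  {
    // replaceChild() needs a DOMNode, not a string, so unwrap the link instead:
    // move each child in front of the link, then remove the empty link.
    while ($node->firstChild)
    {
      $node->parentNode->insertBefore($node->firstChild, $node);
    }
    $node->parentNode->removeChild($node);
  }
}
echo $document->saveXML();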
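
Since the question expects only the fragment back (no doctype or html/body wrapper), here is a minimal sketch of the same unwrap idea packaged as a function. The function name unwrapDisallowedLinks, the "fragment-root" wrapper id, and the choice to compare the URL host via parse_url() are my own assumptions for illustration, not part of the original answer. It also assumes plain ASCII input; other encodings would need extra handling in loadHTML().

```php
<?php
// Hypothetical helper: removes every <a> whose href host is not $allowedHost,
// keeping the link's children in place. Returns the cleaned HTML fragment.
function unwrapDisallowedLinks(string $html, string $allowedHost): string
{
    // Collect libxml parse warnings (e.g. for stray </a>) instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $document = new DOMDocument();
    // Wrap the input in a known container so only the fragment is serialized later.
    $document->loadHTML('<div id="fragment-root">' . $html . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);

    // XPath results are a static list, so removing nodes in the loop is safe.
    foreach ($xpath->query('//a') as $link) {
        // Compare the host exactly: "illegallink.com" contains "legallink.com",
        // so a plain substring test would wrongly keep the illegal link.
        $host = (string) parse_url($link->getAttribute('href'), PHP_URL_HOST);
        if (strcasecmp($host, $allowedHost) !== 0) {
            // Move the children in front of the link, then drop the empty link.
            while ($link->firstChild) {
                $link->parentNode->insertBefore($link->firstChild, $link);
            }
            $link->parentNode->removeChild($link);
        }
    }

    // Serialize only the children of the wrapper element.
    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);
    $output = '';
    foreach ($root->childNodes as $child) {
        $output .= $document->saveHTML($child);
    }
    return $output;
}
```

Calling unwrapDisallowedLinks($code, 'legallink.com') with the $code string from above should leave the image followed by the legallink.com anchor. Note that the serializer may normalize the markup slightly, for example by dropping the self-closing slash on the img tag.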

answered Apr 18 '12 at 21:29 by Ωmega
edited May 13 '12 at 16:12 by hakre

I'm not the downvoter, but I believe he wants to find nested links, not links with a specific href. He was just using the href as an example to say which link should be kept. – Jonathan Kuhn Apr 18 '12 at 22:37
@JonathanKuhn - I should not be downvoted for an unclear question from the OP. Besides, nobody else has posted an alternative answer. – Ωmega Apr 18 '12 at 22:41
That's why I didn't downvote. The question needs some clarification. – Jonathan Kuhn Apr 18 '12 at 22:42
@Elias: Please see the updated code, run it and tell us which error message you get. – hakre May 13 '12 at 16:12
