You should consider using DOMDocument for manipulating HTML, because regular expressions will become incredibly complex if they have to deal with every possible situation.
Still, I will provide here a regular expression that improves on the following points:
- allows whitespace at several positions in the anchor element
- allows other attributes to appear in the anchor, before and after href
- allows upper/lower case (using the i modifier)
- allows the URL to be wrapped in single quotes instead of double quotes
- does not count it as a match when "example" occurs after a ? or # in a URL
- allows line feeds in the anchor text (using the s modifier)
- requires "example" to be surrounded by dots
Here it is:
$txt = preg_replace(
'/<a\s+(?:[^>]+\s+)*?href\s*=\s*["\'][^"\'#?]*?\.example\..*?[\"\']\s*>(.*?)<\/a\s*>/si',
"\\1", $txt);
But it has limits. For instance, if a URL contained a quote before the ".example." part for some reason, the pattern would not match and the link would be left in place.
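As a quick sanity check, the pattern above can be run against a few inputs. This is only a sketch: the sample strings and the $pattern and $samples variable names are made up for illustration and do not come from any real page.

```php
<?php
// The pattern from above, kept in a variable so it can be reused.
$pattern = '/<a\s+(?:[^>]+\s+)*?href\s*=\s*["\'][^"\'#?]*?\.example\..*?[\"\']\s*>(.*?)<\/a\s*>/si';

// Hypothetical sample inputs (assumptions, for illustration only).
$samples = [
    // plain double-quoted href: the link is removed, the text is kept
    '<a href="http://www.example.com/page">Hello</a>',
    // upper case, spaces around "=", single quotes: also removed
    "<A HREF = 'http://shop.example.org/x'>Mixed case</A>",
    // "example" only occurs after "?": left untouched
    '<a href="fdf.abc.com?example=fsdf">World</a>',
    // ".example." only occurs after "#": left untouched
    '<a href="http://abc.com/#.example.">Fragment</a>',
];

foreach ($samples as $sample) {
    echo preg_replace($pattern, '\\1', $sample), "\n";
}
```

The first two links should be reduced to their anchor text, while the last two should come back unchanged.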
DOMDocument solution
Here is how to do this properly with DOMDocument. The code is longer, but it gives more reliable results:
// function to remove links when URL address has given pattern
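function DOMremoveLinks($dom, $url_pattern) {
    // copy the live node list first: replacing nodes while iterating over it skips elements
    foreach (iterator_to_array($dom->getElementsByTagName('a')) as $a) {
        $href = $a->getAttribute('href');
        if (preg_match($url_pattern, $href)) {
            // add the text as a text node, so characters like & are escaped correctly
            $span = $dom->createElement("span");
            $span->appendChild($dom->createTextNode($a->textContent));
            $a->parentNode->replaceChild($span, $a);
        }
    }
}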
// utility function to get innerHTML of an element
function DOMinnerHTML($element) {
    $innerHTML = "";
    foreach ($element->childNodes as $child) {
        $innerHTML .= $element->ownerDocument->saveHTML($child);
    }
    return $innerHTML;
}
// test data
$html = '<a name="this"
href = \'http://www.example.com/s/product/B00GJ4/ref=as_li_tl?ie=UTF8\'
target="_blank" >
Hello</a>
<a href="fdf.abc.com?example=fsdf">World</a>';
// create DOM for given HTML
$dom = new DOMDocument();
// ignore warnings
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors(false);
// call our function to make the replacement(s)
DOMremoveLinks($dom, "/^[^#?]*\.example\./");
// convert back to HTML
$html = DOMinnerHTML($dom->getElementsByTagName('body')->item(0));
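Note that the function above replaces each matching link with a span holding only its text, so nested markup inside the link (bold text, images) is lost, whereas the regular expression kept it. If that matters, one possible variant is to "unwrap" the link instead. The following is a minimal sketch of that idea: the name DOMunwrapLinks and the sample HTML are my own assumptions, not part of the code above.

```php
<?php
// Hypothetical variant of DOMremoveLinks: unwrap matching links instead of
// replacing them with a <span>, so markup nested inside the link is kept.
function DOMunwrapLinks(DOMDocument $dom, string $url_pattern): void {
    // Copy the live node list first; changing the tree while iterating
    // over the live list can skip elements.
    foreach (iterator_to_array($dom->getElementsByTagName('a')) as $a) {
        if (!preg_match($url_pattern, $a->getAttribute('href'))) {
            continue;
        }
        // Move every child of the link in front of the link itself...
        while ($a->firstChild !== null) {
            $a->parentNode->insertBefore($a->firstChild, $a);
        }
        // ...then remove the now-empty anchor.
        $a->parentNode->removeChild($a);
    }
}

// Made-up sample input, for illustration only.
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML('<p><a href="http://www.example.com/x">Hello <b>there</b></a></p>');
libxml_use_internal_errors(false);

DOMunwrapLinks($dom, '/^[^#?]*\.example\./');
echo $dom->saveHTML($dom->getElementsByTagName('p')->item(0)), "\n";
```

Here the anchor should disappear while the nested b element stays in place. Whether to unwrap or to flatten to plain text depends on what you want to keep from the link's content.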