I need to loop through a bunch of HTML and remove the <a> </a> tags from every link that doesn't include the data attribute data-link="keepLink".

Here is an example of the body value I need to modify:

<p><a data-link=\"keepLink\" href=\"[1|9999|16|191967|256]\">Daily Racing Link</a></p>\r\n<br>\n <strong>OFFER – Get&nbsp;up to a £400 deposit bonus when you sign up with&nbsp;<a href="https://gateway.tracker.com/track-989">Fanduel</a>.</strong>

After the modification I need it to look like (so the offer link is removed):

<p><a data-link=\"keepLink\" href=\"[1|9999|16|191967|256]\">Daily Racing Link</a></p>\r\n<br>\n <strong>OFFER – Get&nbsp;up to a £400 deposit bonus when you sign up with&nbsp; Fanduel.</strong>

So far I have managed to remove the opening <a> tag of each link that doesn't include a data-link="keepLink" attribute, but the closing </a> is still present.

Here is the regex I have used:

$result["body_value"] = preg_replace('/<a (?![^>]*data-link="keepLink").*?>/i', '', $result["body_value"]);

So the new body value looks like:

<p><a data-link=\"keepLink\" href=\"[1|9999|16|191967|256]\">Daily Racing Link</a></p>\r\n<br>\n <strong>OFFER – Get&nbsp;up to a £400 deposit bonus when you sign up with&nbsp; Fanduel</a>.</strong>
Manwal
smj2393

2 Answers

The DOM extension (DOMDocument) is enabled by default in PHP. It is presumably faster than a regex and is designed for exactly what you are trying to achieve. You can use it to load your document and select every link whose data-link attribute is missing or is not set to "keepLink", like this:

$dom = new DOMDocument;
$dom->loadHTMLFile('http://www.example.com'); // load the file

$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//a[not(@data-link=\'keepLink\')]'); // search for links that do not have the 'data-link' attribute set to 'keepLink'

foreach($nodes as $element){
    $textInside = $element->nodeValue; // get the text inside the link
    $parentNode = $element->parentNode; // save parent node
    $parentNode->replaceChild(new DOMText($textInside), $element); // remove the element
}

$myNewHTML = $dom->saveHTML(); // see http://php.net/manual/ro/domdocument.savehtml.php for limitations such as auto-adding of doc-type

echo $myNewHTML;

Proof of concept: https://3v4l.org/ejatQ.

Please bear in mind that this keeps only the text content of each link that lacks data-link='keepLink'. Any markup nested inside such a link (for example <strong> or <img>) is dropped along with the <a> tag.
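
If nested markup inside the removed links needs to survive, one option is to move each link's child nodes out before deleting the link itself. The following is only a minimal sketch of that idea, not part of the original answer: the function name unwrapLinks, the wrapper id unwrap-root and the tracker.example URL are made up for illustration, and it assumes the body value is a UTF-8 string with plain (unescaped) quotes.

```php
<?php
// Hypothetical helper: unwraps every <a> that lacks data-link="keepLink",
// keeping its child nodes rather than only its text content.
function unwrapLinks(string $html): string
{
    $dom = new DOMDocument();

    // Silence warnings about imperfect markup while parsing.
    $previous = libxml_use_internal_errors(true);
    // The meta tag hints UTF-8; the wrapper div (made-up id) lets us find
    // the fragment again after the parser adds <html> and <body>.
    $dom->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div id="unwrap-root">' . $html . '</div>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = $xpath->query('//a[not(@data-link="keepLink")]');

    foreach ($links as $link) {
        // Move each child in front of the link, then drop the empty link.
        while ($link->firstChild !== null) {
            $link->parentNode->insertBefore($link->firstChild, $link);
        }
        $link->parentNode->removeChild($link);
    }

    // Serialize only the wrapper's children, not the whole document.
    $root = $xpath->query('//div[@id="unwrap-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$sample = '<p><a data-link="keepLink" href="#keep">Daily</a></p>'
    . '<strong>with <a href="https://tracker.example/x"><em>Fanduel</em></a>.</strong>';

echo unwrapLinks($sample), PHP_EOL;
```

Serializing only the wrapper's children also sidesteps the doctype and <html>/<body> wrappers that saveHTML() would otherwise add. Entities and non-ASCII characters may come back in a different but equivalent form, so it is worth checking the output against real data.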

Octavian

If you are set on regex and don't want to use a parser, try this:

<a (?!data-link=)[^>]*>((?!<\/a>).*?)<\/a>

Replace each match with $1 to keep the link text.

See https://regex101.com/r/wKQk4p/2

Please say if you need any further explanation.
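
For reference, here is a rough sketch of how a pattern along these lines might be applied with preg_replace. It is a variation, not the exact pattern above: the lookahead in the pattern above only inspects the text immediately after "<a ", so a link written as <a href="..." data-link="keepLink"> would still be stripped. The variant below scans the whole opening tag instead. The function name stripUnmarkedLinks and the sample URL are hypothetical, and the sketch assumes double-quoted attributes with no ">" inside attribute values.

```php
<?php
// Hypothetical helper: removes <a>...</a> wrappers unless the opening tag
// carries data-link="keepLink" anywhere among its attributes.
function stripUnmarkedLinks(string $html): string
{
    // <a\b            an opening anchor tag (not <abbr>, <article>, ...)
    // (?![^>]*...)    fail if data-link="keepLink" appears before the next ">"
    // [^>]*>          the rest of the opening tag
    // (.*?)</a>       the link content, captured lazily up to the closing tag
    $pattern = '~<a\b(?![^>]*\bdata-link="keepLink")[^>]*>(.*?)</a>~is';

    return preg_replace($pattern, '$1', $html);
}

$sample = '<p><a href="#keep" data-link="keepLink">Daily</a></p>'
    . ' with <a href="https://tracker.example/x">Fanduel</a>.';

echo stripUnmarkedLinks($sample), PHP_EOL;
// prints: <p><a href="#keep" data-link="keepLink">Daily</a></p> with Fanduel.
```

The s modifier lets the link content span line breaks and i makes the tag match case-insensitive. Like any regex over HTML, this remains fragile for unusual markup such as single-quoted attributes or nested links.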

Fallenhero