
I have the following regex:

$regex = '<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>';

How can I improve this so that it does NOT match links whose href attribute contains the word "files" or "resize", for example:

<a href="./files/test.jpg">link</a> or
<a href="script.php?resize=xxxx"></a>
heppi75
  • Do not use regex to parse HTML. Use an XML parser instead. – hsz Oct 16 '15 at 07:33
  • In addition to @hsz's comment, this might be helpful for [more information on XML parsing with PHP](http://php.net/manual/en/book.xml.php). Not using regex should be the correct answer. – Jan Oct 16 '15 at 07:46

2 Answers


Yes, parsing is the much better way to do this. Maybe someone will find this helpful:

$inhalt = new DOMDocument;
// $content->draw()[0][0] holds the HTML string to process.
$inhalt->loadHTML($content->draw()[0][0]);
foreach ($inhalt->getElementsByTagName('a') as $node) {
 if ($node->hasAttribute('href')) {
  $href = $node->getAttribute('href');
  // Only rewrite links whose href contains neither "files" nor "resize".
  if (preg_match("/(files|resize)/", $href) === 0) {
   $node->setAttribute('href', 'mobile.php?uri=http://www.example.com' . str_replace("..", "", $href));
  }
 }
}

// Serialize the modified document once, after all links are processed.
echo $inhalt->saveHTML();
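
For anyone who wants to try the DOM approach without my `$content` object, here is a minimal, self-contained sketch. The function name `rewriteLinks`, the `$prefix` parameter and the sample markup are made up for illustration; only the `DOMDocument` calls and the "files|resize" check come from the snippet above.

```php
<?php
// Hypothetical helper: prefixes every href that contains neither
// "files" nor "resize" and leaves all other links untouched.
function rewriteLinks(string $html, string $prefix): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $node) {
        if (!$node->hasAttribute('href')) {
            continue;
        }
        $href = $node->getAttribute('href');
        if (preg_match('/files|resize/', $href) === 0) {
            $node->setAttribute('href', $prefix . str_replace('..', '', $href));
        }
    }

    return $doc->saveHTML();
}

// Sample input (assumed): two links that must stay as they are, one to rewrite.
$sample = '<p><a href="./files/test.jpg">file</a> '
        . '<a href="script.php?resize=xxxx">resize</a> '
        . '<a href="../page.html">page</a></p>';

echo rewriteLinks($sample, 'mobile.php?uri=http://www.example.com');
```

Note that `loadHTML()` wraps a fragment in `<html>` and `<body>` elements, so the output is a full document rather than just the input fragment.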
heppi75

You can use this regex to get all href strings:

<a[^>]*href=[\"\'](.*?)[\"\'][^>]*>(.*?)</a>
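
The pattern by itself does not exclude anything, so one possible way to use it is to collect all matches first and then drop the hrefs that contain "files" or "resize". This is only a sketch: the `~` delimiters, the `is` flags, the sample HTML and the variable names are assumptions added for the example.

```php
<?php
// Sample input (assumed): the two links from the question plus one ordinary link.
$html = '<a href="./files/test.jpg">link</a> '
      . '<a href="script.php?resize=xxxx"></a> '
      . '<a href="about.html">About</a>';

// The pattern from above, wrapped in "~" delimiters so "/" needs no escaping.
$pattern = '~<a[^>]*href=["\'](.*?)["\'][^>]*>(.*?)</a>~is';

// PREG_SET_ORDER gives one array per link: [full match, href, link text].
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

// Keep only the matches whose href (capture group 1) contains neither word.
$kept = array_filter($matches, function (array $m): bool {
    return preg_match('/files|resize/', $m[1]) === 0;
});

foreach ($kept as $m) {
    echo $m[1], PHP_EOL; // prints only "about.html" for the sample input
}
```

Filtering in a second step keeps the main pattern readable; a negative lookahead inside the pattern could do the same in one pass but is harder to get right.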
Mayur Koshti