Regular Expression not match href strings

Question

I have the following regex:

$regex = '<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>';

How can I improve this so that it does NOT match links whose href attribute contains the word "files" or "resize", for example:

<a href="./files/test.jpg">link</a> or
<a href="script.php?resize=xxxx"></a>

In addition to @hsz's comment, this might be helpful for [more information on XML parsing with PHP](http://php.net/manual/en/book.xml.php). Not using regex should be the correct answer. — Jan, Oct 16 '15 at 07:46

score 0 · Answer 1 · answered Oct 16 '15 at 08:30

Yes, parsing is the much better way to do this. Maybe someone will find this helpful:

$inhalt = new DOMDocument;
// $content->draw()[0][0] holds the HTML string to rewrite
$inhalt->loadHTML($content->draw()[0][0]);
foreach ($inhalt->getElementsByTagName('a') as $node) {
 if ($node->hasAttribute('href')) {
  $href = $node->getAttribute('href');
  // Rewrite only links whose href contains neither "files" nor "resize"
  if (preg_match('/files|resize/', $href) === 0) {
   $node->setAttribute('href', 'mobile.php?uri=http://www.example.com' . str_replace('..', '', $href));
  }
 }
}

// Output the whole modified document once, after the loop
echo $inhalt->saveHTML();
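
For anyone who wants to try the DOM approach without the `$content` object used above, here is a minimal self-contained sketch. The sample HTML and the function name `hrefsWithoutFilesOrResize` are made up for illustration, and it only collects the matching hrefs instead of rewriting them. It assumes the DOM extension is enabled.

```php
<?php
// Hypothetical sample input; the answer above loads its HTML from $content->draw() instead.
$html = '<p><a href="./files/test.jpg">link</a> '
      . '<a href="script.php?resize=xxxx">resize</a> '
      . '<a href="../page.html">page</a></p>';

/**
 * Returns the href values of all <a> elements whose href contains
 * neither "files" nor "resize". The function name is illustrative only.
 */
function hrefsWithoutFilesOrResize(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally so imperfect HTML does not emit PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $node) {
        if (!$node->hasAttribute('href')) {
            continue;
        }
        $href = $node->getAttribute('href');
        // preg_match returns 0 when the pattern is not found in the href.
        if (preg_match('/files|resize/', $href) === 0) {
            $hrefs[] = $href;
        }
    }
    return $hrefs;
}

print_r(hrefsWithoutFilesOrResize($html));
```

With the sample input, only `../page.html` should be left over, because the other two hrefs contain "files" and "resize" respectively.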

score -1 · Answer 2 · answered Oct 16 '15 at 08:22


You can use this regex to get all href strings:

<a[^>]*href=[\"\'](.*?)[\"\'][^>]*>(.*?)</a>
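
The pattern above matches every link, so it does not yet exclude anything. If you do want to stay with a regex, one possible way is a negative lookahead right after the opening quote. The following is only a sketch: the pattern is my own variation and not part of the answer, the function name `matchHrefsExcluding` and the sample HTML are made up, and it assumes quoted href values that do not themselves contain quote characters.

```php
<?php
// Hypothetical sample input taken from the question, plus one link that should still match.
$sample = '<a href="./files/test.jpg">link</a> '
        . '<a href="script.php?resize=xxxx"></a> '
        . "<a class='x' href='../page.html'>page</a>";

/**
 * Returns href values of <a> tags whose href contains neither
 * "files" nor "resize". Illustrative helper, not a library function.
 */
function matchHrefsExcluding(string $html): array
{
    // (["\'])                        opening quote, captured as group 1
    // (?![^"\']*(?:files|resize))    fail if "files" or "resize" occurs before the closing quote
    // ([^"\']*)\1                    the href value (group 2), then the same quote again
    $pattern = '~<a\s[^>]*href=(["\'])(?![^"\']*(?:files|resize))([^"\']*)\1[^>]*>(.*?)</a>~is';

    $hrefs = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $hrefs[] = $m[2];
        }
    }
    return $hrefs;
}

print_r(matchHrefsExcluding($sample));
```

As the comments on the question point out, a regex like this remains fragile for real-world HTML (unquoted attributes, nested tags, and so on), so the DOM-based approach is usually the safer choice.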

answered Oct 16 '15 at 08:22 by Mayur Koshti
