I have a string that contains HTML link tags, and I need to use PHP's preg_match_all to get the href value of each tag, but only if the tag does not have a rel='nofollow' attribute. I found the following expression, which gets the href value of all the links:
$regex= "/<a\s[^>]*href=([\"\']??)([^\" >]*?)\\1[^>]*>(.*)<\/a>/siU";
How can I modify it to get only the links I want? Here is an example of the input and how the expression is used:
$string= "<a href='link1.php'>Link</a>";
$string.= "<a href='link2.php'>Link2</a>";
$string.= "<a href='link3.php' rel='nofollow'>Link3</a>";
$string.= "<a href='link4.php'>Link4</a>";
preg_match_all($regex, $string, $links);
So the resulting links should be:
$links[0] => 'link1.php';
$links[1] => 'link2.php';
$links[2] => 'link4.php';
I need the expression to pick up links that use either single or double quotes. A bonus would be to pick up ill-formatted but still valid links. If it's not possible to match just the links I want, then a way to find the links I don't want and remove them from the array would also work. Note that the string is generated dynamically, so the attributes may not always be in the same order, and it will contain other tags and characters besides the links.
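To make the fallback concrete, here is a minimal sketch of the two-pass idea: match every opening <a> tag first, then drop the ones whose rel attribute contains nofollow. The function name followed_links and both patterns are made up for illustration; they are not the expression quoted above, and they assume that no attribute value contains a literal > character.

```php
<?php
// Hypothetical helper (not part of the original expression): returns the href
// values of <a> tags whose rel attribute does not contain "nofollow".
function followed_links(string $html): array
{
    // Opening <a ...> tag with an href in any position. The branch-reset group
    // (?|...) puts the value in group 1 whether it is double-quoted,
    // single-quoted, or unquoted.
    $tagRegex = '/<a\b[^>]*\shref\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))[^>]*>/si';

    // rel attribute containing the token "nofollow", e.g. rel='nofollow',
    // rel="external nofollow", or unquoted rel=nofollow.
    $nofollowRegex = '/\srel\s*=\s*(?:"[^"]*\bnofollow\b[^"]*"|\'[^\']*\bnofollow\b[^\']*\'|nofollow\b)/i';

    preg_match_all($tagRegex, $html, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $m) {
        // $m[0] is the whole opening tag, $m[1] is the href value.
        if (!preg_match($nofollowRegex, $m[0])) {
            $links[] = $m[1];
        }
    }
    return $links;
}

$string  = "<a href='link1.php'>Link</a>";
$string .= "<a href='link2.php'>Link2</a>";
$string .= "<a href='link3.php' rel='nofollow'>Link3</a>";
$string .= "<a href='link4.php'>Link4</a>";

$links = followed_links($string);
print_r($links);
```

With the sample string this should leave link1.php, link2.php and link4.php in $links. Filtering on the captured opening tag means the order of href and rel inside the tag does not matter, which is why this sketch uses two passes instead of one larger expression.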