I want to match all href values in my page content. I wrote a regex for that and tested it on regex101:
href[ ]*=[ ]*("|')(.+?)\1
This finds all my href values properly. If I use
href[ ]*=[ ]*(?:"|')(.+?)(?:"|')
it's even better, since I don't have to pick out a specific group later.
Because the regex contains both " and ', I cannot get it to run properly in PHP:
$matches = array();
$pattern = "/href[ ]*=[ ]*("|')(.+?)\1/"; // syntax error
$numOfMatches = preg_match_all($pattern, $pattern, $matches);
print_r($matches);
If I escape the double quote to fix the syntax error, I get no matches.
So, what is the correct way to apply the given regex in PHP?
Thanks for any help
Notes:
- addslashes or preg_quote won't help, since I need a valid string literal to pass to them in the first place
- escaping all the special chars
\ + * ? [ ^ ] $ ( ) { } = ! < > | : -
didn't help either
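For reference, here is a minimal sketch of one way around the quoting problem described above. It makes three changes, which may or may not be the whole story for the real page. The pattern is written as a single-quoted PHP string, where only ' and \ need escaping. That also keeps \1 as a backreference, because inside a double-quoted string PHP reads "\1" as an octal character escape. And the subject passed to preg_match_all is the page content rather than the pattern itself. The $html sample is hypothetical, since the actual page content is not shown.

```php
<?php
// Hypothetical sample input; the real page content is not shown in the post.
$html = '<a href="/home">Home</a> <a href = \'/about\'>About</a>';

// Single-quoted string: " needs no escaping, ' is escaped as \',
// and \1 reaches PCRE unchanged as a backreference to group 1.
$pattern = '/href[ ]*=[ ]*("|\')(.+?)\1/';

$matches = array();
// The second argument is the text to search, not the pattern.
$numOfMatches = preg_match_all($pattern, $html, $matches);

// Group 2 holds the href values.
print_r($matches[2]);
```

With the non-capturing variant of the pattern, the values would be in $matches[1] instead.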
EDIT: OK, I see I really shouldn't be doing this with regex. Could you please suggest a DOM parser or any other tool I should use for this in PHP?
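As a rough sketch of the DOM approach, assuming PHP's bundled DOM extension is available, something like the following collects every href attribute with DOMDocument and DOMXPath. The $html sample is hypothetical, and the XPath query //@href is just one possible choice that picks up href on any element, not only on a tags.

```php
<?php
// Hypothetical sample input standing in for the real page content.
$html = '<p><a href="/home">Home</a> <a href=\'/about\'>About</a> <a name="x">no href</a></p>';

// Collect libxml warnings internally instead of printing them,
// since real-world HTML is often not perfectly well-formed.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select every href attribute node in the document.
$xpath = new DOMXPath($doc);
$hrefs = array();
foreach ($xpath->query('//@href') as $attr) {
    $hrefs[] = $attr->value;
}

print_r($hrefs);
```

The parser handles both quote styles and any whitespace around =, so none of the escaping issues from the regex version come up.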