When you write `[^mydomain.*\"\']` you are saying "match any single character except a literal 'm', 'y', 'd', 'o', ..., '.', '*', etc." A negated character class excludes individual characters, not the word "mydomain".
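To see the difference in practice, here is a minimal sketch. The helper name `passesNegatedClass` and the sample words are made up for illustration; only `preg_match` is a real PHP function.

```php
<?php
// Returns true when $text consists only of characters that the
// negated class [^mydomain] accepts, i.e. none of m, y, d, o, a, i, n.
function passesNegatedClass(string $text): bool
{
    return preg_match('/^[^mydomain]+$/', $text) === 1;
}

// "example" is rejected because it contains 'a' and 'm',
// even though the word "mydomain" never appears in it.
var_dump(passesNegatedClass('example')); // bool(false)

// "test" passes only because none of its letters are in the class.
var_dump(passesNegatedClass('test'));    // bool(true)
```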
Try something like:
#<a [^>]*\bhref=(['"])http.?://((?!mydomain)[^'"])+\1 *>.*?</a>#i
Notes:
- I turned your `a.*href` into `a [^>]*\bhref` to make sure that the 'a' and 'href' are whole words and that the regex doesn't match across multiple tags.
- I changed the regex delimiter character to '#' instead of '/' so you don't have to escape the `/` any more.
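- Note the `((?!mydomain)[^'"])+`. This means "match `[^'"]+` that doesn't contain mydomain": before each character is consumed, the `(?!mydomain)` checks that the text at that position does not start with "mydomain". The `(?!...)` construct is called a negative look-ahead.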
- Note the `\1`. This makes sure that the closing quote mark for the URL is the same as the opening quote mark (see how the first set of brackets captures the `['"]`?). You'd probably be fine without it if you preferred.
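The last two notes can be tried in isolation. The sketch below pulls the look-ahead group and the backreference out into two small anchored patterns. The helper names `avoidsMyDomain` and `quotesMatch` and the sample strings are hypothetical, chosen only to illustrate the behaviour.

```php
<?php
// True when every position in $text passes the (?!mydomain) check,
// i.e. the word "mydomain" does not start anywhere inside $text.
function avoidsMyDomain(string $text): bool
{
    return preg_match('#^((?!mydomain)[^\'"])+$#', $text) === 1;
}

// True when $text opens and closes with the SAME quote character.
// Group 1 captures the opening quote; \1 must repeat it at the end.
function quotesMatch(string $text): bool
{
    return preg_match('#^([\'"])[^\'"]*\1$#', $text) === 1;
}

var_dump(avoidsMyDomain('example.com/page')); // bool(true)
var_dump(avoidsMyDomain('www.mydomain.com')); // bool(false)

var_dump(quotesMatch('"abc"'));   // bool(true)
var_dump(quotesMatch('"abc\''));  // bool(false): opens with " but closes with '
```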
For PHP (updated because I always mix up when backslashes need to be escaped in PHP -- see @GlitchMr's comment below):
$pattern = '#<a [^>]*\bhref=([\'"])http.?://((?!mydomain)[^\'"])+\1 *>.*?</a>#i';
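As a rough usage sketch, the pattern can be passed to `preg_match_all` to collect the links that point somewhere other than mydomain. The function name `findExternalLinks` and the sample markup are assumptions made for this example, not part of the original question.

```php
<?php
// Returns every <a>...</a> element whose href URL does not contain "mydomain".
function findExternalLinks(string $html): array
{
    $pattern = '#<a [^>]*\bhref=([\'"])http.?://((?!mydomain)[^\'"])+\1 *>.*?</a>#i';
    preg_match_all($pattern, $html, $matches);
    return $matches[0]; // index 0 holds the full matches
}

// Hypothetical sample markup, for illustration only.
$html = '<a href="http://example.com/">external</a> '
      . '<a href="http://mydomain.com/">internal</a> '
      . '<a class="x" href=\'https://other.org/page\'>other</a>';

// Lists the example.com and other.org links; the mydomain.com link is skipped.
print_r(findExternalLinks($html));
```

Swapping `preg_match_all` for `preg_replace` with the same pattern would let you rewrite or strip those links instead of listing them.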
See it in action here, where you can tweak it to your purposes.