For what it's worth, this is the regex that you're looking for:
Raw Match Pattern:
<a ((?:(?!href).)*?)href=[\"\']https:\/\/((?:(?!my-domain.de).)*?)[\"\'](.*?)>(.*?)<\/a>
Raw Replace Pattern:
<a $1href="http://$2"$3>$4</a>
The PHP code is:
$content = preg_replace('/<a ((?:(?!href).)*?)href=[\"\']https:\/\/((?:(?!my-domain.de).)*?)[\"\'](.*?)>(.*?)<\/a>/i','<a $1href="http://$2"$3>$4</a>',$content);
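One detail worth noting: the `.` in `my-domain.de` is not escaped in the pattern above, so it matches any character (for instance `my-domainXde`), not only a literal dot. A minimal sketch of one way around that, assuming you are happy to build the pattern from a variable, is to pass the domain through `preg_quote()`. The function name `downgradeExternalLinks` and the `$ownDomain` parameter are hypothetical, introduced here purely for illustration.

```php
<?php
// Hypothetical helper: rewrites https:// to http:// in links that do NOT
// contain $ownDomain. Same pattern as above, but with the domain escaped.
function downgradeExternalLinks(string $content, string $ownDomain): string
{
    // preg_quote() escapes regex metacharacters (such as ".") and, via the
    // second argument, the "/" delimiter as well.
    $domain = preg_quote($ownDomain, '/');

    $pattern = '/<a ((?:(?!href).)*?)href=["\']https:\/\/((?:(?!' . $domain . ').)*?)["\'](.*?)>(.*?)<\/a>/i';

    $result = preg_replace($pattern, '<a $1href="http://$2"$3>$4</a>', $content);

    // preg_replace() returns null on error; fall back to the untouched input.
    return $result ?? $content;
}

echo downgradeExternalLinks('<a href="https://other-domain.de">other domain</a>', 'my-domain.de'), "\n";
echo downgradeExternalLinks('<a href="https://my-domain.de">my domain</a>', 'my-domain.de'), "\n";
```

The first call should come back with `http://`, the second should be left alone. All the reliability caveats below still apply to this variant.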
That being said, be forewarned -- to Andy Lester's point, this regex is not reliable. In my opinion, though, the issue is not quite "the nature of HTML", or at least not simply that. The point being made in this admittedly great resource -- http://htmlparsing.com/regexes -- is that you're attempting to re-invent the wheel on a very bumpy road. The broader concern is "not that regular expressions are evil, per se, but that overuse of regular expressions is evil." That quote is by Jeff Atwood, from his exceptional elaboration on the joy and terror of regular expressions, "Regular Expressions: Now You Have Two Problems". (He also has an article specifically warning against using regular expressions to parse HTML, "Parsing Html The Cthulhu Way".)
Specifically, in the case of my "solution" above, the following input (with line breaks inside the tag) will not be matched, despite being valid HTML, because `.` does not match a newline unless the `s` modifier is set:
<a title="mytitle"
href="https://www.other-domain.de/path/index.html"
target="_blank">other domain</a>
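Because `.` stops at newlines by default, one possible patch for this particular input is to add the `s` (PCRE_DOTALL) modifier next to `i`. The snippet below is only a sketch under that assumption: it widens what the lazy groups are allowed to span, so it patches this case without making the approach any more robust in general.

```php
<?php
// The multi-line input from above, with real line breaks inside the tag.
$content = "<a title=\"mytitle\"\nhref=\"https://www.other-domain.de/path/index.html\"\ntarget=\"_blank\">other domain</a>";

// Identical to the pattern above, except for the added "s" modifier,
// which lets "." match newlines as well.
$pattern = '/<a ((?:(?!href).)*?)href=["\']https:\/\/((?:(?!my-domain.de).)*?)["\'](.*?)>(.*?)<\/a>/is';

$fixed = preg_replace($pattern, '<a $1href="http://$2"$3>$4</a>', $content);

echo $fixed, "\n";
```

With the modifier in place, the `href` on the second line should be rewritten to `http://` while the line breaks are preserved.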
The following inputs, however, are handled as desired:
<a href="https://my-domain.de">my domain</a>
<a href="https://other-domain.de">other domain</a>
<a href="https://www.my-domain.de/path/index.html">my domain</a>
<a href="https://www.other-domain.de/path/index.html">other domain</a>
<a title="other title" href="https://www.my-domain.de/path/index.html" target="_blank">other domain</a>
<a title="my title" href="https://www.other-domain.de/path/index.html" target="_blank">my domain</a>
They become:
<a href="https://my-domain.de">my domain</a>
<a href="http://other-domain.de">other domain</a>
<a href="https://www.my-domain.de/path/index.html">my domain</a>
<a href="http://www.other-domain.de/path/index.html">other domain</a>
<a title="other title" href="https://www.my-domain.de/path/index.html" target="_blank">other domain</a>
<a title="my title" href="http://www.other-domain.de/path/index.html" target="_blank">my domain</a>
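Given the warnings above about parsing HTML with regular expressions, here is a rough sketch of the same rewrite done with PHP's bundled DOM extension. It is an illustration, not a drop-in replacement: the function name `downgradeExternalLinksDom` and its parameters are hypothetical, it assumes the DOM extension is enabled and that the input is an HTML fragment, and it compares the link's host against your domain, whereas the regex skips any URL that merely contains `my-domain.de` anywhere.

```php
<?php
// Hypothetical helper, not part of the original answer: walks <a> elements
// with DOMDocument instead of pattern-matching the markup.
function downgradeExternalLinksDom(string $html, string $ownDomain): string
{
    if ($html === '') {
        return '';
    }

    $doc = new DOMDocument();

    // Collect libxml warnings about imperfect markup instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $own = strtolower($ownDomain);

    foreach ($doc->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        if (stripos($href, 'https://') !== 0) {
            continue; // no href, or not an https link
        }

        $host = strtolower((string) parse_url($href, PHP_URL_HOST));
        if ($host === '') {
            continue; // unparsable URL: leave it alone
        }

        // "Own" means the domain itself or any subdomain of it.
        $suffix = '.' . $own;
        $isOwn = $host === $own || substr($host, -strlen($suffix)) === $suffix;

        if (!$isOwn) {
            $link->setAttribute('href', 'http://' . substr($href, strlen('https://')));
        }
    }

    // loadHTML() wraps fragments in <html><body>; return only the body's children.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo downgradeExternalLinksDom(
    "<a title=\"mytitle\"\nhref=\"https://www.other-domain.de/path/index.html\"\ntarget=\"_blank\">other domain</a>",
    'my-domain.de'
), "\n";
```

Because the parser deals with attribute order, quoting and line breaks, the multi-line link that the regex misses is handled here. The serializer may normalize whitespace and quoting, so the output is not guaranteed to be byte-for-byte identical to the input apart from the scheme.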
A great resource for explaining the full breakdown of the regex is here: http://www.myregextester.com/index.php
To replicate the test on that tool:
- select the "replace" operation
- put your regex into "match pattern"
- put the replacement into "replace pattern"
- select the "i" flag checkbox
- select the "explain" checkbox
- select the "PHP" checkbox
- put your target content into "source text"
- click "Submit"
For convenience and posterity, I've included the full explanation provided by that tool below, but two of the conceptual highlights are the negative lookahead, `(?!...)`, and the lazy (non-greedy) quantifier, `*?`.
Match Pattern Explanation:
The regular expression:
`(?i-msx:<a ((?:(?!href).)*?)href=[\"\']https:\/\/((?:(?!my-domain.de).)*?)[\"\'](.*?)>(.*?)<\/a>)`
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?i-msx:                 group, but do not capture (case-insensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  <a                     '<a '
----------------------------------------------------------------------
  (                      group and capture to \1:
----------------------------------------------------------------------
    (?:                  group, but do not capture (0 or more
                         times (matching the least amount
                         possible)):
----------------------------------------------------------------------
      (?!                look ahead to see if there is not:
----------------------------------------------------------------------
        href             'href'
----------------------------------------------------------------------
      )                  end of look-ahead
----------------------------------------------------------------------
      .                  any character except \n
----------------------------------------------------------------------
    )*?                  end of grouping
----------------------------------------------------------------------
  )                      end of \1
----------------------------------------------------------------------
  href=                  'href='
----------------------------------------------------------------------
  [\"\']                 any character of: '\"', '\''
----------------------------------------------------------------------
  https:                 'https:'
----------------------------------------------------------------------
  \/                     '/'
----------------------------------------------------------------------
  \/                     '/'
----------------------------------------------------------------------
  (                      group and capture to \2:
----------------------------------------------------------------------
    (?:                  group, but do not capture (0 or more
                         times (matching the least amount
                         possible)):
----------------------------------------------------------------------
      (?!                look ahead to see if there is not:
----------------------------------------------------------------------
        my-domain        'my-domain'
----------------------------------------------------------------------
        .                any character except \n
----------------------------------------------------------------------
        de               'de'
----------------------------------------------------------------------
      )                  end of look-ahead
----------------------------------------------------------------------
      .                  any character except \n
----------------------------------------------------------------------
    )*?                  end of grouping
----------------------------------------------------------------------
  )                      end of \2
----------------------------------------------------------------------
  [\"\']                 any character of: '\"', '\''
----------------------------------------------------------------------
  (                      group and capture to \3:
----------------------------------------------------------------------
    .*?                  any character except \n (0 or more times
                         (matching the least amount possible))
----------------------------------------------------------------------
  )                      end of \3
----------------------------------------------------------------------
  >                      '>'
----------------------------------------------------------------------
  (                      group and capture to \4:
----------------------------------------------------------------------
    .*?                  any character except \n (0 or more times
                         (matching the least amount possible))
----------------------------------------------------------------------
  )                      end of \4
----------------------------------------------------------------------
  <                      '<'
----------------------------------------------------------------------
  \/                     '/'
----------------------------------------------------------------------
  a>                     'a>'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------