This is trickier than you think, and I urge you to avoid using regex for it.
Instead, use an HTML parser to find all <a> tags in the document, then split their href attributes on ;url= and keep only the last part.
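The parser approach can be sketched roughly as follows. This is a minimal sketch, assuming PHP's bundled DOM extension is available; the function name keepLastUrl and the sample URLs are made up for illustration and are not from the question.

```php
<?php
// Hypothetical helper: rewrite every <a href> so that only the part
// after the last ";url=" remains. Links without ";url=" are left alone.
function keepLastUrl(string $html): string
{
    $doc = new DOMDocument();
    // Real-world markup is often imperfect; collect parser warnings
    // internally instead of emitting them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // Case-insensitive search for the last occurrence of ";url=".
        $pos = strripos($href, ';url=');
        if ($pos !== false) {
            $a->setAttribute('href', substr($href, $pos + strlen(';url=')));
        }
    }

    // saveHTML() serializes the whole document, so a bare fragment
    // comes back wrapped in <html> and <body>.
    return $doc->saveHTML();
}

$out = keepLastUrl('<p><a href="http://t.example/r?x=1;url=http://example.com/final">link</a></p>');
```

Because the parser handles quoting and attribute order itself, this does not care whether href comes first or last in the tag, or which quote style is used.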
However, if you must use a regex, the following should work for most well-formed HTML:
preg_replace('/(<\s*a\s[^>]*\bhref=)(["\'])(?:(?!\2).)*;url=((?:(?!\2).)*)(\2[^>]*>)/i', '$1$2$3$4', $url)
Explanation:
(<\s*a\s[^>]*\bhref=) # <a, optionally followed by other attributes, and then href=. Whitespace after < is allowed. This is captured in $1.
(["\']) # Either " or ' to enclose the href value. This is captured in $2 so the closing quote can be matched later with \2.
(?:(?!\2).)*;url= # Everything inside the quotes up to and including the last ";url=". This is thrown out. A backreference has no effect inside a character class, so [^\2] would not work; the lookahead (?!\2) excludes the quote character instead.
((?:(?!\2).)*) # This is the URL you want to keep. It keeps matching until the closing quote. This is captured in $3.
(\2[^>]*>) # The closing quote and the remainder of the <a> tag, including any other attributes. This is captured in $4.
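A quick way to check a pattern like this is to run it over a couple of sample tags. The sketch below assumes a pattern that excludes the quote character with a negative lookahead on the second capture group; the function name stripRedirectPrefix and the sample URLs are hypothetical.

```php
<?php
// Hypothetical wrapper around preg_replace for testing the pattern.
function stripRedirectPrefix(string $html): string
{
    // \2 refers to the quote captured by (["\']); (?!\2). matches any
    // character that is not that quote.
    $pattern = '/(<\s*a\s[^>]*\bhref=)(["\'])(?:(?!\2).)*;url=((?:(?!\2).)*)(\2[^>]*>)/i';
    // preg_replace returns null on error; fall back to the input then.
    return preg_replace($pattern, '$1$2$3$4', $html) ?? $html;
}

$double = stripRedirectPrefix('<a class="x" href="http://t.example/r?a=1;url=http://example.com/final" title="t">link</a>');
$single = stripRedirectPrefix("<a href='http://t.example/r;url=http://example.com/other'>link</a>");
$plain  = stripRedirectPrefix('<a href="http://example.com/plain">link</a>');
```

Tags whose href contains no ";url=" do not match and pass through unchanged, which is the behaviour you want for ordinary links.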