This is the first one, broken up section by section. Even doing this was non-trivial...
(
^
|
[\n\t (>.]
)
OK, here we simply have "beginning of the line, or after a newline, tab, space, open parenthesis, greater-than sign, or period". This is just anchoring the regex.
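To see what that anchor does in practice, here is a minimal sketch in Python, whose re module behaves like PHP's PCRE for this fragment. The literal http:// after the group and the sample strings are placeholders of mine, not part of the original regex.

```python
import re

# The anchoring group, collapsed onto one line, followed by a placeholder
# "http://" standing in for the rest of the URL pattern.
anchored = re.compile(r"(^|[\n\t (>.])http://")

def starts_link(text):
    """True if an http:// in text is preceded by one of the allowed anchors."""
    return anchored.search(text) is not None

print(starts_link("http://a.example"))       # at the start of the string
print(starts_link("see (http://a.example"))  # right after an open parenthesis
print(starts_link("wordhttp://a.example"))   # glued to the end of a word
```

The first two calls report a match and the third does not, which is the point of the group: a URL glued onto the end of a word is not picked up.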
(
[a-z]$scheme*:/{2}
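This is pure insanity right here. $scheme presumably holds a class of characters that are legal in a URL scheme (if it held the literal text http, this would expand to [a-z]http*:/{2}, which cannot match http://), so this part matches a scheme such as http followed by ://. Why someone would use /{2} instead of //, I cannot begin to guess.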
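A quick way to convince yourself that /{2} and // are interchangeable is to compile both and compare them. This is a sketch in Python rather than PHP, and the value of SCHEME is my assumption about what $scheme might contain, not something the original code shows.

```python
import re

# Assumption: a stand-in for PHP's $scheme, written as a character class.
SCHEME = r"[a-z\d+\-.]"

with_quantifier = re.compile(r"[a-z]" + SCHEME + r"*:/{2}")
with_literal = re.compile(r"[a-z]" + SCHEME + r"*://")

samples = ["http://", "https://", "ftp://", "http:/", "://"]
results = [
    (s, bool(with_quantifier.fullmatch(s)), bool(with_literal.fullmatch(s)))
    for s in samples
]
for sample, a, b in results:
    print(sample, a, b)

# For contrast: the pattern you would get if $scheme were the literal text "http".
literal_guess = re.compile(r"[a-z]http*:/{2}")
print(literal_guess.search("http://"))
```

Both spellings agree on every sample. The last line prints None, which is why a literal http in $scheme seems unlikely.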
(?:
(?:
[a-z0-9\-._~!$&'($inline*+,;=:@|]+
|
%[\dA-F]{2}
)+
|
This matches a series of characters, presumably those that are legal in a URL. Of note is the $inline PHP variable – can't guess what that holds – and the second alternative, %[\dA-F]{2}. That matches things like %20 for a space, etc. The % sign is not otherwise legal in the match (or in a URL).
Also important here is that / is not legal. This, therefore, cannot refer to directories, only to the domain. This is most likely the part you want to change, to simply match the appropriate domain of your website.
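Here is a small sketch of that alternative on its own, in Python. INLINE is a guess standing in for $inline (I have used a closing parenthesis), and host_part is a helper name I made up for the demonstration.

```python
import re

INLINE = ")"  # assumption: placeholder for whatever PHP's $inline holds

host = re.compile(
    r"(?:[a-z0-9\-._~!$&'(" + INLINE + r"*+,;=:@|]+|%[\dA-F]{2})+"
)

def host_part(text):
    """Return the prefix of text that this alternative consumes, or None."""
    m = host.match(text)
    return m.group(0) if m else None

print(host_part("www.example.com/forum/"))  # stops at the first slash
print(host_part("a%20b/"))                  # percent-encoding is accepted
print(host_part("%zz"))                     # a bare % is not
```

The match stops at the first /, so everything this alternative consumes belongs to the part of the URL before any directory.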
For completeness's sake, though, here's the rest.
[0-9.]+
|
Alternatively, we could have a series of digits and periods – an IP address. Considering how complicated this regex is, I'm surprised he didn't go for (?:\d{1,3}\.){3}\d{1,3}...
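The difference between the two is easy to show. A rough Python sketch, with sample strings of my own choosing:

```python
import re

loose = re.compile(r"[0-9.]+")                    # what the regex uses
strict = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")   # the dotted-quad alternative

samples = ["192.168.0.1", "....", "1.2.3", "999.999.999.999"]
for s in samples:
    print(s, bool(loose.fullmatch(s)), bool(strict.fullmatch(s)))
```

The loose version accepts anything made of digits and dots, including a string of nothing but dots. The stricter one insists on four groups, although it still lets out-of-range values such as 999 through.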
\[
[a-z0-9.]+
:
[a-z0-9.]+
:
[a-z0-9.:]+
\]
)
Here's our last alternative; I think this is for IPv6. It's a series of letters, digits, and periods separated by colons, anyway, which is looser than strict hexadecimal. It requires that these be within square brackets, which is how IPv6 literals are written in URLs, though it still looks odd to me for a forum software that uses those so heavily for tags...
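To check what this alternative does and does not accept, here is a short Python sketch with the pattern collapsed onto one line. The sample addresses are mine.

```python
import re

ipv6ish = re.compile(r"\[[a-z0-9.]+:[a-z0-9.]+:[a-z0-9.:]+\]")

samples = ["[2001:db8::1]", "2001:db8::1", "[::1]", "[x:y:z]"]
for s in samples:
    print(s, bool(ipv6ish.fullmatch(s)))
```

Only the first and last samples match. The unbracketed address fails, [::1] fails because the pattern wants at least one character before each of the first two colons, and [x:y:z] passes because nothing restricts the letters to a through f.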
(?:
:
\d*
)?
Here, we get the option of some digits following a colon. That is, this is for URLs that have a port in them.
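A small Python sketch of the port group, attached to a placeholder host of my own so there is something for it to follow:

```python
import re

# Placeholder host followed by the optional port group from above.
with_port = re.compile(r"example\.com(?::\d*)?")

for s in ["example.com", "example.com:8080", "example.com:", "example.com:80a"]:
    print(s, bool(with_port.fullmatch(s)))
```

Because the group uses \d* rather than \d+, a colon with no digits after it is accepted too. Only the last sample is rejected.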
(?:
/
(?:
[a-z0-9\-._~!$&'($inline*+,;=:@|]+
|
%[\dA-F]{2}
)*
)*
OK, here we've gotten to the subdirectories, as shown by the / at the beginning. Otherwise, this is the same "legal URL characters" match.
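The path group can be tried in isolation as well. This is a Python sketch; INLINE is again my guess at $inline, and path_part is a made-up helper.

```python
import re

INLINE = ")"  # assumption: placeholder for whatever PHP's $inline holds

path = re.compile(
    r"(?:/(?:[a-z0-9\-._~!$&'(" + INLINE + r"*+,;=:@|]+|%[\dA-F]{2})*)*"
)

def path_part(text):
    """Return the (possibly empty) prefix of text consumed by the path group."""
    return path.match(text).group(0)

print(path_part("/forum/view%20topic.php?t=1"))
print(path_part("/a//b#top"))
print(repr(path_part("?t=1")))
```

The group consumes one slash-led segment after another and stops at the first ? or #. Empty segments are allowed, and so is no path at all.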
(?:
\?
(?:
[a-z0-9\-._~!$&'($inline*+,;=:@/?|]+
|
%[\dA-F]{2}
)*
)?
(?:
\#
(?:
[a-z0-9\-._~!$&'($inline*+,;=:@/?|]+
|
%[\dA-F]{2}
)*
)?
)
Finally, the query string (things that are being passed by GET), indicated by the \?, and URLs linking to a mid-page anchor, indicated by the \#.
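Here is a sketch of those last two groups in Python. I have added capturing parentheses, which the original does not have, so that the two pieces can be printed separately. INLINE is a guess at $inline.

```python
import re

INLINE = ")"  # assumption: placeholder for whatever PHP's $inline holds

# Same character class as the path, plus "/" and "?".
UNIT = r"(?:[a-z0-9\-._~!$&'(" + INLINE + r"*+,;=:@/?|]+|%[\dA-F]{2})*"

# Capturing groups added only for this demonstration.
tail = re.compile(r"(?:\?(" + UNIT + r"))?(?:\#(" + UNIT + r"))?")

def query_and_fragment(text):
    """Return (query, fragment); each is None when that part is absent."""
    m = tail.match(text)
    return m.group(1), m.group(2)

print(query_and_fragment("?t=42&sid=abc#p7"))
print(query_and_fragment("#top"))
```

The first call gives the pair t=42&sid=abc and p7. The second gives None for the query and top for the fragment, since both groups are optional.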
Bottom line:
This section:
[a-z]$scheme*:/{2}
(?:
(?:
[a-z0-9\-._~!$&'($inline*+,;=:@|]+
|
%[\dA-F]{2}
)+
|
[0-9.]+
|
\[
[a-z0-9.]+
:
[a-z0-9.]+
:
[a-z0-9.:]+
\]
)
Should be replaced with something like this:
[a-z]$scheme*://
www\.example\.com
Or maybe
[a-z]$scheme*://
(?:
www\.example\.com
|
192\.168\.0\.1
|
\[::ffff:192\.168\.0\.1\]
)
Where the domain and the IP addresses match your website. Obviously, you're going to have to remove the line breaks and indentation I added. I'd do it for you, but I think it's almost not worth it because you'll have a hard time finding the spot where you put your domain in the middle of all that.
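One way around the "where do I put my domain" problem is to build the pattern from named pieces and only join them at the end. The sketch below does that in Python rather than PHP. SCHEME and INLINE are my guesses at the two PHP variables, and www.example.com is a placeholder for your own host.

```python
import re

SCHEME = r"[a-z\d+\-.]"        # assumption: stand-in for $scheme
INLINE = ")"                   # assumption: stand-in for $inline
HOST = r"www\.example\.com"    # placeholder: put your own domain here

UNIT = r"(?:[a-z0-9\-._~!$&'(" + INLINE + r"*+,;=:@|]+|%[\dA-F]{2})*"
UNIT_Q = r"(?:[a-z0-9\-._~!$&'(" + INLINE + r"*+,;=:@/?|]+|%[\dA-F]{2})*"

pattern = (
    r"[a-z]" + SCHEME + r"*://"   # scheme
    + r"(?:" + HOST + r")"        # host, restricted to your site
    + r"(?::\d*)?"                # optional port
    + r"(?:/" + UNIT + r")*"      # path
    + r"(?:\?" + UNIT_Q + r")?"   # query string
    + r"(?:\#" + UNIT_Q + r")?"   # fragment
)
own_site = re.compile(pattern, re.IGNORECASE)

print(bool(own_site.fullmatch("http://www.example.com/viewtopic.php?t=1#p2")))
print(bool(own_site.fullmatch("http://www.other.org/")))
```

The first URL matches and the second does not. The same idea carries over to PHP by concatenating strings before handing the result to the preg functions.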
You'll probably want to include some regex for subdomains or people leaving out the www. or what have you.
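For instance, a host pattern along these lines would accept the bare domain and any subdomain of it. This is only a sketch in Python, with example.com as a placeholder and a subdomain rule (letters, digits, hyphens) that I picked for illustration.

```python
import re

# Zero or more "label." prefixes, then the placeholder domain itself.
HOST = r"(?:[a-z0-9-]+\.)*example\.com"
host = re.compile(HOST)

samples = ["example.com", "www.example.com", "forum.example.com",
           "badexample.com", "example.com.evil.org"]
for s in samples:
    print(s, bool(host.fullmatch(s)))
```

The first three match and the last two do not. Note that the check uses a full match: inside a larger regex, whatever follows the host has to stop a lookalike such as example.com.evil.org from matching on its prefix.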
You may also want to remove this:
(?:
:
\d*
)?
As you probably don't want people linking to other ports on your domain.
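One caveat worth testing before you rely on this: depending on how the forum software applies the match, dropping the port group may shorten the matched link rather than reject it. The Python sketch below uses a placeholder host, a guessed scheme class, and a deliberately simplified path (/\S*) to show the effect.

```python
import re

# Placeholder host with the port group removed; the path is simplified to /\S*.
no_port = re.compile(r"[a-z][a-z\d+\-.]*://www\.example\.com(?:/\S*)?")

url = "http://www.example.com:8080/admin"

found = no_port.search(url)
print(found.group(0))          # what an unanchored search picks up
print(no_port.fullmatch(url))  # what a whole-string match says
```

The unanchored search still finds http://www.example.com and simply stops at the colon, while the whole-string match fails. Which of the two behaviours you get depends on the surrounding code, which is not shown here.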
The second one looks to have roughly the same structure; as the comment says, it's just getting URLs that lack the protocol designator.