This is a great question for Google, you know... but just to wet your beak: Matthew O'Riordan has written a regular expression that matches links with or without a protocol.
Here's a link to his blog post
But for future reference, let me provide the regular expression from the post here as well:
/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www\.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-]*)?\??(?:[-\+=&;%@.\w]*)#?(?:[.\!\/\\\w]*))?)/
And as nicely broken down by blog writer Matthew himself:
(
  ( # brackets covering match for protocol (optional) and domain
    ([A-Za-z]{3,9}:(?:\/\/)?) # match protocol, allow in format http:// or mailto:
    (?:[\-;:&=\+\$,\w]+@)? # allow something@ for email addresses
    [A-Za-z0-9\.\-]+ # anything looking at all like a domain, non-unicode domains
    | # or instead of above
    (?:www\.|[\-;:&=\+\$,\w]+@) # starting with something@ or www.
    [A-Za-z0-9\.\-]+ # anything looking at all like a domain
  )
  ( # brackets covering match for path, query string and anchor
    (?:\/[\+~%\/\.\w\-]*)? # allow optional /path
    \??(?:[\-\+=&;%@\.\w]*) # allow optional query string starting with ?
    #?(?:[\.\!\/\\\w]*) # allow optional anchor #anchor
  )? # make URL suffix optional
)
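To see the pattern in action, here is a minimal JavaScript sketch. The expression is the one from the breakdown with whitespace and comments removed; the helper name findFirstLink is only an illustration and is not part of Matthew's post.

```javascript
// The pattern from the breakdown, joined into a single regex literal.
const urlPattern = /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-]*)?\??(?:[\-\+=&;%@\.\w]*)#?(?:[\.\!\/\\\w]*))?)/;

// Hypothetical helper: returns the first link-like substring, or null if none is found.
function findFirstLink(text) {
  const match = urlPattern.exec(text);
  return match ? match[0] : null;
}

console.log(findFirstLink("see http://example.com/path?x=1#top now")); // "http://example.com/path?x=1#top"
console.log(findFirstLink("visit www.example.org today"));             // "www.example.org"
console.log(findFirstLink("no links here"));                           // null
```

Note that the pattern is not anchored, so it finds a link anywhere inside a longer string. Add ^ and $ if you want to validate a whole string instead.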
What about your particular example
But in your case of matching URL domains, the negated set [^\/:] could simply be replaced with the positive set:
[-0-9a-zA-Z_.]
And that should match everything after // and before the first /. But what happens when your URLs don't end with a slash? What will you do in that case?
The simplified expression above only matches one character, just like your negative character set does. So it just replaces your negative set in the complete regex you're using.
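Since I don't know your complete regex, here is only a rough sketch of how the positive set could be used. It assumes the surrounding expression is just // followed by one or more host characters; hostPattern and extractHost are made-up names for illustration.

```javascript
// Assumed surrounding expression: "//" followed by one or more host characters.
const hostPattern = /\/\/([-0-9a-zA-Z_.]+)/;

// Hypothetical helper: returns the host part of a URL, or null if there is no "//".
function extractHost(url) {
  const match = hostPattern.exec(url);
  return match ? match[1] : null;
}

console.log(extractHost("http://example.com/path"));   // "example.com"
console.log(extractHost("http://example.com"));        // "example.com" (no trailing slash needed)
console.log(extractHost("http://example.com:8080/x")); // "example.com"
console.log(extractHost("example.com"));               // null
```

Because the + quantifier simply stops at the first character outside the set (or at the end of the string), the expression does not depend on a closing slash being present.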