I think you need to simplify this a lot. There are plenty of URL-validation regexes out there, but as an exercise, I'll go through my thought process for constructing one.
- First, you need to match a protocol if there is one: `/((http|ftp)s?:\/\/)?`
- Then match any series of non-whitespace characters: `\S+`
- If you're trying to pick out URLs from text, you'll want to look for signs that it really is a URL. Look for dots or slashes, then more non-whitespace: `[\.\/]\S*/` (each piece is demonstrated just after this list)
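To make each piece concrete, here's a minimal sketch (I'm assuming JavaScript, since the patterns use `/.../` literals; the sample strings are made up):

```js
const protocolPart = /((http|ftp)s?:\/\/)?/;  // first bullet: optional http/https/ftp/ftps prefix
const anyNonWhitespace = /\S+/;               // second bullet: a run of non-whitespace
const dotOrSlashPart = /[\.\/]\S*/;           // third bullet: a dot or slash, then more non-whitespace

console.log('ftps://example'.match(protocolPart)[0]);          // "ftps://"
console.log('no protocol'.match(protocolPart)[0]);             // "" (the whole group is optional)
console.log('google.com and more'.match(anyNonWhitespace)[0]); // "google.com"
console.log('google.com'.match(dotOrSlashPart)[0]);            // ".com" (the dot is the URL "sign")
```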
Now put it all together:

`/(((http|ftp)s?:\/\/)|(\S+[\.\/]))\S*[^\s\.]*/`
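Here's roughly how that behaves when you run it over a bit of text (JavaScript again; the sample string is made up):

```js
// The combined pattern, with the g flag so all matches in the text are found.
const urlish = /(((http|ftp)s?:\/\/)|(\S+[\.\/]))\S*[^\s\.]*/g;
const text = 'Try https://example.com or ftp://files.example.org or just example.com/page';

console.log(text.match(urlish));
// ["https://example.com", "ftp://files.example.org", "example.com/page"]
```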
I'm guessing that you're attempting to look for `www.google` because of the new TLDs... the fact is, such URLs might just look like `google`, and so any word could be a URL. Trying to come up with a catch-all regex that matches valid URLs and nothing else isn't possible, so you're best off just going with something simple like the above.
Edit: I've stuck a `|` in there between the protocol part and the non-whitespace-then-dot-or-slash part, to match `http://google` if people choose to write new URLs like that.
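A quick check of that case (JavaScript, same pattern as above):

```js
const urlish = /(((http|ftp)s?:\/\/)|(\S+[\.\/]))\S*[^\s\.]*/;

console.log(urlish.test('http://google'));     // true, via the protocol branch of the alternation
console.log('http://google'.match(urlish)[0]); // "http://google"
```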
Edit 2: See comments for the next improvement. It makes sure `google.com` matches, `http://google` matches, and even `google/` matches, but not `a.`.