
I have this text input, and I need to check whether the string is a valid web address, like http://www.example.com. How can this be done with regular expressions in PHP?

Gumbo
Edgar
    Syntactically valid and/or semantically valid? – Gumbo Aug 30 '10 at 15:31
  • The answer of nikic is perfect, and this: http://www.hashbangcode.com/blog/php-filter-filtervalidateurl-limitations-111.html. Thanks guys. – Edgar Aug 30 '10 at 15:50

6 Answers


Use the filter extension:

filter_var($url, FILTER_VALIDATE_URL);

This will be far more robust than any regex you can write.
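A minimal sketch of how this could be wrapped for the original question (the helper name `isValidUrl` is my own, not from the answer); note that `FILTER_VALIDATE_URL` requires an explicit scheme, so `www.example.com` is rejected:

```php
<?php
// filter_var() returns the URL string on success and false otherwise.
function isValidUrl(string $url): bool
{
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

var_dump(isValidUrl('http://www.example.com')); // true
var_dump(isValidUrl('www.example.com'));        // false: a scheme is required
```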

NikiC

Found this:

(http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?

From here:

A regex that validates a web address and matches an empty string?
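To use the quoted pattern in PHP it needs anchors and delimiters; a sketch, using `{}` as delimiters since the pattern itself contains `/`, `~`, `%`, `#`, and `@`:

```php
<?php
// The regex from the linked question, anchored with ^...$ so the whole
// string must be a URL, and wrapped in {} delimiters for preg_match().
$pattern = '{^(http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?$}';

var_dump((bool) preg_match($pattern, 'http://www.example.com')); // true
var_dump((bool) preg_match($pattern, 'www.example.com'));        // false
```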

cbattlegear
    "www.mywebsite.com" won't be a valid website here – Colin Hebert Aug 30 '10 at 15:29
@Colin HEBERT: `www.mywebsite.com` is not a valid website anywhere except for when you type it in to your address bar (where `http://` is assumed). In most other instances, it's assumed to be a filename (and hence would be a relative path). So it depends on your exact use if you want it to validate or not (Personally, I would prepend `http://` if non-existent, and then run through a check such as this, or `filter_var`)... – ircmaxell Aug 30 '10 at 15:41
  • @ircmaxell see comments on @Gabriel's post – Colin Hebert Aug 30 '10 at 15:55

You first need to understand what a web address is before you can begin to parse one effectively. Yes, http://www.example.com is a valid address. So is www.example.com. Or example.com. Or http://example.com. Or prefix.example.com.

Have a look at the specifications for a URI, especially the Syntax components.
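As one way to see those syntax components in practice, PHP's built-in `parse_url()` splits a URI along the lines RFC 3986 describes (the example URL here is illustrative):

```php
<?php
// parse_url() breaks a URI into its syntax components:
// scheme, host, path, query, etc.
$parts = parse_url('http://www.example.com/index.php?page=1');
print_r($parts);

// Without a scheme, the whole string is treated as a path, not a host:
print_r(parse_url('www.example.com'));
```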

Stephen

I found the examples below at http://www.roscripts.com/PHP_regular_expressions_examples-136.html

//URL: Different URL parts
//Protocol, domain name, page and CGI parameters are captured into backreferenes 1 through 4
'\b((?#protocol)https?|ftp)://((?#domain)[-A-Z0-9.]+)((?#file)/[-A-Z0-9+&@#/%=~_|!:,.;]*)?((?#parameters)\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?'

//URL: Different URL parts
//Protocol, domain name, page and CGI parameters are captured into named capturing groups.
//Works as it is with .NET, and after conversion by RegexBuddy on the Use page with Python, PHP/preg and PCRE.
'\b(?<protocol>https?|ftp)://(?<domain>[-A-Z0-9.]+)(?<file>/[-A-Z0-9+&@#/%=~_|!:,.;]*)?(?<parameters>\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?'

//URL: Find in full text
//The final character class makes sure that if an URL is part of some text, punctuation such as a 
//comma or full stop after the URL is not interpreted as part of the URL.
'\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]'

//URL: Replace URLs with HTML links
//(delimiters and the /i flag are needed for preg_replace to work)
preg_replace('{\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]}i', '<a href="\0">\0</a>', $text);
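A sketch of the named-group variant with `preg_match`; I've added `{}` delimiters and the `/i` flag here myself, since the character classes as quoted only list A-Z:

```php
<?php
// Named capturing groups let you pull out the URL parts by name.
$pattern = '{\b(?<protocol>https?|ftp)://(?<domain>[-A-Z0-9.]+)'
         . '(?<file>/[-A-Z0-9+&@#/%=~_|!:,.;]*)?'
         . '(?<parameters>\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?}i';

preg_match($pattern, 'http://example.com/page.php?id=1', $m);
echo $m['protocol']; // http
echo $m['domain'];   // example.com
echo $m['file'];     // /page.php
```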
Colin Hebert
Aaron Anodide
    "www.mywebsite.com" won't be a valid website here – Colin Hebert Aug 30 '10 at 15:30
    @Colin HEBERT: `www.mywebsite.com` is not an absolute URL; it will only be interpreted as URL path. – Gumbo Aug 30 '10 at 15:34
  • @Gumbo, still, it's a valid website, the "http://" part could/should be added at the output if needed. – Colin Hebert Aug 30 '10 at 15:38
  • @Colin HEBERT: It’s just the tolerance of the web browser to add that if entered into the location bar. But it’s still not a valid absolute URL. – Gumbo Aug 30 '10 at 15:42
@Gumbo, and a web site that requires a URL should be tolerant in the same way. It's quite easy to check whether the given URL starts with [protocol]:// and add http:// if it doesn't. For instance in your profile on SO :) – Colin Hebert Aug 30 '10 at 15:54
  • @Colin HEBERT: That depends on how this regular expression is to be used. If it is to match a string it’s fine to use such a scheme. But if it is to search for a URL in a text, then this might not be a good solution. – Gumbo Aug 30 '10 at 16:01
  • @Gumbo, You're right, but I believe that @Edgar wants to check if the whole string is a valid website, so I suppose that's a simple field like the web-site of a user/member. – Colin Hebert Aug 30 '10 at 16:08
  • @Colin HEBERT: A web site is a collection of web pages and not a URL that describes only the location of it. So please use the proper terms. – Gumbo Aug 30 '10 at 16:17
  • But should `command.com` be a URL? No. It's an ambiguous string. Does it refer to a host? Possibly. Does it refer to a file? Possibly. It's context sensitive. And you need to be aware that without a context, it's impossible to do this correctly 100% of the time. You could say that anything with a leading `http://` OR a path afterwards (Such as `example.com/foo`) should be transformed into a URL, but note that only the first case is actually a URL... – ircmaxell Aug 30 '10 at 16:50
@Gumbo, you're right, my bad, but I can't edit my comments now. @ircmaxell Yes it's context sensitive, and in this context he needs to make sure that the string passed is a valid *URL*, so even if it's only a possible URL, you have to suppose that it's one (unless you're in a context that makes this data really sensitive). – Colin Hebert Aug 30 '10 at 17:10

In most cases you don't have to check whether a string is a valid address.

Either it is, and the web site will be reachable, or it isn't, and the user will simply go back.

You should only escape illegal characters to avoid XSS; if your user doesn't want to give a valid website, that's their problem.

(In most cases.)
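A minimal sketch of the escaping this answer refers to, using `htmlspecialchars()` on output (the malicious value is an illustrative example):

```php
<?php
// Escaping the stored value on output is what actually prevents XSS:
// <, >, &, and quotes are converted to HTML entities.
$url = '"><script>alert(1)</script>';
echo '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '">link</a>';
```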

PS: If you still want to check URLs, look at nikic's answer.

Colin Hebert

To match more protocols, you can do:

((https?|s?ftp|gopher|telnet|file|notes|ms-help)://)?[\w:#@%/;$()~=\.&-]+
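To apply this pattern in PHP it needs anchors and delimiters; a sketch using `{}` delimiters (note the scheme part is optional, so bare host names also pass):

```php
<?php
// Anchored version of the multi-protocol pattern above.
$pattern = '{^((https?|s?ftp|gopher|telnet|file|notes|ms-help)://)?[\w:#@%/;$()~=\.&-]+$}';

var_dump((bool) preg_match($pattern, 'sftp://example.com/dir')); // true
var_dump((bool) preg_match($pattern, 'www.example.com'));        // true
```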
Toto