33

Possible Duplicate:
PHP validation/regex for URL

Is there any easy, secure and fast way to check if a URL is valid in PHP?

Oliver 'Oli' Jensen

3 Answers

73

Yes, there is! Use filter_var:

if (filter_var($url, FILTER_VALIDATE_URL) !== false) ...

FILTER_VALIDATE_URL validates URLs according to RFC 2396.
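
As a minimal sketch of how this check might be wrapped in practice: `FILTER_VALIDATE_URL` accepts any well-formed scheme, so a caller who only wants web addresses may want to restrict the scheme as well. The helper name `isValidHttpUrl` and the choice of allowed schemes are assumptions made for this illustration, not part of the original answer, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not from the original answer): accept only
// syntactically valid URLs whose scheme is http or https.
function isValidHttpUrl(string $url): bool
{
    // filter_var() returns the filtered string on success and false on
    // failure, hence the strict comparison against false.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // FILTER_VALIDATE_URL does not restrict the scheme, so check it here.
    $scheme = parse_url($url, PHP_URL_SCHEME);

    return in_array(strtolower((string) $scheme), ['http', 'https'], true);
}

var_dump(isValidHttpUrl('http://example.com/path?x=1')); // bool(true)
var_dump(isValidHttpUrl('ttps://www.youtube.com'));      // bool(false)
var_dump(isValidHttpUrl('not a url'));                   // bool(false)
```

This only checks syntax and scheme; it does not tell you whether the host exists or whether the URL can be opened.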

Dan Grossman
  • 25
    FILTER_VALIDATE_URL is unreliable and it can't validate URLs based on IPv6 addresses. I stumbled on this whilst searching SO for any questions about PHP URL validation that don't use it, because I've found it to be pretty much useless. – GordonM Jan 10 '12 at 22:51
  • 5
    this is interpreted as a valid url, properly displaying the cookie: `echo filter_var('http://example.com/">', FILTER_VALIDATE_URL);` Please beware the `filter_var();` in 5.4 – Francisco Presencia Jan 09 '14 at 03:49
  • Regarding GordonM's comment, have a look at the following gist, for an example of how PHP's FILTER_VALIDATE_URL probably doesn't work how you might expect it to: https://gist.github.com/anonymous/10967187 – coatesap Apr 17 '14 at 09:06
  • 3
    Why did you use `(filter_var($url, FILTER_VALIDATE_URL) !== false)` and not `(filter_var($url, FILTER_VALIDATE_URL))`? – anna May 12 '14 at 12:43
  • 1
    @GordonM - there is "FILTER_FLAG_IPV6" to allow IPv6 address to be valid - http://www.php.net/manual/en/filter.filters.flags.php – Laurence Jun 06 '14 at 18:55
  • @coatesap I think you're mixing two things: whether a URL is really valid (follows a given RFC or any other source of validation rules) and whether a browser can open it. The second doesn't actually mean that a URL is valid, only that the browser will do as much as it can to convert "user stupidities" into an "openable" URL. I stumbled upon this when looking for a URL validator for JavaScript. There are many (dozens of) answers about it here on SO, and there are hundreds of comments and examples of URLs that can be opened in a browser but are claimed to be invalid. Just like in your gist. – trejder Jun 30 '15 at 06:41
  • @coatesap The general rule of thumb here is: if you want to check whether a URL is valid, then using the method described in Dan Grossman's answer, or any similar one, is the right choice. If you want to check whether a URL is "openable", then the only way you're left with is to actually open it with any PHP URL opener or wrapper method and check the returned result. There is absolutely no URL validator that will return `true` for every URL that any browser is able to open / parse / "understand". – trejder Jun 30 '15 at 06:43
  • @trejder My comment was really just a warning to any developers who might consider using this (without modification) to validate a user input for something like a 'website' field (where the protocol would probably not be entered), as I could see this being quite a common scenario. – coatesap Jul 01 '15 at 08:21
  • Here is an article explaining the problems with `filter_var($url, FILTER_VALIDATE_URL)`: https://d-mueller.de/blog/why-url-validation-with-filter_var-might-not-be-a-good-idea/ – thespacecamel Aug 31 '18 at 18:06
  • FILTER_VALIDATE_URL validates `ttps://www.youtube.com` as valid. It should not be used. – Jeffz May 17 '20 at 13:21
  • @Jeffz that is a valid url, structurally – Shardj Aug 06 '20 at 11:53
  • This is a bit shit though, I mean really, 'http://http://.com' is an accepted url with this check – Shardj Aug 06 '20 at 15:50
  • It also can't handle utf8 characters – Shardj Aug 06 '20 at 16:00
17

Well, if we look at RFC 3986 we can find the definition of a URI, the generic syntax that URLs follow.

And if we take a look at Appendix B there is a guide to using regular expressions to parse a URL:

Appendix B. Parsing a URI Reference with a Regular Expression

As the "first-match-wins" algorithm is identical to the "greedy"
disambiguation method used by POSIX regular expressions, it is
natural and commonplace to use a regular expression for parsing the
potential five components of a URI reference.

The following line is the regular expression for breaking-down a
well-formed URI reference into its components.

  ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
   12            3  4          5       6  7        8 9

The numbers in the second line above are only to assist readability; they indicate the reference points for each subexpression (i.e., each paired parenthesis). We refer to the value matched for subexpression <n> as $<n>. For example, matching the above expression to

  http://www.ics.uci.edu/pub/ietf/uri/#Related

results in the following subexpression matches:

  $1 = http:
  $2 = http
  $3 = //www.ics.uci.edu
  $4 = www.ics.uci.edu
  $5 = /pub/ietf/uri/
  $6 = <undefined>
  $7 = <undefined>
  $8 = #Related
  $9 = Related

where <undefined> indicates that the component is not present, as is the case for the query component in the above example. Therefore, we can determine the value of the five components as

  scheme    = $2
  authority = $4
  path      = $5
  query     = $7
  fragment  = $9

Going in the opposite direction, we can recreate a URI reference from its components by using the algorithm of Section 5.3.

You can use this regular expression to parse the URL manually, or use the built-in parse_url function (available since PHP 4).
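
Both routes can be sketched side by side. The example below applies the Appendix B expression with `preg_match` and then calls `parse_url` on the same URI. The variable names and the `~` delimiters are choices made for this illustration, and the `??` operator assumes PHP 7 or later.

```php
<?php
// Regular expression from RFC 3986, Appendix B, wrapped in "~" delimiters
// because the pattern itself contains "/" and "#".
$pattern = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';
$uri     = 'http://www.ics.uci.edu/pub/ietf/uri/#Related';

preg_match($pattern, $uri, $m);

// Map the numbered subexpressions to the five components named in the RFC.
// Groups that did not take part in the match come back as empty strings
// or are missing from $m, so default them to ''.
$components = [
    'scheme'    => $m[2] ?? '',
    'authority' => $m[4] ?? '',
    'path'      => $m[5] ?? '',
    'query'     => $m[7] ?? '',
    'fragment'  => $m[9] ?? '',
];
print_r($components);

// The built-in alternative: parse_url() returns an associative array
// containing only the components that are present.
print_r(parse_url($uri));
```

Note that the Appendix B expression is meant for splitting a URI reference into parts, not for rejecting bad input: every component in it is optional, so it matches almost any string. Treat it as a parsing step and validate the resulting components separately.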

Devin M
0

It depends on your definition of valid: semantically valid, domain name resolves, etc.

The quick approach would be to use preg_match to test the URL against a good regular expression and confirm that it is in the correct format. There appear to be some good examples in this thread: PHP validation/regex for URL
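
As a rough sketch of the preg_match route, assuming "valid" means an http or https URL whose host ends in a dot-separated, alphabetic top-level domain: the pattern, the constant `URL_PATTERN` and the function `looksLikeWebUrl` are all hypothetical and written for this illustration only. They are deliberately narrow and will reject things like `http://localhost`, IP addresses and internationalized host names.

```php
<?php
// Hypothetical pattern, written for this sketch only: http or https scheme,
// one or more dot-terminated host labels, an alphabetic TLD, an optional
// port, and an optional path, query or fragment without whitespace.
// Modifiers: i = case-insensitive, D = "$" matches only at the very end.
const URL_PATTERN = '~^https?://(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}(?::\d+)?(?:[/?#]\S*)?$~iD';

function looksLikeWebUrl(string $url): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match(URL_PATTERN, $url) === 1;
}

var_dump(looksLikeWebUrl('https://www.example.co.uk/path?x=1')); // bool(true)
var_dump(looksLikeWebUrl('http://http://.com'));                 // bool(false)
var_dump(looksLikeWebUrl('ttps://www.youtube.com'));             // bool(false)
```

Whether such a narrow definition is appropriate depends on the input you expect; a stricter or looser pattern may suit your case better.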

Code Magician
  • With "valid" I mean that it has http:// and ends with .EXT – Oliver 'Oli' Jensen Aug 09 '11 at 21:53
  • 5
    @Oliver: Notice the URL for this question. That's not a valid URL by your definition. – Michael Petrotta Aug 09 '11 at 21:55
  • 1
    Then a preg_match against a good regular expression or filter_var http://www.php.net/manual/en/filter.filters.validate.php is your best bet. If you go the regex route, make sure you get a good one that covers all the valid use cases (http:// https:// FQDN or not etc) – Code Magician Aug 09 '11 at 21:55