PHP validation/regex for URL

Question

I've been looking for a simple regex for URLs. Does anybody have one handy that works well? I didn't find one in the Zend Framework validation classes, and I have seen several implementations.

This is a pretty good resource. Gives a list of lots of different patterns and tests: https://mathiasbynens.be/demo/url-regex — omar j, Nov 12 '14 at 16:25

Stanislav · Answer 1 · 2015-08-03T08:40:55.643

214

Use the filter_var() function to validate whether a string is a URL or not:

var_dump(filter_var('example.com', FILTER_VALIDATE_URL));

It is bad practice to use regular expressions when not necessary.

EDIT: Be careful, this solution is not unicode-safe and not XSS-safe. If you need a complex validation, maybe it's better to look somewhere else.
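A minimal sketch of how filter_var() could be combined with a few extra checks, given the caveats above. The helper name `is_http_url` and the rule that the host must contain a dot are assumptions made for illustration, not part of the original answer; relax or tighten them to suit your input.

```php
<?php
// Hypothetical helper (not from the original answer): filter_var() plus extra checks.
function is_http_url(string $candidate): bool
{
    // filter_var() returns the filtered string on success and false on failure.
    if (filter_var($candidate, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    $scheme = parse_url($candidate, PHP_URL_SCHEME);
    $host   = parse_url($candidate, PHP_URL_HOST);

    // Allow only web schemes; this rejects inputs such as "a://site.com".
    if (!is_string($scheme) || !in_array(strtolower($scheme), ['http', 'https'], true)) {
        return false;
    }

    // Assumption: a usable host contains at least one dot.
    // This also rejects "http://localhost" and "http://www"; drop it if those should pass.
    return is_string($host) && strpos($host, '.') !== false;
}

var_dump(is_http_url('http://example.com/path?x=1'));
var_dump(is_http_url('example.com'));
```

This only checks the shape of the URL. Anything echoed back into HTML still needs escaping with htmlspecialchars() at output time.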

edited Aug 03 '15 at 08:40

answered Oct 16 '08 at 06:55


this is definitely a great alternative, unfortunately it's php 5.2+ (unless you install the PECL version) – Owen Oct 19 '08 at 08:07

There's a bug in 5.2.13 (and I think 5.3.2) that prevents urls with dashes in them from validating using this method. – vamin Jun 01 '10 at 23:27
filter_var will reject http://test-site.com, and I have domain names with dashes, whether they are valid or not. I don't think filter_var is the best way to validate a URL. It will allow a URL like `http://www` – Cesar Sep 06 '10 at 19:30

> It will allow a url like 'http://www' It is OK when URL like 'http://localhost' – Stanislav Sep 07 '10 at 10:34
One particular problem: This validates URLs according to RFC 2396 which does not allow underscores in subdomains, but some websites do have underscores in subdomains. – liviucmg Mar 29 '11 at 17:43

The other problem with this method is it is not unicode-safe. – Benji XVI May 10 '11 at 13:24
The `filter_var` function has since been updated, and it is now possible to validate URLs with dashes, rendering your comment incorrect, @vamin ([see bug report here](https://bugs.php.net/bug.php?id=51192)). – Zack Zatkin-Gold Jan 12 '12 at 02:04
@zzatkin, the bug report states that the fix is incorporated into the later 5.2.14 and 5.3.3 versions (it came too late for 5.2.13 and 5.3.2), though I agree it's not really an issue anymore so long as you keep PHP up to date. – vamin Jan 23 '12 at 18:12
It also will validate http://www.onedomain.com
http://www.anotherone.com
http://www.yetanother.com I'm finding out today. Not what I had in mind! Going back to a regular expression alternative (PHP Version => 5.4.4) – Bretticus Nov 19 '12 at 19:31
Doesn't accept UTF-8 characters. Will return false for `http://wiki.com/öva/mä/åäö`. – Sawny Dec 16 '12 at 19:06
The filter_var appears to validate all different kinds of URL formats whether they are valid or not, it seems that the regex is the way to correctly validate URL's – mic Sep 30 '13 at 09:35
yet another issue is that it does not validate against newer tlds like .me, .cm .guru etc – bhaskarc Mar 15 '15 at 17:59
This is a bad solution which should not have so many up votes. Highly XSS vulnerable. – RisingSun May 04 '15 at 18:46
Downvoted as dangerous. Read the comments about it in the online PHP manual! – Nick Rice Sep 12 '16 at 11:09

FILTER_VALIDATE_URL has [a lot of problems](https://bugs.php.net/search.php?cmd=display&search_for=FILTER_VALIDATE_URL) that need fixing. Also, the [docs describing the flags](http://php.net/manual/en/filter.filters.validate.php) do not reflect the [actual source code](https://github.com/php/php-src/blob/master/ext/filter/logical_filters.c#L517) where references to some flags have been removed entirely. More info here: http://news.php.net/php.internals/99018 – S. Imp May 12 '17 at 21:53
Here's another article explaining the problems with this: https://d-mueller.de/blog/why-url-validation-with-filter_var-might-not-be-a-good-idea/ – thespacecamel Aug 31 '18 at 18:31
it is a bad solution, 'cause `a://site.com` is valid for FILTER_VALIDATE_URL (PHP 7.2 and older versions) – Karel Wintersky Jul 21 '20 at 11:29
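Several comments above note that filter_var() rejects raw non-ASCII characters. One possible workaround is to percent-encode those bytes before validating. This is only a sketch: the helper name `is_url_after_encoding` is hypothetical, and it covers the path and query only. Non-ASCII host names would need IDN (punycode) conversion instead, which is not shown here.

```php
<?php
// Hypothetical helper: percent-encode raw non-ASCII bytes, then validate.
function is_url_after_encoding(string $url): bool
{
    // Without the /u modifier the class matches individual bytes >= 0x80,
    // so every byte of a multi-byte UTF-8 character gets percent-encoded.
    $encoded = preg_replace_callback(
        '/[^\x00-\x7F]+/',
        function (array $m): string {
            return rawurlencode($m[0]);
        },
        $url
    );

    // preg_replace_callback() returns null on a PCRE error.
    return is_string($encoded)
        && filter_var($encoded, FILTER_VALIDATE_URL) !== false;
}

var_dump(is_url_after_encoding('http://wiki.com/öva/mä/åäö'));
```

The function reports validity only. If you store the URL, you may prefer to keep the encoded form so that what you validated is what you use.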

score 86 · Accepted Answer · edited Jun 22 '22 at 10:46


I used this on a few projects, I don't believe I've run into issues, but I'm sure it's not exhaustive:

$text = preg_replace(
  '#((https?|ftp)://(\S*?\.\S*?))([\s)\[\]{},;"\':<]|\.\s|$)#i',
  '<a href="$1" target="_blank">$3</a>$4',
  $text
);

Most of the random junk at the end is to deal with situations like http://domain.example. in a sentence (to avoid matching the trailing period). I'm sure it could be cleaned up, but since it worked, I've more or less just copied it over from project to project.
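To see what the trailing group does in practice, here is a small sketch that wraps the same pattern in a function and escapes the matched URL before it goes into the markup. The function name `linkify` and the use of preg_replace_callback() with htmlspecialchars() are my own additions for illustration, not part of the answer.

```php
<?php
// Hypothetical wrapper around the pattern from the answer.
function linkify(string $text): string
{
    $pattern = '#((https?|ftp)://(\S*?\.\S*?))([\s)\[\]{},;"\':<]|\.\s|$)#i';

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            // $m[1] is the full URL, $m[3] the part after the scheme,
            // $m[4] the terminator that ended the match (may be empty).
            $href  = htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8');
            $label = htmlspecialchars($m[3], ENT_QUOTES, 'UTF-8');
            return '<a href="' . $href . '" target="_blank">' . $label . '</a>' . $m[4];
        },
        $text
    );

    // Fall back to the input if PCRE reports an error.
    return $result ?? $text;
}

echo linkify('See http://domain.example. Thanks'), "\n";
// See <a href="http://domain.example" target="_blank">domain.example</a>. Thanks
```

The sentence-ending period stays outside the link because the `\.\s` alternative consumes it as the terminator.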

edited Jun 22 '22 at 10:46

Stephen Ostermiller


answered Oct 15 '08 at 19:30

Owen


Some things that jump out at me: use of alternation where character classes are called for (every alternative matches exactly one character); and the replacement shouldn't have needed the outer double-quotes (they were only needed because of the pointless /e modifier on the regex). – Alan Moore May 30 '09 at 05:53

@John Scipione: `google.com` is only a valid relative URL path but not a valid absolute URL. And I think that’s what he’s looking for. – Gumbo Jan 04 '10 at 08:30
This doesn't work in this case - it includes the trailing ": 3 cantari noi in albumul Diverse – Softy Feb 02 '11 at 09:06

@Softy something like `http://example.com/somedir/...` is a perfectly legitimate URL, asking for the file named `...` - which is a legitimate file name. – Stephen P Jul 27 '11 at 23:55
I'm using Zend\Validator\Regex to validate url using your pattern, but it still detect `http://www.example` to be valid – Joko Wandiro Nov 26 '13 at 08:03

score 31 · Answer 3 · answered Dec 27 '08 at 14:12


As per the PHP manual - parse_url should not be used to validate a URL.

Unfortunately, it seems that filter_var('example.com', FILTER_VALIDATE_URL) does not perform any better.

Both parse_url() and filter_var() will pass malformed URLs such as http://...

Therefore in this case - regex is the better method.

answered Dec 27 '08 at 14:12

catchdave


This argument doesn't follow. If FILTER_VALIDATE_URL is a little more permissive than you want, tack on some additional checks to deal with those edge cases. Reinventing the wheel with your own attempt at a regex against urls is only going to get you further from a complete check. – Kzqai Jul 19 '10 at 00:50

See all the shot-down regexes on this page for examples of why -not- to write your own. – Kzqai Jul 19 '10 at 02:54

You make a fair point Tchalvak. Regexes for something like URLs can (as per other responses) be very hard to get right. Regex is not always the answer. Conversely regex is also not always the wrong answer either. The important point is to pick the right tool (regex or otherwise) for the job and not be specifically "anti" or "pro" regex. In hindsight, your answer of using filter_var in combination with constraints on its edge-cases, looks like the better answer (particularly when regex answers start to get to greater than 100 chars or so - making maintenance of said regex a nightmare) – catchdave Jul 20 '10 at 04:54
1. `filter_var()` seems to not allow “malformed URLs such as `http://...`“. (Well, it might allow it in 2008…) In my current tests, it behaves better than suggested regexes. 2. As this answer hasn’t included an actual regex, it is not useful. – Melebius Feb 17 '23 at 08:37

score 15 · Answer 4 · answered Mar 13 '11 at 11:46

As per John Gruber (Daring Fireball):

Regex:

(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))

using in preg_match():

preg_match("/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/", $url)

Here is the extended regex pattern (with comments):

(?xi)
\b
(                       # Capture 1: entire matched URL
  (?:
    https?://               # http or https protocol
    |                       #   or
    www\d{0,3}[.]           # "www.", "www1.", "www2." … "www999."
    |                           #   or
    [a-z0-9.\-]+[.][a-z]{2,4}/  # looks like domain name followed by a slash
  )
  (?:                       # One or more:
    [^\s()<>]+                  # Run of non-space, non-()<>
    |                           #   or
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
  )+
  (?:                       # End with:
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
    |                               #   or
    [^\s`!()\[\]{};:'".,<>?«»“”‘’]        # not a space or one of these punct chars
  )
)

For more details please look at: http://daringfireball.net/2010/07/improved_regex_for_matching_urls
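As a rough illustration of how this pattern behaves on running text, the sketch below pulls every match out of a sentence. The function name `extract_urls`, the `~` delimiter (chosen so the slashes need no escaping), and the `u` modifier (added because the final character class contains multi-byte quotes) are my assumptions, not part of Gruber's article.

```php
<?php
// Hypothetical helper: return every URL-like substring found in $text.
function extract_urls(string $text): array
{
    // A nowdoc keeps the quotes, backslashes and backtick free of PHP escaping.
    $pattern = <<<'REGEX'
~(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))~u
REGEX;

    // Capture group 1 holds the whole matched URL.
    if (!preg_match_all($pattern, $text, $matches)) {
        return [];
    }
    return $matches[1];
}

print_r(extract_urls('Visit http://example.com/path_(with_parens), or www.example.org.'));
// Array
// (
//     [0] => http://example.com/path_(with_parens)
//     [1] => www.example.org
// )
```

Note that this pattern is designed to find URLs inside prose, not to validate a string that is supposed to be a URL and nothing else.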

To work, the pattern needs to escape the forward slashes with backslashes in three points: preg_match("/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/", $url) – Ben Birney, Oct 03 '20 at 09:45

Roger · Answer 5 · 2011-02-11T14:46:50.737

13

Just in case you want to know if the url really exists:

function url_exist($url){//returns true if the given URL exists
    $c=curl_init();
    curl_setopt($c,CURLOPT_URL,$url);
    curl_setopt($c,CURLOPT_HEADER,1);//get the header
    curl_setopt($c,CURLOPT_NOBODY,1);//and *only* get the header
    curl_setopt($c,CURLOPT_RETURNTRANSFER,1);//get the response as a string from curl_exec(), rather than echoing it
    curl_setopt($c,CURLOPT_FRESH_CONNECT,1);//don't use a cached version of the url
    if(!curl_exec($c)){
        //echo $url.' does not exist';
        return false;
    }else{
        //echo $url.' exists';
        return true;
    }
    //$httpcode=curl_getinfo($c,CURLINFO_HTTP_CODE);
    //return ($httpcode<400);
}
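The comments on this answer raise two points: the status code is never checked, and any user-supplied URL is fetched as-is. A hedged sketch of a stricter variant follows. The name `url_responds`, the 5-second default timeout, and the decision not to follow redirects are all assumptions of mine. It does not protect against requests to internal addresses, so treat it as a starting point only.

```php
<?php
// Hypothetical variant of url_exist(): scheme allowlist, timeouts, status check.
function url_responds(string $url, int $timeoutSeconds = 5): bool
{
    // Refuse anything that is not plain http(s) before touching the network.
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme) || !in_array(strtolower($scheme), ['http', 'https'], true)) {
        return false;
    }

    $c = curl_init($url);
    if ($c === false) {
        return false;
    }

    curl_setopt_array($c, [
        CURLOPT_NOBODY         => true,            // HEAD-style request
        CURLOPT_RETURNTRANSFER => true,            // do not echo the response
        CURLOPT_FOLLOWLOCATION => false,           // assumption: redirects are not followed
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);

    if (curl_exec($c) === false) {
        return false;
    }

    // Treat 2xx and 3xx as "exists"; 404 and other errors as "does not".
    $status = (int) curl_getinfo($c, CURLINFO_HTTP_CODE);
    return $status >= 200 && $status < 400;
}
```

Some servers answer HEAD requests differently from GET, so a false result here is a hint, not proof that the resource is missing.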

edited Feb 11 '11 at 14:46

answered Feb 11 '11 at 13:04

Roger


I would still do some kind of validation on `$url` before actually verifying the url is real because the above operation is expensive - perhaps as much as 200 milliseconds depending on file size. In some cases the url may not actually have a resource at its location available yet (e.g. creating a url to an image that has yet to be uploaded). Additionally you're not using a cached version so its not like `file_exists()` that will cache a stat on a file and return nearly instantly. The solution you provided is still useful though. Why not just use `fopen($url, 'r')`? – Yzmir Ramirez Aug 06 '11 at 18:14
Thanks, just what I was looking for. However, I made a mistake trying to use it. The function is "url_exist" not "url_exists" oops ;-) – PJ Brunet Mar 20 '12 at 20:24

Is there any security risk in directly accessing the user entered URL? – siliconpi May 10 '12 at 07:14
you would like to add a check if a 404 was found: $httpCode = curl_getinfo( $c, CURLINFO_HTTP_CODE ); //echo $url . ' ' . $httpCode . ' '; if( $httpCode == 404 ) { echo $url.' 404'; } – Camaleo Mar 12 '18 at 13:28
Isn't safe at all.. any input URL would be actively accessed. – dmmd Oct 28 '19 at 17:36

score 10 · Answer 6 · edited May 12 '11 at 09:14

I don't think that using regular expressions is a smart thing to do in this case. It is impossible to match all of the possibilities and even if you did, there is still a chance that url simply doesn't exist.

Here is a very simple way to test whether a URL actually exists and is readable:

if (preg_match("#^https?://.+#", $link) and @fopen($link,"r")) echo "OK";

(if there is no preg_match then this would also validate all filenames on your server)

score 8 · Answer 7 · answered Oct 15 '08 at 19:36


I've used this one with good success - I don't remember where I got it from

$pattern = "/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i";

answered Oct 15 '08 at 19:36

Peter Bailey


^(http://|https://)?(([a-z0-9]?([-a-z0-9]*[a-z0-9]+)?){1,63}\.)+[a-z]{2,6} (may be too greedy, not sure yet, but it's more flexible on protocol and leading www) – andrewbadera Aug 26 '09 at 15:54

score 8 · Answer 8 · answered Jun 27 '19 at 12:55

The best URL Regex that worked for me:

function valid_URL($url){
    // preg_match() returns 1, 0 or false; cast so the function returns a real boolean
    return (bool) preg_match('%^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu', $url);
}

Examples:

valid_URL('https://twitter.com'); // true
valid_URL('http://twitter.com');  // true
valid_URL('http://twitter.co');   // true
valid_URL('http://t.co');         // true
valid_URL('http://twitter.c');    // false
valid_URL('htt://twitter.com');   // false

valid_URL('http://example.com/?a=1&b=2&c=3'); // true
valid_URL('http://127.0.0.1');    // true
valid_URL('');                    // false
valid_URL(1);                     // false

Source: http://urlregex.com/

score 7 · Answer 9 · answered Oct 16 '12 at 08:57

    function validateURL($URL) {
      $pattern_1 = "/^(http|https|ftp):\/\/(([A-Z0-9][A-Z0-9_-]*)(\.[A-Z0-9][A-Z0-9_-]*)+.(com|org|net|dk|at|us|tv|info|uk|co.uk|biz|se)$)(:(\d+))?\/?/i";
      $pattern_2 = "/^(www)((\.[A-Z0-9][A-Z0-9_-]*)+.(com|org|net|dk|at|us|tv|info|uk|co.uk|biz|se)$)(:(\d+))?\/?/i";       
      if(preg_match($pattern_1, $URL) || preg_match($pattern_2, $URL)){
        return true;
      } else{
        return false;
      }
    }

Doesn't work with links like: 'www.w3schools.com/home/3/?a=l' – user3396065, Nov 20 '16 at 15:20

score 5 · Answer 10 · edited Jul 15 '13 at 13:20

And there is your answer =) Try to break it, you can't!!!

function link_validate_url($text) {
$LINK_DOMAINS = 'aero|arpa|asia|biz|com|cat|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|mobi|local';
  $LINK_ICHARS_DOMAIN = (string) html_entity_decode(implode("", array( // @TODO completing letters ...
    "&#x00E6;", // æ
    "&#x00C6;", // Æ
    "&#x00C0;", // À
    "&#x00E0;", // à
    "&#x00C1;", // Á
    "&#x00E1;", // á
    "&#x00C2;", // Â
    "&#x00E2;", // â
    "&#x00E5;", // å
    "&#x00C5;", // Å
    "&#x00E4;", // ä
    "&#x00C4;", // Ä
    "&#x00C7;", // Ç
    "&#x00E7;", // ç
    "&#x00D0;", // Ð
    "&#x00F0;", // ð
    "&#x00C8;", // È
    "&#x00E8;", // è
    "&#x00C9;", // É
    "&#x00E9;", // é
    "&#x00CA;", // Ê
    "&#x00EA;", // ê
    "&#x00CB;", // Ë
    "&#x00EB;", // ë
    "&#x00CE;", // Î
    "&#x00EE;", // î
    "&#x00CF;", // Ï
    "&#x00EF;", // ï
    "&#x00F8;", // ø
    "&#x00D8;", // Ø
    "&#x00F6;", // ö
    "&#x00D6;", // Ö
    "&#x00D4;", // Ô
    "&#x00F4;", // ô
    "&#x00D5;", // Õ
    "&#x00F5;", // õ
    "&#x0152;", // Œ
    "&#x0153;", // œ
    "&#x00FC;", // ü
    "&#x00DC;", // Ü
    "&#x00D9;", // Ù
    "&#x00F9;", // ù
    "&#x00DB;", // Û
    "&#x00FB;", // û
    "&#x0178;", // Ÿ
    "&#x00FF;", // ÿ 
    "&#x00D1;", // Ñ
    "&#x00F1;", // ñ
    "&#x00FE;", // þ
    "&#x00DE;", // Þ
    "&#x00FD;", // ý
    "&#x00DD;", // Ý
    "&#x00BF;", // ¿
  )), ENT_QUOTES, 'UTF-8');

  $LINK_ICHARS = $LINK_ICHARS_DOMAIN . (string) html_entity_decode(implode("", array(
    "&#x00DF;", // ß
  )), ENT_QUOTES, 'UTF-8');
  $allowed_protocols = array('http', 'https', 'ftp', 'news', 'nntp', 'telnet', 'mailto', 'irc', 'ssh', 'sftp', 'webcal');

  // Starting a parenthesis group with (?: means that it is grouped, but is not captured
  $protocol = '((?:'. implode("|", $allowed_protocols) .'):\/\/)';
  $authentication = "(?:(?:(?:[\w\.\-\+!$&'\(\)*\+,;=" . $LINK_ICHARS . "]|%[0-9a-f]{2})+(?::(?:[\w". $LINK_ICHARS ."\.\-\+%!$&'\(\)*\+,;=]|%[0-9a-f]{2})*)?)?@)";
  $domain = '(?:(?:[a-z0-9' . $LINK_ICHARS_DOMAIN . ']([a-z0-9'. $LINK_ICHARS_DOMAIN . '\-_\[\]])*)(\.(([a-z0-9' . $LINK_ICHARS_DOMAIN . '\-_\[\]])+\.)*('. $LINK_DOMAINS .'|[a-z]{2}))?)';
  $ipv4 = '(?:[0-9]{1,3}(\.[0-9]{1,3}){3})';
  $ipv6 = '(?:[0-9a-fA-F]{1,4}(\:[0-9a-fA-F]{1,4}){7})';
  $port = '(?::([0-9]{1,5}))';

  // Pattern specific to external links.
  $external_pattern = '/^'. $protocol .'?'. $authentication .'?('. $domain .'|'. $ipv4 .'|'. $ipv6 .'|localhost)'. $port .'?';

  // Pattern specific to internal links.
  $internal_pattern = "/^(?:[a-z0-9". $LINK_ICHARS ."_\-+\[\]]+)";
  $internal_pattern_file = "/^(?:[a-z0-9". $LINK_ICHARS ."_\-+\[\]\.]+)$/i";

  $directories = "(?:\/[a-z0-9". $LINK_ICHARS ."_\-\.~+%=&,$'#!():;*@\[\]]*)*";
  // Yes, four backslashes == a single backslash.
  $query = "(?:\/?\?([?a-z0-9". $LINK_ICHARS ."+_|\-\.~\/\\\\%=&,$'():;*@\[\]{} ]*))";
  $anchor = "(?:#[a-z0-9". $LINK_ICHARS ."_\-\.~+%=&,$'():;*@\[\]\/\?]*)";

  // The rest of the path for a standard URL.
  $end = $directories .'?'. $query .'?'. $anchor .'?'.'$/i';

  $message_id = '[^@].*@'. $domain;
  $newsgroup_name = '(?:[0-9a-z+-]*\.)*[0-9a-z+-]*';
  $news_pattern = '/^news:('. $newsgroup_name .'|'. $message_id .')$/i';

  $user = '[a-zA-Z0-9'. $LINK_ICHARS .'_\-\.\+\^!#\$%&*+\/\=\?\`\|\{\}~\'\[\]]+';
  $email_pattern = '/^mailto:'. $user .'@'.'(?:'. $domain .'|'. $ipv4 .'|'. $ipv6 .'|localhost)'. $query .'?$/';

  // Any recognised link type counts as valid; anything else falls through to false.
  if (strpos($text, '<front>') === 0) {
    return true;
  }
  if (in_array('mailto', $allowed_protocols) && preg_match($email_pattern, $text)) {
    return true;
  }
  if (in_array('news', $allowed_protocols) && preg_match($news_pattern, $text)) {
    return true;
  }
  if (preg_match($internal_pattern . $end, $text)) {
    return true;
  }
  if (preg_match($external_pattern . $end, $text)) {
    return true;
  }
  if (preg_match($internal_pattern_file, $text)) {
    return true;
  }

  return false;
}

There are a lot more [top level domains](https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains). — Jeff Puckett, Sep 26 '16 at 20:17
Your `.`, `?`, `+`, `^`, `{`, `}`, `=`, `|`, `$`, backtick, and `[` do not need escaping in your character classes. `+` is even repeated in one of your character classes. `:` does not need to be escaped. — mickmackusa, Sep 27 '21 at 10:26

score 5 · Answer 11 · edited May 23 '17 at 12:02


Edit:
As incidence pointed out, this code relies on eregi(), which was DEPRECATED with the release of PHP 5.3.0 (2009-06-30) and removed in PHP 7.0, so use it accordingly.

Just my two cents but I've developed this function and have been using it for a while with success. It's well documented and separated so you can easily change it.

// Checks if string is a URL
// @param string $url
// @return bool
function isURL($url = NULL) {
    if($url==NULL) return false;

    $protocol = '(http://|https://)';
    $allowed = '([a-z0-9]([-a-z0-9]*[a-z0-9]+)?)';

    $regex = "^". $protocol . // must include the protocol
             '(' . $allowed . '{1,63}\.)+'. // 1 or several sub domains with a max of 63 chars
             '[a-z]' . '{2,6}'; // followed by a TLD
    if(eregi($regex, $url)==true) return true;
    else return false;
}
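Since eregi() is gone from current PHP, here is one way the same idea might be expressed with preg_match(). This is a sketch, not a drop-in replacement: the name `is_url_preg` is hypothetical, and I have simplified the sub-domain group (dropping the `{1,63}` repetition around it) to avoid nested quantifiers. Like the original, it does not anchor the end of the string and it caps the TLD at 6 letters.

```php
<?php
// Hypothetical preg_match() port of the eregi()-based check above.
function is_url_preg(?string $url = null): bool
{
    if ($url === null || $url === '') {
        return false;
    }

    $protocol = '(?:http://|https://)';
    // One label: starts and ends with a letter or digit, hyphens allowed inside.
    $label = '[a-z0-9](?:[-a-z0-9]*[a-z0-9])?';

    $regex = '~^' . $protocol      // must include the protocol
           . '(?:' . $label . '\.)+' // one or more dot-terminated labels
           . '[a-z]{2,6}'            // followed by a TLD of 2 to 6 letters
           . '~i';                   // case-insensitive, as eregi() was

    return preg_match($regex, $url) === 1;
}

var_dump(is_url_preg('http://example.com'));
var_dump(is_url_preg('example.com'));
```

The `~` delimiter is used so that the slashes in the protocol part do not need escaping.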

edited May 23 '17 at 12:02

Community


answered Mar 12 '09 at 17:17

Frankie


Eregi will be removed in PHP 6.0.0. And domains with "öäåø" will not validate with your function. You probably should convert the URL to punycode first? – Dec 10 '09 at 15:48
@incidence absolutely agree. I wrote this in March and PHP 5.3 only came out late June setting eregi as DEPRECATED. Thank you. Gonna edit and update. – Frankie Dec 10 '09 at 18:05
Correct me if I'm wrong, but can we still assume TLDs will have a minimum of 2 characters and maximum of 6 characters? – Yzmir Ramirez Aug 06 '11 at 18:15

@YzmirRamirez (All these years later...) If there was any doubt when you wrote your comment there certainly isn't now, with TLDs these days such as .photography – Nick Rice Sep 12 '16 at 11:02
@NickRice you are correct...how much the web changes in 5 years. Now I can't wait until someone makes the TLD .supercalifragilisticexpialidocious – Yzmir Ramirez Sep 13 '16 at 17:03

score 4 · Answer 12 · answered Mar 30 '11 at 20:45

function is_valid_url ($url="") {

        if ($url=="") {
            $url=$this->url;
        }

        $url = @parse_url($url);

        if ( ! $url) {


            return false;
        }

        $url = array_map('trim', $url);
        $url['port'] = (!isset($url['port'])) ? 80 : (int)$url['port'];
        $path = (isset($url['path'])) ? $url['path'] : '';

        if ($path == '') {
            $path = '/';
        }

        $path .= ( isset ( $url['query'] ) ) ? "?$url[query]" : '';



        if ( isset ( $url['host'] ) AND $url['host'] != gethostbyname ( $url['host'] ) ) {
            if ( PHP_VERSION >= 5 ) {
                $headers = get_headers("$url[scheme]://$url[host]:$url[port]$path");
            }
            else {
                $fp = fsockopen($url['host'], $url['port'], $errno, $errstr, 30);

                if ( ! $fp ) {
                    return false;
                }
                fputs($fp, "HEAD $path HTTP/1.1\r\nHost: $url[host]\r\n\r\n");
                $headers = fread ( $fp, 128 );
                fclose ( $fp );
            }
            $headers = ( is_array ( $headers ) ) ? implode ( "\n", $headers ) : $headers;
            // Accept only a 200, 301 or 302 status; a group is needed here, not a character class
            return ( bool ) preg_match ( '#^HTTP/.*\s+(200|301|302)\s#i', $headers );
        }

        return false;
    }

Hi, this solution is good and I upvoted it, but it doesn't take into account the standard port for https. I suggest you just replace 80 with '' where it works out the port – pgee70, Sep 28 '14 at 21:41
I ended up implementing a variation on this, because my domain cares whether an URL actually exists or not :) — Raz0rwire, Jul 18 '16 at 13:34
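The port issue raised in the comments could also be handled by picking the default from the scheme. A small sketch, assuming a hypothetical helper named `effective_port` that takes the array returned by parse_url():

```php
<?php
// Hypothetical helper: choose the port from the parsed URL, falling back per scheme.
function effective_port(array $parts): ?int
{
    // An explicit port always wins.
    if (isset($parts['port'])) {
        return (int) $parts['port'];
    }

    // Default ports for the schemes this sketch knows about.
    $defaults = ['http' => 80, 'https' => 443];
    $scheme   = strtolower($parts['scheme'] ?? '');

    // null signals "unknown scheme, no sensible default".
    return $defaults[$scheme] ?? null;
}

var_dump(effective_port(parse_url('https://example.com/')));
// int(443)
```

Returning null for unknown schemes leaves the caller to decide whether to reject the URL or omit the port.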

score 3 · Answer 13 · edited May 23 '17 at 12:10

Inspired by this .NET StackOverflow question and by this referenced article from that question, there is this URI validator (URI means it validates both URLs and URNs).

if( ! preg_match( "/^([a-z][a-z0-9+.-]*):(?:\\/\\/((?:(?=((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*))(\\3)@)?(?=(\\[[0-9A-F:.]{2,}\\]|(?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*))\\5(?::(?=(\\d*))\\6)?)(\\/(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\8)?|(\\/?(?!\\/)(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\10)?)(?:\\?(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\11)?(?:#(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\12)?$/i", $uri ) )
{
    throw new \RuntimeException( "URI has not a valid format." );
}

I have successfully unit-tested this function inside a ValueObject I made named Uri and tested by UriTest.

UriTest.php (Contains valid and invalid cases for both URLs and URNs)

<?php

declare( strict_types = 1 );

namespace XaviMontero\ThrasherPortage\Tests\Tour;

use XaviMontero\ThrasherPortage\Tour\Uri;

class UriTest extends \PHPUnit_Framework_TestCase
{
    private $sut;

    public function testCreationIsOfProperClassWhenUriIsValid()
    {
        $sut = new Uri( 'http://example.com' );
        $this->assertInstanceOf( 'XaviMontero\\ThrasherPortage\\Tour\\Uri', $sut );
    }

    /**
     * @dataProvider urlIsValidProvider
     * @dataProvider urnIsValidProvider
     */
    public function testGetUriAsStringWhenUriIsValid( string $uri )
    {
        $sut = new Uri( $uri );
        $actual = $sut->getUriAsString();

        $this->assertInternalType( 'string', $actual );
        $this->assertEquals( $uri, $actual );
    }

    public function urlIsValidProvider()
    {
        return
            [
                [ 'http://example-server' ],
                [ 'http://example.com' ],
                [ 'http://example.com/' ],
                [ 'http://subdomain.example.com/path/?parameter1=value1&parameter2=value2' ],
                [ 'random-protocol://example.com' ],
                [ 'http://example.com:80' ],
                [ 'http://example.com?no-path-separator' ],
                [ 'http://example.com/pa%20th/' ],
                [ 'ftp://example.org/resource.txt' ],
                [ 'file://../../../relative/path/needs/protocol/resource.txt' ],
                [ 'http://example.com/#one-fragment' ],
                [ 'http://example.edu:8080#one-fragment' ],
            ];
    }

    public function urnIsValidProvider()
    {
        return
            [
                [ 'urn:isbn:0-486-27557-4' ],
                [ 'urn:example:mammal:monotreme:echidna' ],
                [ 'urn:mpeg:mpeg7:schema:2001' ],
                [ 'urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66' ],
                [ 'rare-urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66' ],
                [ 'urn:FOO:a123,456' ]
            ];
    }

    /**
     * @dataProvider urlIsNotValidProvider
     * @dataProvider urnIsNotValidProvider
     */
    public function testCreationThrowsExceptionWhenUriIsNotValid( string $uri )
    {
        $this->expectException( 'RuntimeException' );
        $this->sut = new Uri( $uri );
    }

    public function urlIsNotValidProvider()
    {
        return
            [
                [ 'only-text' ],
                [ 'http//missing.colon.example.com/path/?parameter1=value1&parameter2=value2' ],
                [ 'missing.protocol.example.com/path/' ],
                [ 'http://example.com\\bad-separator' ],
                [ 'http://example.com|bad-separator' ],
                [ 'ht tp://example.com' ],
                [ 'http://exampl e.com' ],
                [ 'http://example.com/pa th/' ],
                [ '../../../relative/path/needs/protocol/resource.txt' ],
                [ 'http://example.com/#two-fragments#not-allowed' ],
                [ 'http://example.edu:portMustBeANumber#one-fragment' ],
            ];
    }

    public function urnIsNotValidProvider()
    {
        return
            [
                [ 'urn:mpeg:mpeg7:sch ema:2001' ],
                [ 'urn|mpeg:mpeg7:schema:2001' ],
                [ 'urn?mpeg:mpeg7:schema:2001' ],
                [ 'urn%mpeg:mpeg7:schema:2001' ],
                [ 'urn#mpeg:mpeg7:schema:2001' ],
            ];
    }
}

Uri.php (Value Object)

<?php

declare( strict_types = 1 );

namespace XaviMontero\ThrasherPortage\Tour;

class Uri
{
    /** @var string */
    private $uri;

    public function __construct( string $uri )
    {
        $this->assertUriIsCorrect( $uri );
        $this->uri = $uri;
    }

    public function getUriAsString()
    {
        return $this->uri;
    }

    private function assertUriIsCorrect( string $uri )
    {
        // https://stackoverflow.com/questions/30847/regex-to-validate-uris
        // http://snipplr.com/view/6889/regular-expressions-for-uri-validationparsing/

        if( ! preg_match( "/^([a-z][a-z0-9+.-]*):(?:\\/\\/((?:(?=((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*))(\\3)@)?(?=(\\[[0-9A-F:.]{2,}\\]|(?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*))\\5(?::(?=(\\d*))\\6)?)(\\/(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\8)?|(\\/?(?!\\/)(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\10)?)(?:\\?(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\11)?(?:#(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\12)?$/i", $uri ) )
        {
            throw new \RuntimeException( "URI has not a valid format." );
        }
    }
}

Running UnitTests

There are 65 assertions in 46 tests. Caution: there are 2 data-providers for valid and 2 more for invalid expressions. One is for URLs and the other for URNs. If you are using a version of PhpUnit of v5.6* or earlier then you need to join the two data providers into a single one.

xavi@bromo:~/custom_www/hello-trip/mutant-migrant$ vendor/bin/phpunit
PHPUnit 5.7.3 by Sebastian Bergmann and contributors.

..............................................                    46 / 46 (100%)

Time: 82 ms, Memory: 4.00MB

OK (46 tests, 65 assertions)

Code coverage

There is 100% code coverage in this sample URI checker.

Some_North_korea_kid · Answer 14 · 2018-07-20T15:30:18.243

"/(http(s?):\/\/)([a-z0-9\-]+\.)+[a-z]{2,4}(\.[a-z]{2,4})*(\/[^ ]+)*/i"

The pattern breaks down as follows:

1. (http(s?):\/\/) matches http:// or https://.

2. ([a-z0-9\-]+\.)+ matches one or more dot-terminated labels:

   2.1 [a-z0-9\-] is any letter a-z, any digit 0-9, or a hyphen (-).
   2.2 The inner + means one or more of those characters, e.g. a1w, a9-, c559s, f.
   2.3 \. is a literal dot (.).
   2.4 The + after the group means the label-plus-dot unit (2.1 to 2.3) occurs at least once,
       e.g. abc.defgh0.ig. or aa.b.ced.f.gh. and also the www.yyy. in www.yyy.com.

3. [a-z]{2,4} matches 2 to 4 letters a-z. The upper bound is meant to reject a case such as
   https://www.google.co.kr.asdsdagfsdfsf.

4. (\.[a-z]{2,4})*(\/[^ ]+)* matches optional extra suffixes and an optional path:

   4.1 \.[a-z]{2,4} is the same as step 3, but preceded by a dot.
   4.2 The * means that group may appear any number of times, or not at all.
   4.3 \/ is a literal forward slash (/).
   4.4 [^ ] is any character except a space.
   4.5 The + means one or more of the characters from 4.4.
   4.6 The * after (\/[^ ]+) means the slash-plus-characters unit (4.3 to 4.5) may appear
       any number of times, or not at all.

   This covers a case such as https://stackoverflow.com/posts/51441301/edit.

5. In PHP the regex is written between delimiters, "/ ... /", which is why the forward slashes
   inside the pattern are escaped with backslashes. The full pattern is therefore:

"/(http(s?):\/\/)([a-z0-9\-]+\.)+[a-z]{2,4}(\.[a-z]{2,4})*(\/[^ ]+)*/i"

6. Almost forgot: the trailing i makes the match case-insensitive, so A is the same as a,
   and SoRRy is the same as sorry.

Note : Sorry for bad English. My country not use it well.
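One thing worth checking with a pattern like this is whether it is anchored. As written it matches anywhere in the string, so a long bogus suffix still yields a match on the valid-looking prefix. The sketch below compares the pattern as posted with an anchored variant. The constant and function names are hypothetical, and the `^` and `$` anchors are my addition.

```php
<?php
// The body of the pattern from this answer, without delimiters.
const URL_PATTERN_BODY = '(http(s?):\/\/)([a-z0-9\-]+\.)+[a-z]{2,4}(\.[a-z]{2,4})*(\/[^ ]+)*';

// Hypothetical helper: true if the pattern matches somewhere in $s (as posted).
function url_found_in(string $s): bool
{
    return preg_match('/' . URL_PATTERN_BODY . '/i', $s) === 1;
}

// Hypothetical helper: true only if the whole string matches (anchors added).
function is_whole_url(string $s): bool
{
    return preg_match('/^' . URL_PATTERN_BODY . '$/i', $s) === 1;
}

var_dump(url_found_in('https://www.google.co.kr.asdsdagfsdfsf')); // bool(true)
var_dump(is_whole_url('https://www.google.co.kr.asdsdagfsdfsf')); // bool(false)
var_dump(is_whole_url('https://stackoverflow.com/posts/51441301/edit')); // bool(true)
```

For validating a single input field, the anchored form is usually the one you want. The unanchored form is better suited to finding URLs inside longer text.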

Did you notice how old this question is? Please explain your regex, users who do not know already will have a hard time understanding it without details. — Nic3500, Jul 20 '18 at 11:41

Tim Groeneveld · Answer 15 · 2016-11-28T01:32:45.803

OK, so this is a little bit more complex than a simple regex, but it allows for different types of URLs.

Examples:

google.com
www.microsoft.com/
http://www.yahoo.com/
https://www.bandcamp.com/artist/#!someone-special!

All of which should be marked as valid.

function is_valid_url($url) {
    // First check: is the url just a domain name? (allow a slash at the end)
    $_domain_regex = "|^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*(\.[A-Za-z]{2,})/?$|";
    if (preg_match($_domain_regex, $url)) {
        return true;
    }

    // Second: Check if it's a url with a scheme and all
    $_regex = '#^([a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))$#';
    if (preg_match($_regex, $url, $matches)) {
        // pull out the domain name, and make sure that the domain is valid.
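        $_parts = parse_url($url);
        // parse_url() omits components it cannot find, so check that each key exists first
        if (!isset($_parts['scheme']) || !in_array($_parts['scheme'], array( 'http', 'https' )))
            return false;

        // Check the domain using the regex, stops domains like "-example.com" passing through
        if (!isset($_parts['host']) || !preg_match($_domain_regex, $_parts['host']))
            return false;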

        // This domain looks pretty valid. Only way to check it now is to download it!
        return true;
    }

    return false;
}

Note that there is an in_array check for the protocols that you want to allow (currently only http and https are in that list).

var_dump(is_valid_url('google.com'));         // true
var_dump(is_valid_url('google.com/'));        // true
var_dump(is_valid_url('http://google.com'));  // true
var_dump(is_valid_url('http://google.com/')); // true
var_dump(is_valid_url('https://google.com')); // true

Throws: ErrorException: Undefined index: scheme if the protocol is not specified. I suggest checking whether it is set first. – user3396065, Nov 20 '16 at 15:34
@user3396065, can you please provide an example input that throws this? — Tim Groeneveld, Nov 28 '16 at 01:31

score 1 · Answer 16 · answered Aug 31 '18 at 18:01

For anyone developing with WordPress, just use

esc_url_raw($url) === $url

to validate a URL (here's WordPress' documentation on esc_url_raw). It handles URLs much better than filter_var($url, FILTER_VALIDATE_URL) because it is unicode and XSS-safe. (Here is a good article mentioning all the problems with filter_var).

score 0 · Answer 17 · answered Aug 21 '14 at 09:17

Here is the way I did it. I should mention that I am not so sure about the regex, but it should work :)

$pattern = "#((http|https)://(\S*?\.\S*?))(\s|\;|\)|\]|\[|\{|\}|,|”|\"|'|:|\<|$|\.\s)#i";
$text = preg_replace_callback($pattern, function ($m) {
    return "<a href=\"$m[1]\" target=\"_blank\">$m[1]</a>$m[4]";
}, $text);

This way you won't need the e (eval) modifier on your pattern, which was deprecated in PHP 5.5 and removed in PHP 7.

Hope it helps :)

Thomas Venturini

`(http|https)` is more simply `https?`. The excessive use of pipes in this pattern negatively impacts readability and brevity. Many of the escaped characters in your pattern do not need escaping. – mickmackusa Sep 27 '21 at 10:30
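
A sketch of what that simplification might look like, assuming UTF-8 input. The function name linkify is hypothetical, and the htmlspecialchars call is an addition that is not in the answer. Note that the capture groups shift: the trailing delimiter becomes group 2 instead of group 4.

```php
<?php
// Hypothetical helper (not from the answer): wraps http/https URLs in links.
function linkify(string $text): string
{
    // Group 1 is the URL, group 2 the delimiter that ends it.
    // \x{201D} is the right double quotation mark from the original pattern.
    $pattern = '#(https?://\S*?\.\S*?)(\s|[;)\]\[{},\x{201D}"\':<]|$|\.\s)#iu';

    $result = preg_replace_callback($pattern, function (array $m): string {
        // Escape the URL so quotes or angle brackets cannot break out of the attribute.
        $url = htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8');
        return '<a href="' . $url . '" target="_blank">' . $url . '</a>' . $m[2];
    }, $text);

    // With the u modifier, invalid UTF-8 input makes preg_replace_callback() return null.
    return $result ?? $text;
}

echo linkify('Visit http://example.com, now');
// Visit <a href="http://example.com" target="_blank">http://example.com</a>, now
```

The single character class replaces the long chain of alternatives and matches the same delimiters.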

score 0 · Answer 18 · answered Feb 08 '17 at 16:01

Here's a simple class for URL validation that uses a regex and then cross-references the domain against popular RBL (Realtime Blackhole List) servers:

Install:

require 'URLValidation.php';

Usage:

require 'URLValidation.php';
$urlVal = new UrlValidation(); //Create Object Instance

Add a URL as the parameter of the domain() method and check the return value.

$urlArray = ['http://www.bokranzr.com/test.php?test=foo&test=dfdf', 'https://en-gb.facebook.com', 'https://www.google.com'];
foreach ($urlArray as $url) {
    // var_dump() prints its argument and returns nothing, so call it on its own.
    var_dump($urlVal->domain($url));
    echo ' URL: ' . $url . '<br>';
}

Output:

bool(false) URL: http://www.bokranzr.com/test.php?test=foo&test=dfdf
bool(true) URL: https://en-gb.facebook.com
bool(true) URL: https://www.google.com

As you can see above, www.bokranzr.com is listed as a malicious website by an RBL, so domain() returned false for it.
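
The source of the class is not shown here, so the following is only an assumption about how such a lookup could work. DNS blocklists are generally queried by prepending the name under test to the list's zone and checking whether that name resolves to an A record. The helper names and the zone rbl.example.test are placeholders, not a real list or the class's API.

```php
<?php
// Hypothetical helpers; the UrlValidation class itself is not shown in the answer.

// Builds the DNS name to look up: "<host>.<zone>". Returns null when the URL has no host.
function rbl_query_name(string $url, string $zone): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    return strtolower($host) . '.' . $zone;
}

// A listed domain normally resolves to an A record inside the zone.
// The trailing dot makes the name fully qualified.
function is_listed(string $url, string $zone): bool
{
    $name = rbl_query_name($url, $zone);
    return $name !== null && checkdnsrr($name . '.', 'A');
}

// "rbl.example.test" is a placeholder zone, not a real blocklist.
var_dump(rbl_query_name('https://en-gb.facebook.com', 'rbl.example.test'));
```

Results from is_listed depend on the resolver and on the list's usage policy, so treat a false as "not found" rather than "safe".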

score 0 · Answer 19 · answered May 30 '09 at 05:11

Peter's Regex doesn't look right to me for many reasons. It allows all kinds of special characters in the domain name and doesn't test for much.

Frankie's function looks good to me and you can build a good regex from the components if you don't want a function, like so:

^(http://|https://)(([a-z0-9]([-a-z0-9]*[a-z0-9]+)?){1,63}\.)+[a-z]{2,6}

Untested but I think that should work.

Also, Owen's answer doesn't look 100% either. I took the domain part of the regex and tested it on a Regex tester tool http://erik.eae.net/playground/regexp/regexp.html

I put the following line:

(\S*?\.\S*?)

in the "regexp" section and the following line:

-hello.com

under the "sample text" section.

The result allowed the minus character through, because \S matches any non-whitespace character.

Note the regex from Frankie handles the minus because it has this part for the first character:

[a-z0-9]

That class won't allow a minus or any other special character in the first position.
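
The same test can be run in PHP. The sketch below first repeats the \S check, then tries a stricter host pattern built on the same idea of a letter or digit at each end of a label. The helper name host_looks_valid is hypothetical. One difference from the untested regex above is deliberate: there, {1,63} repeats the whole group and does not limit the label length, so this sketch puts the length limit inside the label as {0,61}.

```php
<?php
// The loose pattern from the test above accepts a leading minus.
var_dump(preg_match('~(\S*?\.\S*?)~', '-hello.com') === 1); // bool(true)

// Hypothetical helper (not from any answer): each label starts and ends with
// a letter or digit and is at most 63 characters long.
function host_looks_valid(string $url): bool
{
    $label   = '[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?';
    $pattern = '~^https?://(?:' . $label . '\.)+[a-z]{2,6}(?:[/?#:]|$)~i';
    return preg_match($pattern, $url) === 1;
}

var_dump(host_looks_valid('http://hello.com'));            // bool(true)
var_dump(host_looks_valid('http://-hello.com'));           // bool(false)
var_dump(host_looks_valid('http://hello-.com'));           // bool(false)
var_dump(host_looks_valid('https://en-gb.facebook.com/')); // bool(true)
```

The {2,6} limit on the last label is kept from the answer; it rejects longer top-level domains.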

score -1 · Answer 20 · answered Aug 05 '12 at 23:17

I've found this to be the most useful for matching a URL:

^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$

Jeremy Moore

Will that match URLs that begin with `ftp:` ? – andrewsi Sep 30 '12 at 20:27
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/ – Shahbaz Sep 26 '13 at 11:43
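
On the ftp: question, a quick check suggests the answer is no: the optional scheme group only knows http and https, and a colon is not allowed in the host part. The second pattern below is a hypothetical variant that adds ftp to the scheme group. It is an illustration, not something the answer proposes.

```php
<?php
// The pattern from the answer, with delimiters as in the comment above.
$pattern = '/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/';

var_dump(preg_match($pattern, 'http://example.com'));  // int(1)
var_dump(preg_match($pattern, 'example.com'));         // int(1)
var_dump(preg_match($pattern, 'ftp://example.com'));   // int(0)

// Hypothetical variant: allow ftp in the optional scheme group as well.
$withFtp = '/^((?:https?|ftp):\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/';

var_dump(preg_match($withFtp, 'ftp://example.com'));   // int(1)
```

Both patterns are case-sensitive and only allow lowercase letters in the host, so uppercase input needs the i modifier.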

score -1 · Answer 21 · edited Jun 30 '15 at 13:43

There is a PHP native function for that:

$url = 'http://www.yoururl.co.uk/sub1/sub2/?param=1&param2/';

if ( ! filter_var( $url, FILTER_VALIDATE_URL ) ) {
    // Wrong
}
else {
    // Valid
}

Returns the filtered data, or FALSE if the filter fails.

Check it here
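
Because filter_var() returns the validated string and not true, a strict comparison against false is the safer test. A minimal sketch of that, where the wrapper name is_url is hypothetical:

```php
<?php
$valid    = filter_var('http://www.yoururl.co.uk/sub1/sub2/?param=1&param2/', FILTER_VALIDATE_URL);
$noScheme = filter_var('example.com', FILTER_VALIDATE_URL);

var_dump($valid);     // the original string, not bool(true)
var_dump($noScheme);  // bool(false): a URL without a scheme does not validate

// Hypothetical wrapper: turns the "string or false" result into a boolean.
function is_url(string $candidate): bool
{
    return filter_var($candidate, FILTER_VALIDATE_URL) !== false;
}

var_dump(is_url('https://google.com')); // bool(true)
var_dump(is_url('google.com'));         // bool(false)
```

This differs from the is_valid_url() function earlier in the thread, which accepts bare domains such as google.com.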

edited Jun 30 '15 at 13:43

Ram G Athreya

answered May 14 '15 at 13:13

Fredmat

This answer duplicates one of the answers from 2008! – suspectus Jun 30 '15 at 12:13
