I've been looking for a simple regex for URLs. Does anybody have one handy that works well? I didn't find one in the Zend Framework validation classes, and I have seen several implementations.
21 Answers
Use the filter_var() function to validate whether a string is a URL or not:
var_dump(filter_var('example.com', FILTER_VALIDATE_URL));
It is bad practice to use regular expressions when not necessary.
EDIT: Be careful, this solution is not unicode-safe and not XSS-safe. If you need a complex validation, maybe it's better to look somewhere else.
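One way to act on that warning is to keep filter_var() as a first pass and add your own checks on top. The sketch below is only an illustration: the helper name is_http_url() and the choice to allow only http and https are assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper (not from the answer above): combine FILTER_VALIDATE_URL
// with a scheme allow-list, so that inputs such as "a://site.com" are rejected.
function is_http_url(string $candidate): bool
{
    // First pass: the built-in filter rejects strings without a scheme or host.
    if (filter_var($candidate, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // Second pass: only accept the schemes we actually want to handle.
    $scheme = parse_url($candidate, PHP_URL_SCHEME);
    return is_string($scheme)
        && in_array(strtolower($scheme), ['http', 'https'], true);
}

var_dump(is_http_url('http://example.com/path?x=1')); // bool(true)
var_dump(is_http_url('example.com'));                 // bool(false)
var_dump(is_http_url('a://site.com'));                // bool(false)
```

This does nothing about output escaping; a URL that passes still has to be escaped before it is printed into HTML.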

-
This is definitely a great alternative; unfortunately it's PHP 5.2+ (unless you install the PECL version) – Owen Oct 19 '08 at 08:07
-
There's a bug in 5.2.13 (and I think 5.3.2) that prevents urls with dashes in them from validating using this method. – vamin Jun 01 '10 at 23:27
-
filter_var will reject http://test-site.com. I have domain names with dashes, whether they are valid or not. I don't think filter_var is the best way to validate a url. It will allow a url like `http://www` – Cesar Sep 06 '10 at 19:30
-
> It will allow a url like 'http://www'. That is OK, just as a URL like 'http://localhost' is OK – Stanislav Sep 07 '10 at 10:34
-
One particular problem: This validates URLs according to RFC 2396 which does not allow underscores in subdomains, but some websites do have underscores in subdomains. – liviucmg Mar 29 '11 at 17:43
-
The `filter_var` function has since been updated and it's now possible to validate URLs with dashes effectively, rendering your comment incorrect, @vamin ([see bug report here](https://bugs.php.net/bug.php?id=51192)). – Zack Zatkin-Gold Jan 12 '12 at 02:04
-
@zzatkin, the bug report states that the fix is incorporated into the later 5.2.14 and 5.3.3 versions (it came too late for 5.2.13 and 5.3.2), though I agree it's not really an issue anymore so long as you keep PHP up to date. – vamin Jan 23 '12 at 18:12
-
It also will validate http://www.onedomain.com http://www.anotherone.com http://www.yetanother.com I'm finding out today. Not what I had in mind! Going back to a regular expression alternative (PHP Version => 5.4.4) – Bretticus Nov 19 '12 at 19:31
-
Doesn't accept UTF-8 characters. Will return false for `http://wiki.com/öva/mä/åäö`. – Sawny Dec 16 '12 at 19:06
-
filter_var appears to validate all kinds of URL formats whether they are valid or not; it seems that a regex is the way to validate URLs correctly – mic Sep 30 '13 at 09:35
-
yet another issue is that it does not validate against newer tlds like .me, .cm .guru etc – bhaskarc Mar 15 '15 at 17:59
-
This is a bad solution which should not have so many up votes. Highly XSS vulnerable. – RisingSun May 04 '15 at 18:46
-
Downvoted as dangerous. Read the comments about it in the online PHP manual! – Nick Rice Sep 12 '16 at 11:09
-
FILTER_VALIDATE_URL has [a lot of problems](https://bugs.php.net/search.php?cmd=display&search_for=FILTER_VALIDATE_URL) that need fixing. Also, the [docs describing the flags](http://php.net/manual/en/filter.filters.validate.php) do not reflect the [actual source code](https://github.com/php/php-src/blob/master/ext/filter/logical_filters.c#L517) where references to some flags have been removed entirely. More info here: http://news.php.net/php.internals/99018 – S. Imp May 12 '17 at 21:53
-
Here's another article explaining the problems with this: https://d-mueller.de/blog/why-url-validation-with-filter_var-might-not-be-a-good-idea/ – thespacecamel Aug 31 '18 at 18:31
-
it is a bad solution, 'cause `a://site.com` is valid for FILTER_VALIDATE_URL (PHP 7.2 and older versions) – Karel Wintersky Jul 21 '20 at 11:29
I used this on a few projects, I don't believe I've run into issues, but I'm sure it's not exhaustive:
$text = preg_replace(
'#((https?|ftp)://(\S*?\.\S*?))([\s)\[\]{},;"\':<]|\.\s|$)#i',
'<a href="$1" target="_blank">$3</a>$4',
$text
);
Most of the random junk at the end is to deal with situations like http://domain.example. in a sentence (to avoid matching the trailing period). I'm sure it could be cleaned up, but since it worked, I've more or less just copied it over from project to project.
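The trailing-period handling can be checked with a short script. This is only a sketch: the sample sentence is made up, and the replacement is written as a plain single-quoted string.

```php
<?php
// Same pattern as in the answer; the sample sentence is invented for illustration.
$pattern = '#((https?|ftp)://(\S*?\.\S*?))([\s)\[\]{},;"\':<]|\.\s|$)#i';
$text = 'See http://domain.example. Thanks!';

// $1 is the full URL, $3 the part after the scheme, $4 the terminator that was consumed.
$linked = preg_replace($pattern, '<a href="$1" target="_blank">$3</a>$4', $text);

echo $linked, "\n";
// See <a href="http://domain.example" target="_blank">domain.example</a>. Thanks!
```

The period stays outside the link because the lazy `\S*?` stops as soon as the `\.\s` alternative of the terminator group can match.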

-
Some things that jump out at me: use of alternation where character classes are called for (every alternative matches exactly one character); and the replacement shouldn't have needed the outer double-quotes (they were only needed because of the pointless /e modifier on the regex). – Alan Moore May 30 '09 at 05:53
-
@John Scipione: `google.com` is only a valid relative URL path but not a valid absolute URL. And I think that’s what he’s looking for. – Gumbo Jan 04 '10 at 08:30
-
This doesn't work in this case - it includes the trailing ": 3 cantari noi in albumul Diverse – Softy Feb 02 '11 at 09:06
-
@Softy something like `http://example.com/somedir/...` is a perfectly legitimate URL, asking for the file named `...` - which is a legitimate file name. – Stephen P Jul 27 '11 at 23:55
-
I'm using Zend\Validator\Regex to validate URLs with your pattern, but it still accepts `http://www.example` as valid – Joko Wandiro Nov 26 '13 at 08:03
As per the PHP manual - parse_url should not be used to validate a URL.
Unfortunately, it seems that filter_var('example.com', FILTER_VALIDATE_URL) does not perform any better.
Both parse_url() and filter_var() will pass malformed URLs such as http://...
Therefore in this case - regex is the better method.

-
This argument doesn't follow. If FILTER_VALIDATE_URL is a little more permissive than you want, tack on some additional checks to deal with those edge cases. Reinventing the wheel with your own attempt at a regex against urls is only going to get you further from a complete check. – Kzqai Jul 19 '10 at 00:50
-
See all the shot-down regexes on this page for examples of why -not- to write your own. – Kzqai Jul 19 '10 at 02:54
-
You make a fair point Tchalvak. Regexes for something like URLs can (as per other responses) be very hard to get right. Regex is not always the answer. Conversely regex is also not always the wrong answer either. The important point is to pick the right tool (regex or otherwise) for the job and not be specifically "anti" or "pro" regex. In hindsight, your answer of using filter_var in combination with constraints on its edge-cases, looks like the better answer (particularly when regex answers start to get to greater than 100 chars or so - making maintenance of said regex a nightmare) – catchdave Jul 20 '10 at 04:54
-
1. `filter_var()` seems to not allow “malformed URLs such as `http://...`“. (Well, it might allow it in 2008…) In my current tests, it behaves better than suggested regexes. 2. As this answer hasn’t included an actual regex, it is not useful. – Melebius Feb 17 '23 at 08:37
As per John Gruber (Daring Fireball):
Regex:
(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))
using in preg_match():
preg_match("/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/", $url)
Here is the extended regex pattern (with comments):
(?xi)
\b
( # Capture 1: entire matched URL
(?:
https?:// # http or https protocol
| # or
www\d{0,3}[.] # "www.", "www1.", "www2." … "www999."
| # or
[a-z0-9.\-]+[.][a-z]{2,4}/ # looks like domain name followed by a slash
)
(?: # One or more:
[^\s()<>]+ # Run of non-space, non-()<>
| # or
\(([^\s()<>]+|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
)+
(?: # End with:
\(([^\s()<>]+|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
| # or
[^\s`!()\[\]{};:'".,<>?«»“”‘’] # not a space or one of these punct chars
)
)
For more details please look at: http://daringfireball.net/2010/07/improved_regex_for_matching_urls
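Since this pattern is meant for finding URLs inside running text rather than validating a single string, a usage sketch with preg_match_all() may help. The sample text is invented, and two choices here are assumptions rather than part of Gruber's post: the `~` delimiter (so the slashes need no escaping) and the `u` modifier (so the curly quotes in the last character class are read as UTF-8 characters).

```php
<?php
// Gruber's pattern in a nowdoc, so no PHP string escaping is involved.
// Assumptions: "~" as delimiter and the "u" modifier are additions for this sketch.
$pattern = <<<'REGEX'
~(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))~u
REGEX;

// Sample text invented for illustration.
$text = 'Docs at http://example.com/wiki/Foo_(bar), see also www.example.org.';

preg_match_all($pattern, $text, $matches);

// $matches[0] holds the full matches; balanced parentheses are kept,
// while the trailing comma and period are left out.
print_r($matches[0]);
```

Because the pattern is unanchored, it should not be used as-is to decide whether a whole string is a valid URL.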

-
To work, the pattern needs to escape the forward slashes with backslashes in three points: preg_match("/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/", $url) – Ben Birney Oct 03 '20 at 09:45
Just in case you want to know if the url really exists:
function url_exist($url){//returns true if the request for the URL succeeds
$c=curl_init();
curl_setopt($c,CURLOPT_URL,$url);
curl_setopt($c,CURLOPT_HEADER,1);//get the header
curl_setopt($c,CURLOPT_NOBODY,1);//and *only* get the header
curl_setopt($c,CURLOPT_RETURNTRANSFER,1);//get the response as a string from curl_exec(), rather than echoing it
curl_setopt($c,CURLOPT_FRESH_CONNECT,1);//force a new connection instead of reusing a cached one
if(!curl_exec($c)){
//echo $url.' does not exist';
return false;
}else{
//echo $url.' exists';
return true;
}
//$httpcode=curl_getinfo($c,CURLINFO_HTTP_CODE);
//return ($httpcode<400);
}
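If the goal is to treat error responses such as 404 as "does not exist", the status code has to be inspected as well. The following is a sketch under stated assumptions, not the answer's own code: the names status_means_exists() and url_responds(), the 5 second timeout, and the decision to accept any 2xx or 3xx code are all hypothetical choices. It also restricts cURL to HTTP and HTTPS, which is one small precaution when the URL comes from user input.

```php
<?php
// Hypothetical helper: treat 2xx and 3xx responses as "exists".
function status_means_exists(int $httpCode): bool
{
    return $httpCode >= 200 && $httpCode < 400;
}

// Hypothetical variant of url_exist(); requires the curl extension.
function url_responds(string $url, int $timeoutSeconds = 5): bool
{
    $c = curl_init($url);
    curl_setopt_array($c, [
        CURLOPT_NOBODY         => true,  // HEAD-style request, skip the body
        CURLOPT_RETURNTRANSFER => true,  // do not echo the response
        CURLOPT_FOLLOWLOCATION => false, // report redirects instead of following them
        CURLOPT_PROTOCOLS      => CURLPROTO_HTTP | CURLPROTO_HTTPS, // no file://, gopher://, ...
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    if (curl_exec($c) === false) {
        return false; // DNS failure, timeout, refused connection, disallowed protocol, ...
    }
    return status_means_exists((int) curl_getinfo($c, CURLINFO_HTTP_CODE));
}
```

Restricting the protocol does not stop requests to internal hosts, so a check like this should still only run on URLs that have passed some validation first.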

-
I would still do some kind of validation on `$url` before actually verifying the url is real because the above operation is expensive - perhaps as much as 200 milliseconds depending on file size. In some cases the url may not actually have a resource at its location available yet (e.g. creating a url to an image that has yet to be uploaded). Additionally you're not using a cached version so it's not like `file_exists()` that will cache a stat on a file and return nearly instantly. The solution you provided is still useful though. Why not just use `fopen($url, 'r')`? – Yzmir Ramirez Aug 06 '11 at 18:14
-
Thanks, just what I was looking for. However, I made a mistake trying to use it. The function is "url_exist" not "url_exists" oops ;-) – PJ Brunet Mar 20 '12 at 20:24
-
Is there any security risk in directly accessing the user entered URL? – siliconpi May 10 '12 at 07:14
-
You might want to add a check for whether a 404 was returned: `$httpCode = curl_getinfo( $c, CURLINFO_HTTP_CODE ); /* echo $url . ' ' . $httpCode; */ if( $httpCode == 404 ) { echo $url.' 404'; }` – Camaleo Mar 12 '18 at 13:28
I don't think that using regular expressions is a smart thing to do in this case. It is impossible to match all of the possibilities and even if you did, there is still a chance that url simply doesn't exist.
Here is a very simple way to test if url actually exists and is readable :
if (preg_match("#^https?://.+#", $link) and @fopen($link,"r")) echo "OK";
(if there is no preg_match then this would also validate all filenames on your server)
I've used this one with good success - I don't remember where I got it from
$pattern = "/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i";

-
^(http://|https://)?(([a-z0-9]?([-a-z0-9]*[a-z0-9]+)?){1,63}\.)+[a-z]{2,6} (may be too greedy, not sure yet, but it's more flexible on protocol and leading www) – andrewbadera Aug 26 '09 at 15:54
The best URL Regex that worked for me:
function valid_URL($url){
return preg_match('%^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu', $url);
}
Examples:
valid_URL('https://twitter.com'); // true
valid_URL('http://twitter.com'); // true
valid_URL('http://twitter.co'); // true
valid_URL('http://t.co'); // true
valid_URL('http://twitter.c'); // false
valid_URL('htt://twitter.com'); // false
valid_URL('http://example.com/?a=1&b=2&c=3'); // true
valid_URL('http://127.0.0.1'); // true
valid_URL(''); // false
valid_URL(1); // false
Source: http://urlregex.com/

function validateURL($URL) {
// The dots before the TLD list are escaped and "$" sits at the very end,
// so an optional port and trailing slash can actually match.
$pattern_1 = "/^(http|https|ftp):\/\/(([A-Z0-9][A-Z0-9_-]*)(\.[A-Z0-9][A-Z0-9_-]*)+\.(com|org|net|dk|at|us|tv|info|uk|co\.uk|biz|se))(:(\d+))?\/?$/i";
$pattern_2 = "/^(www)((\.[A-Z0-9][A-Z0-9_-]*)+\.(com|org|net|dk|at|us|tv|info|uk|co\.uk|biz|se))(:(\d+))?\/?$/i";
// Accept either form: with a scheme, or starting with "www".
return preg_match($pattern_1, $URL) === 1 || preg_match($pattern_2, $URL) === 1;
}

And there is your answer =) Try to break it, you can't!!!
function link_validate_url($text) {
$LINK_DOMAINS = 'aero|arpa|asia|biz|com|cat|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|mobi|local';
$LINK_ICHARS_DOMAIN = (string) html_entity_decode(implode("", array( // @TODO completing letters ...
"æ", // æ
"Æ", // Æ
"À", // À
"à", // à
"Á", // Á
"á", // á
"Â", // Â
"â", // â
"å", // å
"Å", // Å
"ä", // ä
"Ä", // Ä
"Ç", // Ç
"ç", // ç
"Ð", // Ð
"ð", // ð
"È", // È
"è", // è
"É", // É
"é", // é
"Ê", // Ê
"ê", // ê
"Ë", // Ë
"ë", // ë
"Î", // Î
"î", // î
"Ï", // Ï
"ï", // ï
"ø", // ø
"Ø", // Ø
"ö", // ö
"Ö", // Ö
"Ô", // Ô
"ô", // ô
"Õ", // Õ
"õ", // õ
"Œ", // Œ
"œ", // œ
"ü", // ü
"Ü", // Ü
"Ù", // Ù
"ù", // ù
"Û", // Û
"û", // û
"Ÿ", // Ÿ
"ÿ", // ÿ
"Ñ", // Ñ
"ñ", // ñ
"þ", // þ
"Þ", // Þ
"ý", // ý
"Ý", // Ý
"¿", // ¿
)), ENT_QUOTES, 'UTF-8');
$LINK_ICHARS = $LINK_ICHARS_DOMAIN . (string) html_entity_decode(implode("", array(
"ß", // ß
)), ENT_QUOTES, 'UTF-8');
$allowed_protocols = array('http', 'https', 'ftp', 'news', 'nntp', 'telnet', 'mailto', 'irc', 'ssh', 'sftp', 'webcal');
// Starting a parenthesis group with (?: means that it is grouped, but is not captured
$protocol = '((?:'. implode("|", $allowed_protocols) .'):\/\/)';
$authentication = "(?:(?:(?:[\w\.\-\+!$&'\(\)*\+,;=" . $LINK_ICHARS . "]|%[0-9a-f]{2})+(?::(?:[\w". $LINK_ICHARS ."\.\-\+%!$&'\(\)*\+,;=]|%[0-9a-f]{2})*)?)?@)";
$domain = '(?:(?:[a-z0-9' . $LINK_ICHARS_DOMAIN . ']([a-z0-9'. $LINK_ICHARS_DOMAIN . '\-_\[\]])*)(\.(([a-z0-9' . $LINK_ICHARS_DOMAIN . '\-_\[\]])+\.)*('. $LINK_DOMAINS .'|[a-z]{2}))?)';
$ipv4 = '(?:[0-9]{1,3}(\.[0-9]{1,3}){3})';
$ipv6 = '(?:[0-9a-fA-F]{1,4}(\:[0-9a-fA-F]{1,4}){7})';
$port = '(?::([0-9]{1,5}))';
// Pattern specific to external links.
$external_pattern = '/^'. $protocol .'?'. $authentication .'?('. $domain .'|'. $ipv4 .'|'. $ipv6 .'|localhost)'. $port .'?';
// Pattern specific to internal links.
$internal_pattern = "/^(?:[a-z0-9". $LINK_ICHARS ."_\-+\[\]]+)";
$internal_pattern_file = "/^(?:[a-z0-9". $LINK_ICHARS ."_\-+\[\]\.]+)$/i";
$directories = "(?:\/[a-z0-9". $LINK_ICHARS ."_\-\.~+%=&,$'#!():;*@\[\]]*)*";
// Yes, four backslashes == a single backslash.
$query = "(?:\/?\?([?a-z0-9". $LINK_ICHARS ."+_|\-\.~\/\\\\%=&,$'():;*@\[\]{} ]*))";
$anchor = "(?:#[a-z0-9". $LINK_ICHARS ."_\-\.~+%=&,$'():;*@\[\]\/\?]*)";
// The rest of the path for a standard URL.
$end = $directories .'?'. $query .'?'. $anchor .'?'.'$/i';
$message_id = '[^@].*@'. $domain;
$newsgroup_name = '(?:[0-9a-z+-]*\.)*[0-9a-z+-]*';
$news_pattern = '/^news:('. $newsgroup_name .'|'. $message_id .')$/i';
$user = '[a-zA-Z0-9'. $LINK_ICHARS .'_\-\.\+\^!#\$%&*+\/\=\?\`\|\{\}~\'\[\]]+';
$email_pattern = '/^mailto:'. $user .'@'.'(?:'. $domain .'|'. $ipv4 .'|'. $ipv6 .'|localhost)'. $query .'?$/';
// Any recognised link type counts as valid; anything else is rejected.
if (strpos($text, '<front>') === 0) {
return true;
}
if (in_array('mailto', $allowed_protocols) && preg_match($email_pattern, $text)) {
return true;
}
if (in_array('news', $allowed_protocols) && preg_match($news_pattern, $text)) {
return true;
}
if (preg_match($internal_pattern . $end, $text)) {
return true;
}
if (preg_match($external_pattern . $end, $text)) {
return true;
}
if (preg_match($internal_pattern_file, $text)) {
return true;
}
return false;
}

-
There are a lot more [top level domains](https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains). – Jeff Puckett Sep 26 '16 at 20:17
-
Your `.`, `?`, `+`, `^`, `{`, `}`, `=`, `|`, `$`, backtick, and `[` do not need escaping in your character classes. `+` is even repeated in one of your character classes. `:` does not need to be escaped. – mickmackusa Sep 27 '21 at 10:26
Edit:
As incidence pointed out, eregi() was DEPRECATED with the release of PHP 5.3.0 (2009-06-30) and removed in PHP 7.0.0, so the function below uses preg_match() instead.
Just my two cents but I've developed this function and have been using it for a while with success. It's well documented and separated so you can easily change it.
// Checks if string is a URL
// @param string $url
// @return bool
function isURL($url = NULL) {
if($url==NULL) return false;
$protocol = '(http://|https://)';
$allowed = '([a-z0-9]([-a-z0-9]*[a-z0-9]+)?)';
$regex = '~^' . $protocol . // must include the protocol
'(' . $allowed . '{1,63}\.)+'. // 1 or several sub domains with a max of 63 chars
'[a-z]' . '{2,6}~i'; // followed by a TLD; "~" is the delimiter, "i" ignores case as eregi did
return preg_match($regex, $url) === 1;
}
-
Eregi will be removed in PHP 6.0.0. And domains with "öäåø" will not validate with your function. You probably should convert the URL to punycode first? – Dec 10 '09 at 15:48
-
@incidence absolutely agree. I wrote this in March and PHP 5.3 only came out late June setting eregi as DEPRECATED. Thank you. Gonna edit and update. – Frankie Dec 10 '09 at 18:05
-
Correct me if I'm wrong, but can we still assume TLDs will have a minimum of 2 characters and maximum of 6 characters? – Yzmir Ramirez Aug 06 '11 at 18:15
-
@YzmirRamirez (All these years later...) If there was any doubt when you wrote your comment there certainly isn't now, with TLDs these days such as .photography – Nick Rice Sep 12 '16 at 11:02
-
@NickRice you are correct...how much the web changes in 5 years. Now I can't wait until someone makes the TLD .supercalifragilisticexpialidocious – Yzmir Ramirez Sep 13 '16 at 17:03
function is_valid_url ($url="") {
if ($url=="") {
$url=$this->url;
}
$url = @parse_url($url);
if ( ! $url) {
return false;
}
$url = array_map('trim', $url);
$url['port'] = (!isset($url['port'])) ? 80 : (int)$url['port'];
$path = (isset($url['path'])) ? $url['path'] : '';
if ($path == '') {
$path = '/';
}
$path .= ( isset ( $url['query'] ) ) ? "?$url[query]" : '';
if ( isset ( $url['host'] ) AND $url['host'] != gethostbyname ( $url['host'] ) ) {
if ( PHP_VERSION >= 5 ) {
$headers = get_headers("$url[scheme]://$url[host]:$url[port]$path");
}
else {
$fp = fsockopen($url['host'], $url['port'], $errno, $errstr, 30);
if ( ! $fp ) {
return false;
}
fputs($fp, "HEAD $path HTTP/1.1\r\nHost: $url[host]\r\n\r\n");
$headers = fread ( $fp, 128 );
fclose ( $fp );
}
$headers = ( is_array ( $headers ) ) ? implode ( "\n", $headers ) : $headers;
return ( bool ) preg_match ( '#^HTTP/.*\s+(200|301|302)\s#i', $headers );
}
return false;
}
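The function above always falls back to port 80, which does not suit https URLs. A minimal sketch of choosing the default port from the scheme could look like this; the helper name build_probe_url() and the restriction to http and https are assumptions made for illustration.

```php
<?php
// Hypothetical helper: rebuild the URL to probe, taking the default port from the scheme.
function build_probe_url(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null; // not an absolute URL
    }
    $defaults = ['http' => 80, 'https' => 443];
    $scheme = strtolower($parts['scheme']);
    if (!isset($defaults[$scheme])) {
        return null; // only http and https are handled in this sketch
    }
    $port  = $parts['port'] ?? $defaults[$scheme];
    $path  = $parts['path'] ?? '/';
    $query = isset($parts['query']) ? '?' . $parts['query'] : '';
    return "$scheme://{$parts['host']}:$port$path$query";
}

var_dump(build_probe_url('https://example.com'));           // string(24) "https://example.com:443/"
var_dump(build_probe_url('http://example.com:8080/a?b=1')); // string(29) "http://example.com:8080/a?b=1"
var_dump(build_probe_url('example.com'));                   // NULL
```

The returned string could then be handed to get_headers() in place of the hand-built URL.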

-
Hi this solution is good, and i upvoted it, but it doesn't take into account the standard port for https: -- suggest you just replace 80 with '' where it works out the port – pgee70 Sep 28 '14 at 21:41
-
I ended up implementing a variation on this, because my domain cares whether an URL actually exists or not :) – Raz0rwire Jul 18 '16 at 13:34
Inspired by this .NET Stack Overflow question and by an article referenced from it, here is a URI validator (URI means it validates both URLs and URNs).
if( ! preg_match( "/^([a-z][a-z0-9+.-]*):(?:\\/\\/((?:(?=((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*))(\\3)@)?(?=(\\[[0-9A-F:.]{2,}\\]|(?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*))\\5(?::(?=(\\d*))\\6)?)(\\/(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\8)?|(\\/?(?!\\/)(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\10)?)(?:\\?(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\11)?(?:#(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\12)?$/i", $uri ) )
{
throw new \RuntimeException( "URI has not a valid format." );
}
I have successfully unit-tested this function inside a ValueObject I made named Uri and tested by UriTest.
UriTest.php (Contains valid and invalid cases for both URLs and URNs)
<?php
declare( strict_types = 1 );
namespace XaviMontero\ThrasherPortage\Tests\Tour;
use XaviMontero\ThrasherPortage\Tour\Uri;
class UriTest extends \PHPUnit_Framework_TestCase
{
private $sut;
public function testCreationIsOfProperClassWhenUriIsValid()
{
$sut = new Uri( 'http://example.com' );
$this->assertInstanceOf( 'XaviMontero\\ThrasherPortage\\Tour\\Uri', $sut );
}
/**
* @dataProvider urlIsValidProvider
* @dataProvider urnIsValidProvider
*/
public function testGetUriAsStringWhenUriIsValid( string $uri )
{
$sut = new Uri( $uri );
$actual = $sut->getUriAsString();
$this->assertInternalType( 'string', $actual );
$this->assertEquals( $uri, $actual );
}
public function urlIsValidProvider()
{
return
[
[ 'http://example-server' ],
[ 'http://example.com' ],
[ 'http://example.com/' ],
[ 'http://subdomain.example.com/path/?parameter1=value1&parameter2=value2' ],
[ 'random-protocol://example.com' ],
[ 'http://example.com:80' ],
[ 'http://example.com?no-path-separator' ],
[ 'http://example.com/pa%20th/' ],
[ 'ftp://example.org/resource.txt' ],
[ 'file://../../../relative/path/needs/protocol/resource.txt' ],
[ 'http://example.com/#one-fragment' ],
[ 'http://example.edu:8080#one-fragment' ],
];
}
public function urnIsValidProvider()
{
return
[
[ 'urn:isbn:0-486-27557-4' ],
[ 'urn:example:mammal:monotreme:echidna' ],
[ 'urn:mpeg:mpeg7:schema:2001' ],
[ 'urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66' ],
[ 'rare-urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66' ],
[ 'urn:FOO:a123,456' ]
];
}
/**
* @dataProvider urlIsNotValidProvider
* @dataProvider urnIsNotValidProvider
*/
public function testCreationThrowsExceptionWhenUriIsNotValid( string $uri )
{
$this->expectException( 'RuntimeException' );
$this->sut = new Uri( $uri );
}
public function urlIsNotValidProvider()
{
return
[
[ 'only-text' ],
[ 'http//missing.colon.example.com/path/?parameter1=value1&parameter2=value2' ],
[ 'missing.protocol.example.com/path/' ],
[ 'http://example.com\\bad-separator' ],
[ 'http://example.com|bad-separator' ],
[ 'ht tp://example.com' ],
[ 'http://exampl e.com' ],
[ 'http://example.com/pa th/' ],
[ '../../../relative/path/needs/protocol/resource.txt' ],
[ 'http://example.com/#two-fragments#not-allowed' ],
[ 'http://example.edu:portMustBeANumber#one-fragment' ],
];
}
public function urnIsNotValidProvider()
{
return
[
[ 'urn:mpeg:mpeg7:sch ema:2001' ],
[ 'urn|mpeg:mpeg7:schema:2001' ],
[ 'urn?mpeg:mpeg7:schema:2001' ],
[ 'urn%mpeg:mpeg7:schema:2001' ],
[ 'urn#mpeg:mpeg7:schema:2001' ],
];
}
}
Uri.php (Value Object)
<?php
declare( strict_types = 1 );
namespace XaviMontero\ThrasherPortage\Tour;
class Uri
{
/** @var string */
private $uri;
public function __construct( string $uri )
{
$this->assertUriIsCorrect( $uri );
$this->uri = $uri;
}
public function getUriAsString()
{
return $this->uri;
}
private function assertUriIsCorrect( string $uri )
{
// https://stackoverflow.com/questions/30847/regex-to-validate-uris
// http://snipplr.com/view/6889/regular-expressions-for-uri-validationparsing/
if( ! preg_match( "/^([a-z][a-z0-9+.-]*):(?:\\/\\/((?:(?=((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*))(\\3)@)?(?=(\\[[0-9A-F:.]{2,}\\]|(?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*))\\5(?::(?=(\\d*))\\6)?)(\\/(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\8)?|(\\/?(?!\\/)(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\10)?)(?:\\?(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\11)?(?:#(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\12)?$/i", $uri ) )
{
throw new \RuntimeException( "URI has not a valid format." );
}
}
}
Running UnitTests
There are 65 assertions in 46 tests. Caution: there are 2 data-providers for valid and 2 more for invalid expressions. One is for URLs and the other for URNs. If you are using a version of PhpUnit of v5.6* or earlier then you need to join the two data providers into a single one.
xavi@bromo:~/custom_www/hello-trip/mutant-migrant$ vendor/bin/phpunit
PHPUnit 5.7.3 by Sebastian Bergmann and contributors.
.............................................. 46 / 46 (100%)
Time: 82 ms, Memory: 4.00MB
OK (46 tests, 65 assertions)
Code coverage
There is 100% code coverage in this sample URI checker.

"/(http(s?):\/\/)([a-z0-9\-]+\.)+[a-z]{2,4}(\.[a-z]{2,4})*(\/[^ ]+)*/i"
1. (http(s?):\/\/) means http:// or https://.
2. ([a-z0-9\-]+\.)+ breaks down as follows:
2.1 [a-z0-9\-] means any character a-z, any digit 0-9, or the (-) sign.
2.2 (+) means one or more of those characters, e.g. a1w, a9-, c559s, f.
2.3 \. is the (.) sign.
2.4 The (+) sign after ([a-z0-9\-]+\.) means repeat 2.1 to 2.3 at least once, e.g. abc.defgh0.ig, aa.b.ced.f.gh. This also covers www.yyy.com.
3. [a-z]{2,4} means a-z, at least 2 characters but not more than 4. It is intended to rule out cases like https://www.google.co.kr.asdsdagfsdfsf.
4. (\.[a-z]{2,4})*(\/[^ ]+)* breaks down as follows:
4.1 \.[a-z]{2,4} is like number 3 but starts with the (.) sign.
4.2 (*) means (\.[a-z]{2,4}) may appear or not.
4.3 \/ means the (/) sign.
4.4 [^ ] means any character except a blank.
4.5 (+) means 4.4 at least once.
4.6 The (*) after (\/[^ ]+) means 4.3 to 4.5 may appear or not. This covers cases like https://stackoverflow.com/posts/51441301/edit.
5. In PHP the regex is written between "/ /", so it becomes:
"/(http(s?):\/\/)([a-z0-9\-]+\.)+[a-z]{2,4}(\.[a-z]{2,4})*(\/[^ ]+)*/i"
6. Almost forgot: the letter i at the end means ignore case, e.g. A is the same as a, SoRRy is the same as sorry.
Note: Sorry for my bad English. It is not widely used in my country.
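To see how the pattern behaves in practice, here is a small sketch that runs it with preg_match(). The `$anchored` variant, wrapped in `^...$`, is an assumption added for comparison and is not part of the answer: without anchors the pattern only needs to match somewhere inside the string, so the long-TLD example from step 3 still matches on a prefix.

```php
<?php
// Pattern exactly as given in the answer.
$pattern  = '/(http(s?):\/\/)([a-z0-9\-]+\.)+[a-z]{2,4}(\.[a-z]{2,4})*(\/[^ ]+)*/i';
// Assumption for comparison: the same pattern forced to cover the whole string.
$anchored = '/^(http(s?):\/\/)([a-z0-9\-]+\.)+[a-z]{2,4}(\.[a-z]{2,4})*(\/[^ ]+)*$/i';

$samples = [
    'https://stackoverflow.com/posts/51441301/edit',
    'https://www.google.co.kr.asdsdagfsdfsf',
    'not a url',
];

$results = [];
foreach ($samples as $sample) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $results[$sample] = [preg_match($pattern, $sample), preg_match($anchored, $sample)];
    printf("%-48s unanchored=%d anchored=%d\n", $sample, ...$results[$sample]);
}
```

Only the anchored form rejects the second sample, which is the behaviour step 3 describes.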

-
Did you notice how old this question is? Please explain your regex, users who do not know already will have a hard time understanding it without details. – Nic3500 Jul 20 '18 at 11:41
OK, so this is a little bit more complex than a simple regex, but it allows for different types of urls.
Examples:
- google.com
- www.microsoft.com/
- http://www.yahoo.com/
- https://www.bandcamp.com/artist/#!someone-special!
All of which should be marked as valid.
function is_valid_url($url) {
// First check: is the url just a domain name? (allow a slash at the end)
$_domain_regex = "|^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*(\.[A-Za-z]{2,})/?$|";
if (preg_match($_domain_regex, $url)) {
return true;
}
// Second: Check if it's a url with a scheme and all
$_regex = '#^([a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))$#';
if (preg_match($_regex, $url, $matches)) {
// pull out the domain name, and make sure that the domain is valid.
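$_parts = parse_url($url);
// Inputs such as "www.example.com/path" have no scheme, so check before reading it
if (!isset($_parts['scheme'], $_parts['host']) || !in_array($_parts['scheme'], array( 'http', 'https' )))
return false;
// Check the domain using the regex, stops domains like "-example.com" passing through
if (!preg_match($_domain_regex, $_parts['host']))
return false;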
// This domain looks pretty valid. Only way to check it now is to download it!
return true;
}
return false;
}
Note that there is an in_array check for the protocols that you want to allow (currently only http and https are in that list).
var_dump(is_valid_url('google.com')); // true
var_dump(is_valid_url('google.com/')); // true
var_dump(is_valid_url('http://google.com')); // true
var_dump(is_valid_url('http://google.com/')); // true
var_dump(is_valid_url('https://google.com')); // true

-
Throws: ErrorException: Undefined index: scheme if the protocol is not specified i suggest to check if is set before. – user3396065 Nov 20 '16 at 15:34
-
@user3396065, can you please provide an example input that throws this? – Tim Groeneveld Nov 28 '16 at 01:31
For anyone developing with WordPress, just use
esc_url_raw($url) === $url
to validate a URL (here's WordPress' documentation on esc_url_raw). It handles URLs much better than filter_var($url, FILTER_VALIDATE_URL) because it is unicode and XSS-safe. (Here is a good article mentioning all the problems with filter_var).

Here is the way I did it. I want to mention that I am not so sure about the regex, but it should work :)
$pattern = "#((http|https)://(\S*?\.\S*?))(\s|\;|\)|\]|\[|\{|\}|,|”|\"|'|:|\<|$|\.\s)#i";
$text = preg_replace_callback($pattern,function($m){
return "<a href=\"$m[1]\" target=\"_blank\">$m[1]</a>$m[4]";
},
$text);
This way you won't need the eval modifier (/e) on your pattern.
Hope it helps :)
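Several comments in this thread raise XSS concerns, so here is a hedged sketch of a linkifier that escapes everything it outputs. It swaps in a different technique from the answer above: preg_split() with PREG_SPLIT_DELIM_CAPTURE instead of preg_replace_callback(). The function name linkify() and the deliberately simple URL pattern are assumptions for illustration.

```php
<?php
// Hypothetical helper: turn http(s) URLs in plain text into links,
// escaping both the URLs and the surrounding text for HTML output.
function linkify(string $text): string
{
    // With one capture group and PREG_SPLIT_DELIM_CAPTURE the result alternates:
    // even indexes are plain text, odd indexes are the captured URLs.
    $parts = preg_split('#(https?://[^\s<>"\']+)#i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $i => $part) {
        $safe = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        $out .= ($i % 2 === 1)
            ? '<a href="' . $safe . '" target="_blank" rel="noopener">' . $safe . '</a>'
            : $safe;
    }
    return $out;
}

echo linkify('Go to http://example.com/?a=1&b=2 <now>'), "\n";
// Go to <a href="http://example.com/?a=1&amp;b=2" target="_blank" rel="noopener">http://example.com/?a=1&amp;b=2</a> &lt;now&gt;
```

Unlike the pattern in the answer, this simple pattern keeps trailing punctuation such as a final period inside the link, so it trades precision for safety.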

-
`(http|https)` is more simply `https?`. The excessive use of pipes in this pattern negatively impacts readability and brevity. Many of the escaped characters in your pattern do not need escaping. – mickmackusa Sep 27 '21 at 10:30
Here's a simple class for URL Validation using RegEx and then cross-references the domain against popular RBL (Realtime Blackhole Lists) servers:
Install:
require 'URLValidation.php';
Usage:
require 'URLValidation.php';
$urlVal = new UrlValidation(); //Create Object Instance
Pass a URL as the parameter of the domain() method and check the return value.
$urlArray = ['http://www.bokranzr.com/test.php?test=foo&test=dfdf', 'https://en-gb.facebook.com', 'https://www.google.com'];
foreach ($urlArray as $k=>$v) {
echo var_dump($urlVal->domain($v)) . ' URL: ' . $v . '<br>';
}
Output:
bool(false) URL: http://www.bokranzr.com/test.php?test=foo&test=dfdf
bool(true) URL: https://en-gb.facebook.com
bool(true) URL: https://www.google.com
As you can see above, www.bokranzr.com is listed as a malicious website by an RBL, so the domain was returned as false.

Peter's Regex doesn't look right to me for many reasons. It allows all kinds of special characters in the domain name and doesn't test for much.
Frankie's function looks good to me and you can build a good regex from the components if you don't want a function, like so:
^(http://|https://)(([a-z0-9]([-a-z0-9]*[a-z0-9]+)?){1,63}\.)+[a-z]{2,6}
Untested but I think that should work.
Also, Owen's answer doesn't look 100% either. I took the domain part of the regex and tested it on a Regex tester tool http://erik.eae.net/playground/regexp/regexp.html
I put the following line:
(\S*?\.\S*?)
in the "regexp" section and the following line:
-hello.com
under the "sample text" section.
The result allowed the minus character through. Because \S means any non-space character.
Note the regex from Frankie handles the minus because it has this part for the first character:
[a-z0-9]
Which won't allow the minus or any other special character.
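The comparison above can be reproduced in PHP rather than in an online tester. This is a sketch only: the `~` delimiters, the `i` modifier, and the variable names are assumptions added so the two patterns can be passed to preg_match().

```php
<?php
// The combined regex built from Frankie's components ("~" delimiter and "i" are assumptions).
$frankie = '~^(http://|https://)(([a-z0-9]([-a-z0-9]*[a-z0-9]+)?){1,63}\.)+[a-z]{2,6}~i';
// The domain part of Owen's regex, tested on its own as described above.
$owenDomainPart = '~(\S*?\.\S*?)~';

var_dump(preg_match($frankie, 'http://hello.com'));   // int(1)
var_dump(preg_match($frankie, 'http://-hello.com'));  // int(0): a label cannot start with "-"
var_dump(preg_match($owenDomainPart, '-hello.com'));  // int(1): \S accepts the leading "-"
```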

I've found this to be the most useful for matching a URL..
^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$

There is a PHP native function for that:
$url = 'http://www.yoururl.co.uk/sub1/sub2/?param=1&param2/';
if ( ! filter_var( $url, FILTER_VALIDATE_URL ) ) {
// Wrong
}
else {
// Valid
}
Returns the filtered data, or FALSE if the filter fails.
