22

I want a solution that validates only domain names, not full URLs. The following example is what I'm looking for:

example.com -> true
example.net -> true
example.org -> true
example.biz -> true
example.co.uk -> true
sub.example.com -> true
example.com/folder -> false
exam*$ple.com -> false
Stephen Ostermiller
CodeOverload
  • http://stackoverflow.com/questions/399932/can-i-improve-this-regex-check-for-valid-domain-names has lots more information about using regular expressions to match domain names. – Gavin Mogan Jun 12 '10 at 00:13

7 Answers

95

The accepted answer is incomplete/wrong.

The regex pattern:

  • should NOT validate domains such as:
    -example.com, example--.com, -example-.-.com, example.000, etc...

  • should validate domains such as:
    schools.k12, newTLD.clothing, good.photography, etc...

After some further research, below is the most correct, cross-language, and compact pattern I could come up with:

^(?!\-)(?:(?:[a-zA-Z\d][a-zA-Z\d\-]{0,61})?[a-zA-Z\d]\.){1,126}(?!\d+)[a-zA-Z\d]{1,63}$

This pattern conforms with most* of the rules defined in the specs:

  • Each label/level (split by a dot) may contain up to 63 characters.
  • The full domain name may have up to 127 levels.
  • The full domain name may not exceed the length of 253 characters in its textual representation.
  • Each label can consist of letters, digits and hyphens.
  • Labels cannot start or end with a hyphen.
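  • The top-level domain (extension) cannot be all-numeric. (As written, the (?!\d+) lookahead is stricter than this rule: it rejects any TLD that begins with a digit.)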

Note 1: The full domain length check is not included in the regex. It should be simply checked by native methods e.g. strlen(domain) <= 253.
Note 2: This pattern works with most languages including PHP, Javascript, Python, etc...
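
As a minimal sketch of how Note 1 and the pattern might be combined in PHP: the helper name isValidDomainFormat() is an assumption for illustration, and the pattern is copied unchanged from above.

```php
<?php
// Hypothetical helper (the name is not from the answer); the pattern is the one above.
function isValidDomainFormat(string $domain): bool
{
    // Note 1: the overall length limit is checked outside the regex.
    if (strlen($domain) > 253) {
        return false;
    }

    $pattern = '/^(?!\-)(?:(?:[a-zA-Z\d][a-zA-Z\d\-]{0,61})?[a-zA-Z\d]\.){1,126}(?!\d+)[a-zA-Z\d]{1,63}$/';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $domain) === 1;
}

$candidates = [
    'example.com', 'sub.example.co.uk', 'schools.k12',
    '-example.com', 'example--.com', 'example.000', 'example.com/folder',
];

foreach ($candidates as $candidate) {
    printf("%-20s %s\n", $candidate, isValidDomainFormat($candidate) ? 'true' : 'false');
}
```

With this sketch the first three candidates should pass and the remaining four should fail.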

See DEMO here (for JS, PHP, Python)

More Info:

  • The regex above does not support IDNs.

  • There is no spec that says the extension (TLD) should be between 2 and 6 characters. It actually supports 63 characters. See the current TLD list here. Also, some networks do internally use custom/pseudo TLDs.

  • Registration authorities might impose some extra, specific rules which are not explicitly supported in this regex. For example, .CO.UK and .ORG.UK must have at least 3 characters, but less than 23, not including the extension. These kinds of rules are non-standard and subject to change. Do not implement them if you cannot maintain them.

  • Regular expressions are great, but they are not the most effective or performant solution to every problem. So a native URL parser should be used instead whenever possible, e.g. Python's urlparse() function or PHP's parse_url() function.

  • After all, this is just a format validation. A regex test does not confirm that a domain name is actually configured/exists! You should test the existence by making a request.
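
To illustrate the native-parser point, here is a rough sketch that pulls the host out of a URL-like string with PHP's parse_url(). The helper name extractHost() is an assumption, and note that parse_url() only splits the string: it does not check that the host consists of valid characters, so a format check is still needed afterwards.

```php
<?php
// Hypothetical helper: returns the host part of a URL-like string, or null if none is found.
function extractHost(string $input): ?string
{
    // parse_url() only reports a host when a scheme or a leading "//" is present,
    // so prepend "//" to bare inputs such as "example.com/folder".
    $hasScheme = preg_match('#^[a-z][a-z0-9+.\-]*://#i', $input) === 1;
    if (!$hasScheme && strpos($input, '//') !== 0) {
        $input = '//' . $input;
    }

    // Returns the host string, null if the component is missing, or false on a malformed URL.
    $host = parse_url($input, PHP_URL_HOST);

    return (is_string($host) && $host !== '') ? $host : null;
}

var_dump(extractHost('http://sub.example.com:8080/a?b=c'));
var_dump(extractHost('example.com/folder'));
```

Comparing the extracted host with the original input is one way to tell a bare domain from a URL that carries a path, port, or scheme.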

UPDATE (2019-12-21): Fixed leading hyphen with subdomains.

Onur Yıldırım
32

How about:

^(?:[-A-Za-z0-9]+\.)+[A-Za-z]{2,6}$
zildjohn01
  • Why the downvote? I tested it at http://regexpal.com/ and it matches all the OP's test data. – zildjohn01 Jun 12 '10 at 00:07
  • 6
    Whoever downvoted took it back. @Lauri `.museum` and `.travel`. – zildjohn01 Jun 12 '10 at 00:12
  • @zildjohn01 Learned something new today as well ;-) Thanks – Lauri Lehtinen Jun 12 '10 at 00:16
  • +1 from me. And for the record, depending on what is what you want to do with that regexp, keep in mind that example.{com,net,org} are reserved and can't be registered (cf. http://en.wikipedia.org/wiki/Example.com). – matias Jun 12 '10 at 01:36
  • 4
    This answer is (not fully wrong) but **incomplete**. See the correction in my answer. – Onur Yıldırım May 10 '13 at 21:34
  • This will work - however now that we're about to get a bunch of new gTLD's, some are 12+ characters long. For the sake of future proofing a bare minimum of 20 characters should be allowed. Even then you might eventually hit a brick wall. – Sk446 Nov 05 '13 at 11:23
  • RickM is correct; based on new vanity gTLDs, it would probably be safer to enforce the max possible limit of 63 characters (ref http://stackoverflow.com/questions/9238640/how-long-can-a-tld-possibly-be). – mway Apr 18 '14 at 19:53
  • Length, you have to keep it at 64 chars or less! – Indolering May 18 '14 at 20:00
  • 3
    It only works in PHP if you put REGEX as this: `/^(?:[-A-Za-z0-9]+\.)+[A-Za-z]{2,6}$/`. It works with `preg_match` function for example, otherwise not. – Villapalos Apr 29 '16 at 10:51
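
The TLD-length concern raised in the comments can be seen with a small comparison. This is only a sketch: the {2,63} variant follows the 63-character suggestion from the comments, and the helper name matchesPattern() is an assumption.

```php
<?php
// Hypothetical helper: true when $domain matches the given PCRE pattern.
function matchesPattern(string $pattern, string $domain): bool
{
    return preg_match($pattern, $domain) === 1;
}

// Pattern from the answer (with PHP delimiters) and a variant with a wider TLD limit.
$original = '/^(?:[-A-Za-z0-9]+\.)+[A-Za-z]{2,6}$/';
$widened  = '/^(?:[-A-Za-z0-9]+\.)+[A-Za-z]{2,63}$/';

foreach (['example.com', 'example.museum', 'good.photography'] as $domain) {
    printf(
        "%-18s original=%s widened=%s\n",
        $domain,
        matchesPattern($original, $domain) ? 'true' : 'false',
        matchesPattern($widened, $domain) ? 'true' : 'false'
    );
}
```

Only good.photography differs between the two: its 11-letter TLD exceeds the {2,6} limit but fits within {2,63}.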
4

Please try this expression:

^(http[s]?\:\/\/)?((\w+)\.)?(([\w-]+)?)(\.[\w-]+){1,2}$

What it actually does

  • optional http/s://
  • optional leading label such as www (the pattern accepts any single \w+ label here)
  • any valid alphanumeric name (including - and _)
  • 1 or 2 occurrences of any valid alphanumeric name (including - and _)

Validation Examples

  • http://www.test.example
  • test.com.mt
2

In my case, a domain name is considered valid if its format is stackoverflow.com or xxx.stackoverflow.com.

So, in addition to the other answers, I have also added a check that rejects a leading www.

function isValidDomainName($domain) {
  if (filter_var(gethostbyname($domain), FILTER_VALIDATE_IP)) {
      // Escape the dot so that only a literal "www." prefix is rejected.
      return (preg_match('/^www\./', $domain)) ? FALSE : TRUE;
  }
  return FALSE;
}

You can test the function with this code:

    $domain = array("http://www.domain.example","http://www.domain.example/folder" ,"http://domain.example", "www.domain.example", "domain.example/subfolder", "domain.example","sub.domain.example");
    foreach ($domain as $v) {
        echo isValidDomainName($v) ? "{$v} is valid<br>" : "{$v} is invalid<br>";
    }
Web_Developer
0

Remember, regexes can only check to see if something is well formed. www.idonotexistbecauseiammadeuponthespot.example is well-formed, but doesn't actually exist... at the time of writing. ;) Furthermore, certain free web hosting providers (like Tripod) allow underscores in subdomains. This is clearly a violation of the RFCs, yet it sometimes works.

Do you want to check if the domain exists? Try dns_get_record instead of (just) a regex.
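
A minimal sketch of such an existence check with dns_get_record() might look like this. The function name domainHasDnsRecords() and the optional $lookup parameter are assumptions added so the lookup can be replaced in tests, and the choice of A, AAAA, and MX records is just one reasonable option.

```php
<?php
// Hypothetical helper: true when a DNS lookup returns at least one record for $domain.
function domainHasDnsRecords(string $domain, ?callable $lookup = null): bool
{
    if ($lookup === null) {
        // Default lookup: query A, AAAA and MX records.
        // dns_get_record() returns an array of records, or false on failure.
        $lookup = static function (string $host) {
            return @dns_get_record($host, DNS_A | DNS_AAAA | DNS_MX);
        };
    }

    $records = $lookup($domain);

    return is_array($records) && count($records) > 0;
}

// Real lookups need network access, so this call is left commented out:
// var_dump(domainHasDnsRecords('example.com'));
```

A format check would normally run first, so that obviously malformed input never reaches the resolver.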

Charles
0

I made a function to validate the domain name without any regex.

<?php
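function validDomain($domain) {
  // Trailing dots (fully qualified form) are stripped before checking.
  $domain = rtrim($domain, '.');
  // Require at least one dot, and not as the first character.
  if (!mb_stripos($domain, '.')) {
    return false;
  }
  $domain = explode('.', $domain);
  $allowedChars = array('-');
  // The last label is the extension (TLD); the rest are checked in the loop.
  $extension = array_pop($domain);
  foreach ($domain as $value) {
    $fc = mb_substr($value, 0, 1);
    $lc = mb_substr($value, -1);
    // Reject empty labels and labels that start or end with a hyphen.
    if (
      $value === ''
      || in_array($fc, $allowedChars)
      || in_array($lc, $allowedChars)
    ) {
      return false;
    }
    // Apart from hyphens, a label may contain only letters and digits.
    if (!ctype_alnum(str_replace($allowedChars, '', $value))) {
      return false;
    }
  }
  // The extension must be non-empty and alphanumeric (hyphens allowed).
  if (
    !ctype_alnum(str_replace($allowedChars, '', $extension))
    || $extension === ''
  ) {
    return false;
  }
  return true;
}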
$testCases = array(
  'a',
  '0',
  'a.b',
  'google.com',
  'news.google.co.uk',
  'xn--fsqu00a.xn--0zwm56d',
  'google.com ',
  'google.com.',
  'goo gle.com',
  'a.',
  'hey.hey',
  'google-.com',
  '-nj--9*.vom',
  ' ',
  '..',
  'google..com',
  'www.google.com',
  'www.google.com/some/path/to/dir/'
);
foreach ($testCases as $testCase) {
  var_dump($testCase);
  var_dump(validDomain($testCase));
  echo '<br /><br />';
}
?>

This code outputs:

string(1) "a" bool(false)

string(1) "0" bool(false)

string(3) "a.b" bool(true)

string(10) "google.com" bool(true)

string(17) "news.google.co.uk" bool(true)

string(23) "xn--fsqu00a.xn--0zwm56d" bool(true)

string(11) "google.com " bool(false)

string(11) "google.com." bool(true)

string(11) "goo gle.com" bool(false)

string(2) "a." bool(false)

string(7) "hey.hey" bool(true)

string(11) "google-.com" bool(false)

string(11) "-nj--9*.vom" bool(false)

string(1) " " bool(false)

string(2) ".." bool(false)

string(11) "google..com" bool(false)

string(14) "www.google.com" bool(true)

string(32) "www.google.com/some/path/to/dir/" bool(false)

I hope I have covered everything. If I missed something, please tell me and I can improve this function. :)

Amplifier
0

Some time ago I tried to find a solution to this, but after checking all the possibilities, and given the range of values that subdomains can take, I became convinced (maybe wrongly) that the only way to know whether something is valid is to check it against this array (which can be extracted from the ICANN site, as in the example):

http://www.axew3.com/www/data-hints/w3-all-top-level-domains-names-array.php

with something like this:

// This always extracts the correct cookie domain (except for domains hosted under a provider's subdomain, like: mydomain.my-hostingService-domain.com)

function extract_cookie_domain( $w3cookie_domain ) {

    require_once( WPW3ALL_PLUGIN_DIR . 'addons/w3_icann_domains.php' );

    $count_dot = substr_count($w3cookie_domain, ".");

    if ($count_dot >= 3) {
        preg_match('/.*(\.)([-a-z0-9]+)(\.[-a-z0-9]+)(\.[a-z]+)/', $w3cookie_domain, $w3m0, PREG_OFFSET_CAPTURE);
        $w3cookie_domain = $w3m0[2][0].$w3m0[3][0].$w3m0[4][0];
    }

    $ckcd = explode('.',$w3cookie_domain);
    // The $w3all_domains array comes from the included file, where the ICANN domains are stored. This is the only way to check whether a domain is valid; any answer without it will be incomplete.
    if (!in_array('.'.$ckcd[1], $w3all_domains)) {
        $w3cookie_domain = preg_replace('/^[^\.]*\.([^\.]*)\.(.*)$/', '\1.\2', $w3cookie_domain);
    }

    $w3cookie_domain = '.' . $w3cookie_domain;

    $pos = strpos($w3cookie_domain, '.');
    if ($pos != 0) {
        $w3cookie_domain = '.' . $w3cookie_domain;
    }

    return $w3cookie_domain;

}

But maybe I'm wrong. What do you say?

P.S. I did not re-check the logic of the function; it can probably be shortened and surely improved.

The $w3all_domains array comes from the included file, where the ICANN domains are stored. I think this is the only way to check whether a domain is valid, and that any solution above or below that skips it will sometimes be incomplete.
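
For validation (rather than extraction), the same idea can be sketched as a plain lookup of the last label in a TLD list. Both the function name hasKnownTld() and the four-entry $sampleTlds array are assumptions for illustration; a real check would load the full, current list instead.

```php
<?php
// Hypothetical helper: true when the last label of $domain appears in $knownTlds.
function hasKnownTld(string $domain, array $knownTlds): bool
{
    // Normalise case and drop trailing dots before splitting into labels.
    $labels = explode('.', strtolower(rtrim($domain, '.')));

    // A bare TLD or a single label is not treated as a domain name here.
    if (count($labels) < 2) {
        return false;
    }

    return in_array(end($labels), $knownTlds, true);
}

// Tiny illustrative sample, NOT the complete list of TLDs.
$sampleTlds = ['com', 'net', 'org', 'uk'];

var_dump(hasKnownTld('sub.example.co.uk', $sampleTlds)); // bool(true)
var_dump(hasKnownTld('example.invalid', $sampleTlds));   // bool(false)
```

This only confirms that the extension is known; the labels in front of it still need a format check.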

[EDITED]

axew3
  • This question is asking for "validation", but your answer seems to be demonstrating an "extraction" process. Are you answering the asked question? – mickmackusa Jan 03 '22 at 13:44
  • well, it is the unique way to get the right result, or you'll be never sure, if you before, do not pass through something like this. Isn't it? – axew3 Jan 03 '22 at 13:45