41

Is it possible without using a regular expression?

For example, I want to check that a string is a valid domain:

domain-name
abcd
example

Are valid domains. These are invalid of course:

domaia@name
ab$%cd

And so on. So basically it should start with an alphanumeric character, then there may be more alnum characters plus also a hyphen. And it must end with an alnum character, too.

If it's not possible, could you suggest a regexp pattern to do this?

EDIT:

Why doesn't this work? Am I using preg_match incorrectly?

$domain = '@djkal';
$regexp = '/^[a-zA-Z0-9][a-zA-Z0-9\-\_]+[a-zA-Z0-9]$/';
if (false === preg_match($regexp, $domain)) {
    throw new Exception('Domain invalid');
}
kender
Richard Knop

20 Answers

145
<?php
function is_valid_domain_name($domain_name)
{
    return (preg_match("/^([a-z\d](-*[a-z\d])*)(\.([a-z\d](-*[a-z\d])*))*$/i", $domain_name) //valid chars check
            && preg_match("/^.{1,253}$/", $domain_name) //overall length check
            && preg_match("/^[^\.]{1,63}(\.[^\.]{1,63})*$/", $domain_name)   ); //length of each label
}
?>

Test cases:

is_valid_domain_name? [a]                       Y
is_valid_domain_name? [0]                       Y
is_valid_domain_name? [a.b]                     Y
is_valid_domain_name? [localhost]               Y
is_valid_domain_name? [google.com]              Y
is_valid_domain_name? [news.google.co.uk]       Y
is_valid_domain_name? [xn--fsqu00a.xn--0zwm56d] Y
is_valid_domain_name? [goo gle.com]             N
is_valid_domain_name? [google..com]             N
is_valid_domain_name? [google.com ]             N
is_valid_domain_name? [google-.com]             N
is_valid_domain_name? [.google.com]             N
is_valid_domain_name? [<script]                 N
is_valid_domain_name? [alert(]                  N
is_valid_domain_name? [.]                       N
is_valid_domain_name? [..]                      N
is_valid_domain_name? [ ]                       N
is_valid_domain_name? [-]                       N
is_valid_domain_name? []                        N
velcrow
  • 7
    Don't forget to check if (count($pieces) > 1) – 472084 Jun 15 '12 at 14:28
  • Shorter single regex: `^[a-z\d](-*[a-z\d])*$`. – Kendall Hopkins Jul 17 '12 at 18:08
  • 1
    Kendall, thanks for the regex. Also, limited now to 253 due to: http://blog.sacaluta.com/2011/12/dns-domain-names-253-or-255-bytesoctets.html – velcrow Feb 20 '13 at 23:02
  • be careful with this regex. It will allow `xn---gnter--o2a.de` which is translated back to `-günter-.de` which obviously shouldn't be allowed. – lifeofguenter Mar 06 '15 at 08:48
  • /^([a-z\d](-*[a-z\d])*)(\.([a-z\d](-*[a-z\d])*))*$/i should actually be "/^([a-z\d](-*[a-z\d])*)(\.([a-z\d](-*[a-z\d])*))+$/i", otherwise you're matching plain strings rather than domains. You need to change the final "*" to "+". – tmarois Jun 23 '15 at 20:12
  • 4
    As a rule of thumb you should use single quotes for writing regex in php so that it does not process any of the special chars inside – kabeersvohra Jul 27 '15 at 17:24
  • Fly like a butterfly! Sting like a bee! – krisanalfa Nov 21 '15 at 16:45
  • 1
    This regex does not cover umlaut domains and other similar special chars which are [perfectly valid](https://www.nic.ch/reg/cm/wcm-page/index.html?lid=en&plain=&res=/reg/guest/faqs/idn.jsp)... – nerdoc Mar 08 '16 at 19:34
  • `127.0.0.1.1` is considered a valid domain by this function. – Bob Ortiz Jun 29 '16 at 21:17
  • IPv4 URLs are considered valid domains `http://127.0.0.1 => 127.0.0.1` but IPv6 URLs like `http://[2001:db8::7]` are not considered valid domains. There is some inconsistency – damko Oct 27 '16 at 00:12
  • This filter reports testingj11.com as invalid. Why is that? testingj21.com, testingj31.com, testingj41.com, testingj51.com, testingj61.com, testingj71.com, testingj81.com, testingj91.com and testingj101.com are also reported as invalid. Can anyone tell me what's going on? – Amit Chauhan Dec 24 '16 at 07:08
  • 1
    125 upvotes... This function will reject valid UTF-8 domains, it will accept phone-numbers as domain, it will reject IPV6 ips but accept IPV4 ips and it's using 3 performance heavy regex searches to do so. Use with caution. – John Nov 17 '18 at 02:59
  • 1
    UTF-8 domains and domains with umlauts, etc. are NOT valid domain names. They are IDN domains which should always be used with punycode! Unicode domains should be converted to punycode before any processing because they are not valid domains. – jg6 May 19 '19 at 12:36
  • A *domain*name may actually contain underscores, a *host*name cannot contain an underscore. See https://stackoverflow.com/questions/2180465/can-domain-name-subdomains-have-an-underscore-in-it – Léon Melis Aug 19 '19 at 14:21
  • Domain may contain russian letters, so: it will be better – mindsupport Dec 26 '19 at 14:38
67

With this you will not only be checking if the domain has a valid format, but also if it is active / has an IP address assigned to it.

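function domain_resolves($domain)
{
    // gethostbyname() returns the input unchanged when the lookup fails,
    // so the IP filter only passes after a successful lookup
    // (or when $domain was already an IP address).
    return filter_var(gethostbyname($domain), FILTER_VALIDATE_IP) !== false;
}

var_dump(domain_resolves("stackoverflow.com"));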

Note that this method requires the DNS entries to be active, so if you need to validate a domain string that is not in the DNS, use the regular expression method given by velcrow above.

Also, this function is not intended to validate a URL string; use FILTER_VALIDATE_URL for that. We do not use FILTER_VALIDATE_URL for a domain because a domain string is not a valid URL.

RoboTamer
  • Only that I would use the filter: `FILTER_VALIDATE_URL` instead of `FILTER_VALIDATE_IP` – Nir Alfasi Oct 09 '12 at 17:07
  • FILTER_VALIDATE_URL will only find ASCII URLs to be valid; internationalized domain names (containing non-ASCII characters) will fail. (http://php.net/manual/en/filter.filters.validate.php) – Edson Medina Jan 08 '13 at 15:12
  • 9
    gethostbyname does a blocking dns lookup, so don't run this to loop over a large dataset, you will have horrible runtime. – velcrow Feb 20 '13 at 21:53
  • fails for url with query string i.e. `https://www.google.co.uk/search?q=google&gws_rd=ssl` – Templar Jul 01 '14 at 13:18
  • @Templar it is a function to validate **host name** not a URL – php_nub_qq Feb 04 '15 at 10:06
  • 6
    This will fail for a valid domain or host name that doesn't actually exist in DNS yet. -1. – Shadur Apr 19 '16 at 07:59
  • this is pretty narrow minded. imagine you're trying to validate a domain that is not registered yet. like doing an actual domain name register service or something ... – user151496 May 24 '16 at 15:01
  • This does not validate a domain in PHP, it merely passes on the validation to the DNS level which then gets reported back through PHP. – Phil Sep 29 '16 at 08:04
  • This is the way - "How to validate has domain set correct DNS record" (or something like that).. – Aurimas Jan 19 '22 at 12:36
  • This is a great solution if you need to make sure a domain is active, it does (in a sense) check that it's a valid domain name. But the OP asked how to do it with a string – Chad Reitsma Jun 11 '22 at 16:41
  • this enables also IP address – luky Jul 31 '23 at 08:34
38

PHP 7

// Validate a domain name
var_dump(filter_var('mandrill._domainkey.mailchimp.com', FILTER_VALIDATE_DOMAIN));
# string(33) "mandrill._domainkey.mailchimp.com"

// Validate a hostname (here, the underscore is invalid)
var_dump(filter_var('mandrill._domainkey.mailchimp.com', FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME));
# bool(false)

It is not documented here: http://www.php.net/filter.filters.validate and a bug request for this is located here: https://bugs.php.net/bug.php?id=72013
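A comment on this answer points out that single-label names such as `ripe` pass the hostname check. If that is unwanted, one option is to require a dot in addition to the filter. A minimal sketch, assuming PHP 7 or later; `is_dotted_hostname` is a hypothetical helper name, not part of PHP:

```php
<?php
// Hypothetical helper: hostname syntax check plus a "must contain a dot" rule.
function is_dotted_hostname(string $name): bool
{
    // FILTER_FLAG_HOSTNAME restricts labels to letters, digits and hyphens.
    $valid = filter_var($name, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME);

    // filter_var() returns the string on success and false on failure.
    return $valid !== false && strpos($name, '.') !== false;
}

var_dump(is_dotted_hostname('example.com'));
var_dump(is_dotted_hostname('ripe'));
```

This only checks syntax; it says nothing about whether the name is registered or resolves.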

Kerem
Rob
  • 1
    Hmm, this works, but gives some false positives: ``` >>> filter_var('ripe', FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME) => "ripe" ``` – М.Б. Jun 27 '19 at 15:58
  • @М.Б. This is a valid domain name according to the specifications [RFC 1035](https://tools.ietf.org/html/rfc1035) **"They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen. There are also some restrictions on the length. Labels must be 63 characters or less. "** . You can add more restriction to be must include a dot `.` to make what you want – Accountant م Sep 03 '19 at 09:35
  • @Accountant Alright, thanks! I thought `.` is already included in validator itself. – М.Б. Sep 03 '19 at 12:41
  • 1
    @Accountantم, I think there may be an RFC that expands on that definition, because I used to have the domain "2tp.com", and I've seen numerous other domain names that start with a digit. – scott8035 Dec 12 '19 at 18:33
  • Not matching the usage of online domain names... – Loenix Feb 02 '23 at 16:59
  • documentation has been updated: https://www.php.net/filter.filters.validate – Sybille Peters May 02 '23 at 05:02
  • this enables also ip address – luky Jul 31 '23 at 08:37
13

Use checkdnsrr: http://php.net/manual/en/function.checkdnsrr.php

$domain = "stackoverflow.com";

checkdnsrr($domain , "A");

//returns true if has a dns A record, false otherwise
jacktrade
  • 6
    not really useful if you want to check a domain that has a valid structure, but is not registered (yet). – Ludo - Off the record Apr 07 '15 at 14:13
  • not necessary, php will look the dns server in the network, if you register the domain at local level in your network you will get a true response from checkdnsrr – jacktrade Jun 10 '19 at 08:20
  • 2
    so you're basically suggesting to register every domain you want to check locally first? that doesn't make sense, also you can register non-valid domains locally, which beats the purpose of what the @richard-knop is trying to accomplish here. – Ludo - Off the record Jun 17 '19 at 21:40
  • useful if you want to use in conjunction with existing domains – luky Jul 31 '23 at 08:40
10

Firstly, you should clarify whether you mean:

  1. individual domain name labels
  2. entire domain names (i.e. multiple dot-separated labels)
  3. host names

The reason the distinction is necessary is that a label can technically include any characters, including the NUL, @ and '.' characters. DNS is 8-bit capable and it's perfectly possible to have a zone file containing an entry reading "an\0odd\.l@bel". It's not recommended of course, not least because people would have difficulty telling a dot inside a label from those separating labels, but it is legal.

However, URLs require a host name in them, and those are governed by RFCs 952 and 1123. Valid host names are a subset of domain names. Specifically only letters, digits and hyphen are allowed. Furthermore the first and last characters cannot be a hyphen. RFC 952 didn't permit a number for the first character, but RFC 1123 subsequently relaxed that.

Hence:

  • a - valid
  • 0 - valid
  • a- - invalid
  • a-b - valid
  • xn--dasdkhfsd - valid (punycode encoding of an IDN)

Off the top of my head I don't think it's possible to invalidate the a- example with a single simple regexp. The best I can come up with to check a single host label is:

if (preg_match('/^[a-z\d][a-z\d-]{0,62}$/i', $label) &&
   !preg_match('/-$/', $label))
{
    # label is legal within a hostname
}
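For what it's worth, the trailing-hyphen case can also be folded into one pattern by making the "middle plus last character" part optional. This is a sketch under the rules stated above (letters, digits and hyphens, no hyphen at either end, at most 63 characters); `is_valid_host_label` is a hypothetical name:

```php
<?php
// Hypothetical helper: validates one host name label with a single pattern.
function is_valid_host_label(string $label): bool
{
    // One alphanumeric character, optionally followed by up to 61 characters
    // from [a-z0-9-] and a final alphanumeric character (63 total at most).
    // \z anchors at the very end, so a trailing newline is not accepted.
    return preg_match('/^[a-z\d](?:[a-z\d-]{0,61}[a-z\d])?\z/i', $label) === 1;
}

foreach (['a', '0', 'a-', 'a-b', 'xn--dasdkhfsd'] as $label) {
    echo $label, ' - ', is_valid_host_label($label) ? 'valid' : 'invalid', "\n";
}
```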

To further complicate matters, some domain name entries (typically SRV records) use labels prefixed with an underscore, e.g. _sip._udp.example.com. These are not host names, but are legal domain names.

Alnitak
7

Here is another way without regex.

$myUrl = "http://www.domain.com/link.php";
$myParsedURL = parse_url($myUrl);
$myDomainName= $myParsedURL['host'];
$ipAddress = gethostbyname($myDomainName);
if($ipAddress == $myDomainName)
{
   echo "There is no url";
}
else
{
   echo "url found";
}
Erkan BALABAN
6

I think once you have isolated the domain name, say, using Erkan's idea:

$myUrl = "http://www.domain.com/link.php";
$myParsedURL = parse_url($myUrl);
$myDomainName= $myParsedURL['host'];

you could use :

if( false === filter_var( $myDomainName, FILTER_VALIDATE_URL ) ) {
// failed test

}

PHP 5's filter functions are meant for just such a purpose, I would have thought.

It does not strictly answer your question as it does not use Regex, I realise.

Cups
  • I'm not sure this will really work. The RFC for URIs (which is what the filter does) includes things like file:///some/path or the like. URL/URIs don't necessarily include valid hostnames. – Josh Koenig May 03 '11 at 00:43
3

A regular expression is the most effective way of validating a domain. If you're dead set on not using a regular expression (which IMO is stupid), then you could split the domain into its parts:

  • www. / sub-domain
  • domain name
  • .extension

You would then have to check each character in some sort of a loop to see that it matches a valid domain.

Like I said, it's much more effective to use a regular expression.
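For completeness, the split-and-loop approach could look roughly like this. It is a sketch, not a full validator: it assumes ASCII host names, the usual limits of 253 characters overall and 63 per label, and the default C locale for `ctype_alnum()`. The function name `is_valid_hostname_no_regex` is made up for the example:

```php
<?php
// Hypothetical helper: host name check without any regular expression.
function is_valid_hostname_no_regex(string $name): bool
{
    if ($name === '' || strlen($name) > 253) {
        return false;
    }

    // Split into dot-separated labels and check each one in turn.
    foreach (explode('.', $name) as $label) {
        $len = strlen($label);

        // An empty label means a leading, trailing or doubled dot.
        if ($len < 1 || $len > 63) {
            return false;
        }
        // A label may not start or end with a hyphen.
        if ($label[0] === '-' || $label[$len - 1] === '-') {
            return false;
        }
        // Every character must be a letter, a digit or a hyphen.
        for ($i = 0; $i < $len; $i++) {
            if ($label[$i] !== '-' && !ctype_alnum($label[$i])) {
                return false;
            }
        }
    }

    return true;
}

var_dump(is_valid_hostname_no_regex('domain-name'));
var_dump(is_valid_hostname_no_regex('ab$%cd'));
```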

James Brooks
  • For sure regex is not the most effective way of checking for domain validation. It's way better to iterate char by char or something like it. – nacholibre Oct 06 '16 at 07:23
2

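Your regular expression is fine, but you're not using preg_match right. It returns an int (1 for a match, 0 for no match); false is returned only when an error occurs, so the false === comparison never detects a failed match. Just write if(!preg_match($regex, $string)) { ... }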
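To make the difference concrete, here is a small sketch using the pattern and input from the question. `domain_matches` is a hypothetical wrapper name; the only point is that the result is compared against 1 rather than against false:

```php
<?php
// Hypothetical wrapper: true only when the pattern actually matched.
function domain_matches(string $regexp, string $domain): bool
{
    // preg_match() returns 1 on a match, 0 on no match and false on error.
    return preg_match($regexp, $domain) === 1;
}

$regexp = '/^[a-zA-Z0-9][a-zA-Z0-9\-\_]+[a-zA-Z0-9]$/';

// The comparison from the question: not true for an ordinary non-match.
var_dump(preg_match($regexp, '@djkal') === false);

// Comparing against 1 distinguishes the two inputs.
var_dump(domain_matches($regexp, '@djkal'));
var_dump(domain_matches($regexp, 'domain-name'));
```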

Arthur Reutenauer
1

If you want to check whether a particular domain name or ip address exists or not, you can also use checkdnsrr
Here is the doc http://php.net/manual/en/function.checkdnsrr.php

Agustinus Verdy
1

If you don't want to use regular expressions, you can try this:

$str = 'domain-name';

if (ctype_alnum(str_replace('-', '', $str)) && $str[0] != '-' && $str[strlen($str) - 1] != '-') {
    echo "Valid domain\n";
} else {
    echo "Invalid domain\n";
}

but, as said, regular expressions are the best tool for this.

Matteo Riva
1

A valid domain is for me something I'm able to register or at least something that looks like I could register it. This is the reason why I like to separate this from "localhost"-names.

And finally I was interested in the main question if avoiding Regex would be faster and this is my result:

<?php
function filter_hostname($name, $domain_only=false) {
    // entire hostname has a maximum of 253 ASCII characters
    if (!($len = strlen($name)) || $len > 253
    // .example.org and localhost- are not allowed
    || $name[0] == '.' || $name[0] == '-' || $name[ $len - 1 ] == '.' || $name[ $len - 1 ] == '-'
    // a.de is the shortest possible domain name and needs one dot
    || ($domain_only && ($len < 4 || strpos($name, '.') === false))
    // several combinations are not allowed
    || strpos($name, '..') !== false
    || strpos($name, '.-') !== false
    || strpos($name, '-.') !== false
    // only letters, numbers, dot and hyphen are allowed
/*
    // a little bit slower
    || !ctype_alnum(str_replace(array('-', '.'), '', $name))
*/
    || preg_match('/[^a-z\d.-]/i', $name)
    ) {
        return false;
    }
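    // each label may contain up to 63 characters
    $offset = 0;
    while (($pos = strpos($name, '.', $offset)) !== false) {
        if ($pos - $offset > 63) {
            return false;
        }
        $offset = $pos + 1;
    }
    // the loop stops at the last dot, so check the final label as well
    if ($len - $offset > 63) {
        return false;
    }
    return $name;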
}
?>

Benchmark results compared with velcrow's function over 10000 iterations (the complete results contain many code variants; it was interesting to find the fastest):

filter_hostname($domain);// $domains: 0.43556308746338 $real_world: 0.33749794960022
is_valid_domain_name($domain);// $domains: 0.81832790374756 $real_world: 0.32248711585999

$real_world did not contain extremely long domain names, to produce better results. And now I can answer your question: with ctype_alnum() it would be possible to do it without regex, but as preg_match() was faster I would prefer that.

If you don't like the fact that "local.host" is a valid domain name, use this function instead, which validates against a public TLD list. Maybe someone will find the time to combine both.

Community
mgutt
1

The correct answer is that you don't ... you let a unit tested tool do the work for you:

// return '' if host invalid --
private function setHostname($host = '')
{
    $ret = (!empty($host)) ? $host : '';
    if(filter_var('http://'.$ret.'/', FILTER_VALIDATE_URL) === false) {
        $ret = '';
    }
    return $ret;
}

Further reading: https://www.w3schools.com/php/filter_validate_url.asp

Mike Q
1

If you can run shell commands, following is the best way to determine if a domain is registered.

This function returns false, if domain name isn't registered else returns domain name.

function get_domain_name($domain) { 
    //Step 1 - Return false if any shell sensitive chars or space/tab were found
    if(escapeshellcmd($domain)!=$domain || count(explode(".", $domain))<2 || preg_match("/[\s\t]/", $domain)) {
            return false;
    }

    //Step 2 - Get the root domain in-case of subdomain
    $domain = (count(explode(".", $domain))>2 ? strtolower(explode(".", $domain)[count(explode(".", $domain))-2].".".explode(".", $domain)[count(explode(".", $domain))-1]) : strtolower($domain));

    //Step 3 - Run shell command 'dig' to get SOA servers for the domain extension
    $ns = shell_exec(escapeshellcmd("dig +short SOA ".escapeshellarg(explode(".", $domain)[count(explode(".", $domain))-1]))); 

    //Step 4 - Return false if invalid extension (returns NULL), or take the first server address out of output
    if($ns===NULL) {
            return false;
    }
    $ns = (((preg_split('/\s+/', $ns)[0])[strlen(preg_split('/\s+/', $ns)[0])-1]==".") ? substr(preg_split('/\s+/', $ns)[0], 0, strlen(preg_split('/\s+/', $ns)[0])-1) : preg_split('/\s+/', $ns)[0]);

    //Step 5 - Run another dig using the obtained address for our domain, and return false if returned NULL else return the domain name. This assumes an authoritative NS is assigned when a domain is registered, can be improved to filter more accurately.
    $ans = shell_exec(escapeshellcmd("dig +noall +authority ".escapeshellarg("@".$ns)." ".escapeshellarg($domain))); 
    return (($ans===NULL) ? false : ((strpos($ans, $ns)>-1) ? false : $domain));
}

Pros

  1. Works on any domain, while php dns functions may fail on some domains. (my .pro domain failed on php dns)
  2. Works on fresh domains without any dns (like A) records
  3. Unicode friendly

Cons

  1. Usage of shell execution, probably
Ajay Singh
0
<?php

if(is_valid_domain('https://www.google.com')==1){
  echo 'Valid';
}else{
   echo 'InValid';
}

 function is_valid_domain($url){

    $validation = FALSE;
    /*Parse URL*/    
    $urlparts = parse_url(filter_var($url, FILTER_SANITIZE_URL));

    /*Check host exist else path assign to host*/    
    if(!isset($urlparts['host'])){
        $urlparts['host'] = $urlparts['path'];
    }

    if($urlparts['host']!=''){
        /*Add scheme if not found*/
        if (!isset($urlparts['scheme'])){
            $urlparts['scheme'] = 'http';
        }

        /*Validation*/
        if(checkdnsrr($urlparts['host'], 'A') && in_array($urlparts['scheme'],array('http','https')) && ip2long($urlparts['host']) === FALSE){
            $urlparts['host'] = preg_replace('/^www\./', '', $urlparts['host']);
            $url = $urlparts['scheme'].'://'.$urlparts['host']. "/";

            if (filter_var($url, FILTER_VALIDATE_URL) !== false && @get_headers($url)) {
                $validation = TRUE;
            }
        }
    }

    return $validation;

}
?>
0

After reading all the issues with the added functions I decided I need something more accurate. Here's what I came up with that works for me.

If you need to specifically validate hostnames (they must start and end with an alphanumeric character and contain only alphanumerics and hyphens) this function should be enough.

function is_valid_domain($domain) {
    // Reject any label that starts or ends with a hyphen
    if(preg_match('/(^|\.)-|-(\.|$)/', $domain)) {
        return false;
    }

    // Detect and convert international UTF-8 domain names to IDNA ASCII form
    if(mb_detect_encoding($domain) != "ASCII") {
        $idn_dom = idn_to_ascii($domain);
    } else {
        $idn_dom = $domain;
    }

    // Validate
    if(filter_var($idn_dom, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME) != false) {
        return true;
    }
    return false;
}

Note that this function will work on most (haven't tested all languages) LTR languages. It will not work on RTL languages.

is_valid_domain('a');                                                                       Y
is_valid_domain('a.b');                                                                     Y
is_valid_domain('localhost');                                                               Y
is_valid_domain('google.com');                                                              Y
is_valid_domain('news.google.co.uk');                                                       Y
is_valid_domain('xn--fsqu00a.xn--0zwm56d');                                                 Y
is_valid_domain('area51.com');                                                              Y
is_valid_domain('japanese.コム');                                                           Y
is_valid_domain('домейн.бг');                                                               Y
is_valid_domain('goo gle.com');                                                             N
is_valid_domain('google..com');                                                             N
is_valid_domain('google-.com');                                                             N
is_valid_domain('.google.com');                                                             N
is_valid_domain('<script');                                                                 N
is_valid_domain('alert(');                                                                  N
is_valid_domain('.');                                                                       N
is_valid_domain('..');                                                                      N
is_valid_domain(' ');                                                                       N
is_valid_domain('-');                                                                       N
is_valid_domain('');                                                                        N
is_valid_domain('-günter-.de');                                                             N
is_valid_domain('-günter.de');                                                              N
is_valid_domain('günter-.de');                                                              N
is_valid_domain('sadyasgduysgduysdgyuasdgusydgsyudgsuydgusydgsyudgsuydusdsdsdsaad.com');    N
is_valid_domain('2001:db8::7');                                                             N
is_valid_domain('876-555-4321');                                                            N
is_valid_domain('1-876-555-4321');                                                          N
GTodorov
-1

I know that this is an old question, but it was the first answer on a Google search, so it seems relevant. I recently had this same problem. The solution in my case was to just use the Public Suffix List:

https://publicsuffix.org/learn/

The suggested language specific libraries listed should all allow for easy validation of not just domain format, but also top level domain validity.

jeffers102
  • Quote from the site: Some use the PSL to determine what is a valid domain name and what isn't. This is dangerous. gTLDs and ccTLDs are constantly updating, coming and going - and certainly not static. – Christian Rauchenwald Oct 04 '22 at 13:38
-3

Check the php function checkdnsrr

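function validate_email($email){

   // eregi() was removed in PHP 7; preg_match() with the /i flag replaces it
   $exp = "/^[a-z'0-9]+([._-][a-z'0-9]+)*@([a-z0-9]+([._-][a-z0-9]+))+$/i";

   if(!preg_match($exp, $email)){
      return false;
   }

   // the domain is everything after the last "@"
   $domain = substr(strrchr($email, "@"), 1);

   // true if the domain has an MX record, false otherwise
   return checkdnsrr($domain, "MX");
}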
Templar
codeCraft
-3

This is validation of domain name in javascript:

<script>
function frmValidate() {
 var val=document.frmDomin.name.value;
 if (/^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9](?:\.[a-zA-Z]{2,})+$/.test(val)){
      alert("Valid Domain Name");
      return true;
 } else {
      alert("Enter Valid Domain Name");
      document.frmDomin.name.focus(); // val is a string, so focus the form field itself
      return false;
 }
}
</script>
Joseph
KS Rajput
-6

This is simple. Some PHP engines have a problem with split(). The code below will work.

<?php
$email = "vladimiroliva@ymail.com"; 
$domain = strtok($email, "@");
$domain = strtok("@");
if (@getmxrr($domain,$mxrecords)) 
   echo "This ". $domain." EXIST!"; 
else 
   echo "This ". $domain." does not exist!"; 
?>

RobertPitt
bong