34

I need to write a function to parse variables which contain domain names. It's best I explain this with an example. The variable could contain any of these things:

here.example.com
example.com
example.org
here.example.org

But when passed through my function all of these must return either example.com or example.co.uk, the root domain name basically. I'm sure I've done this before but I've been searching Google for about 20 minutes and can't find anything. Any help would be appreciated.

EDIT: Ignore the .co.uk, presume that all domains going through this function have a 3 letter TLD.

zuk1
  • Where do you draw the line? E.g., what about free-domain services like .de.vu? – balpha Jul 29 '09 at 15:50
  • Here's a test case: would you want example.uk.com to be identified as "uk.com" or "example.uk.com"? Technically the domain name is uk.com and example.uk.com is a subdomain, but some people have a different preference depending on what they think of the Centralnic domains. – Richy B. Jul 29 '09 at 15:51
  • It will only ever be com,co.uk,ca,com.au and possibly info domain names. To be honest it's not a huge problem if I can only get it to work just .com's :) – zuk1 Jul 29 '09 at 15:52
  • @balpha: and cases like example.de, example.fr – palindrom Jul 29 '09 at 15:53
  • @Richy C., you make a good point, but the OP did use "example.co.uk" as a return example. – TSomKes Jul 29 '09 at 15:53
  • possible duplicate of [Url splitting in php](http://stackoverflow.com/questions/1102447/url-splitting-in-php) – outis Apr 01 '12 at 21:33

27 Answers

50

Stackoverflow Question Archive:


print get_domain("http://somedomain.co.uk"); // outputs 'somedomain.co.uk'

function get_domain($url)
{
  $pieces = parse_url($url);
  $domain = isset($pieces['host']) ? $pieces['host'] : '';
  if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
    return $regs['domain'];
  }
  return false;
}
Community
Sampson
  • Should work. I use it as-is all over the place. How are you using it? – Sampson Jul 29 '09 at 16:16
  • Sorry it didn't work as is with the 'example.domain.com' format but I took your regex and made it work, thanks a bunch! – zuk1 Jul 30 '09 at 08:29
  • This doesn't work on domains that are 3 chars long when the tld is 2 chars long. www.exg.ie returns www.exg.ie as the domain. Any ideas? – Ryaner Aug 06 '10 at 15:43
  • The question is 'how to get domain from a subdomain', not 'how to get domain from a url'. I believe we are all aware of the `parse_url()` function and we can all use it. The issue with your solution is that if you have 'sub.somedomain.com.hk', `parse_url()` will return the whole as a path, which clearly is not correct inside the given context. – Nikola Petkanski Aug 01 '12 at 08:25
  • Look at [this answer](http://stackoverflow.com/questions/1201194/php-getting-domain-name-from-subdomain/11773121#11773121) instead of the wrong answers, and please upvote it so it gets the attention it deserves. – tripleee Aug 15 '13 at 06:08
  • This is not working with http://l.facebook.com/l.php?u=http%3A%2F%2Fpracticeti.me – 5x7Code Jul 22 '14 at 05:00
  • @Sampson I just need only domain name without .com, .co.de etc – Ravi Dec 30 '21 at 12:30
23

If you want a fast, simple solution without external calls or checks against predefined arrays, try this. Unlike the most popular answer, it also works for new domains like "www.domain.gallery".

function get_domain($host){
  $myhost = strtolower(trim($host));
  $count = substr_count($myhost, '.');
  if($count === 2){
    if(strlen(explode('.', $myhost)[1]) > 3) $myhost = explode('.', $myhost, 2)[1];
  } else if($count > 2){
    $myhost = get_domain(explode('.', $myhost, 2)[1]);
  }
  return $myhost;
}
  • domain.com -> domain.com
  • sub.domain.com -> domain.com
  • www.domain.com -> domain.com
  • www.sub.sub.domain.com -> domain.com
  • domain.co.uk -> domain.co.uk
  • sub.domain.co.uk -> domain.co.uk
  • www.domain.co.uk -> domain.co.uk
  • www.sub.sub.domain.co.uk -> domain.co.uk
  • domain.photography -> domain.photography
  • www.domain.photography -> domain.photography
  • www.sub.domain.photography -> domain.photography
suncat100
  • I like this one the best, actually. I just needed to add a path for "localhost" or "sub.localhost" to return just "localhost" by adding an if ($count == 1) for the decimal. – Dss May 01 '18 at 17:35
  • Sorry for the downvote. I tested this code inside a class and forgot to call `self::get_domain` in the last else-if. Now Stack Overflow does not allow me to remove the downvote and make an upvote. Sorry – abkrim Nov 22 '21 at 18:58
7

I ended up using the database Mozilla has.

Here's my code:

fetch_mozilla_tlds.php contains the caching algorithm. This line is important:

$mozillaTlds = file('http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1');

The main file used inside the application is this:

function isTopLevelDomain($domain)
{
    $domainParts = explode('.', $domain);
    if (count($domainParts) == 1) {
        return false;
    }

    $previousDomainParts = $domainParts;
    array_shift($previousDomainParts);

    $tld = implode('.', $previousDomainParts);

    return isDomainExtension($tld);
}

function isDomainExtension($domain)
{
    $tlds = getTLDs();

    /**
     * direct hit
     */
    if (in_array($domain, $tlds)) {
        return true;
    }

    if (in_array('!'. $domain, $tlds)) {
        return false;
    }

    $domainParts = explode('.', $domain);

    if (count($domainParts) == 1) {
        return false;
    }

    $previousDomainParts = $domainParts;

    array_shift($previousDomainParts);
    array_unshift($previousDomainParts, '*');

    $wildcardDomain = implode('.', $previousDomainParts);

    return in_array($wildcardDomain, $tlds);
}

function getTLDs()
{
    static $mozillaTlds = array();

    if (empty($mozillaTlds)) {
        require 'fetch_mozilla_tlds.php';
        /* @var $mozillaTlds array */
    }

    return $mozillaTlds;
}

UPDATE:
The database has evolved and is now available at its own website - http://publicsuffix.org/
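
fetch_mozilla_tlds.php itself is not shown above. As a rough sketch of the parsing step such a file would need, assuming the list text has already been downloaded and cached locally (the function name parse_suffix_rules and the sample data are hypothetical, not part of the original code):

```php
<?php
// Hypothetical helper, not the original fetch_mozilla_tlds.php:
// turns Public Suffix List text into a flat array of rules.
function parse_suffix_rules($text)
{
    $rules = array();
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        // Skip blank lines and "//" comment lines.
        if ($line === '' || strpos($line, '//') === 0) {
            continue;
        }
        // A rule ends at the first whitespace character.
        $fields = preg_split('/\s+/', $line);
        $rules[] = strtolower($fields[0]);
    }
    return $rules;
}

// Made-up sample in the list's format: plain, wildcard and exception rules.
$sample = "// example section\ncom\nco.uk\n\n*.ck\n!www.ck\n";
$mozillaTlds = parse_suffix_rules($sample);
```

The resulting array holds plain rules (com), wildcard rules (*.ck) and exception rules (!www.ck), which is the shape isDomainExtension() checks with in_array().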

Nikola Petkanski
  • +1 You really cannot solve this adequately without specific knowledge about the administration policy of each individual top-level domain. The public suffix database is the canonical source for this information. – tripleee Aug 15 '13 at 06:05
7

I would do something like the following:

// hierarchical array of top level domains
$tlds = array(
    'com' => true,
    'uk' => array(
        'co' => true,
        // …
    ),
    // …
);
$domain = 'here.example.co.uk';
// split domain
$parts = explode('.', $domain);
$tmp = $tlds;
// traverse the tree in reverse order, from right to left
foreach (array_reverse($parts) as $key => $part) {
    if (isset($tmp[$part])) {
        $tmp = $tmp[$part];
    } else {
        break;
    }
}
// build the result
var_dump(implode('.', array_slice($parts, - $key - 1)));
Gumbo
  • To me this seems the most comprehensive way of doing this test - you can make it as elaborate and checking as many TLDs/SLDs as you have time to add. – Richy B. Jul 29 '09 at 15:57
  • True but I'd like it to be as lightweight as possible because it's going to be doing potentially thousands of these in one loop, so if there's a more efficient option I'll go with that. – zuk1 Jul 29 '09 at 16:03
  • Accessing an array costs only O(1). And with a maximum depth of two for any top level domain I know (https://wiki.mozilla.org/TLD_List), you will always get your result within at most two steps. I don’t know any other way that is more efficient. – Gumbo Jul 29 '09 at 16:16
  • The key is how many records are there going to be in `$tlds`? – user198729 Feb 06 '10 at 11:40
  • @user198729 - http://publicsuffix.org/list/ has superseded the list @Gumbo linked to. By my count (`cat effective_tld_names.dat | grep -v "^//" | grep -v "^$" | wc -l`) it's currently 3692 entries, so not too bad. – John Carter Jul 09 '10 at 13:36
5

Almost certainly, what you're looking for is this:

https://github.com/Synchro/regdom-php

It's a PHP library that utilizes the (as nearly as is practical) full list of various TLDs collected at publicsuffix.org/list/, and wraps it up in a spiffy little function.

Once the library is included, it's as easy as:

$registeredDomain = getRegisteredDomain( $domain );

Gogol
Xaroth
  • There is an updated version at https://github.com/Synchro/regdom-php, but that too hasn't been updated in years. So far, https://github.com/jeremykendall/php-domain-parser and https://github.com/layershifter/TLDExtract seem to be the most up to date libraries I have found to do this task. – orrd May 31 '16 at 05:35
  • I have changed the broken link with the one @orrd mentioned. The previous link is redirecting to a parked domain page. If we don't change the link, it will be free advertising for whoever owns the domain.. – Gogol Apr 22 '19 at 09:57
  • Domains come and go, @gogol. Gogol from BHW? – rafark Dec 15 '19 at 21:10
4
$full_domain = $_SERVER['SERVER_NAME'];
$just_domain = preg_replace("/^(.*\.)?([^.]*\..*)$/", "$2", $_SERVER['HTTP_HOST']);
TigerTiger
  • I'm trying to get the domain from a variable which could contain any domain on the internet, not the one the script is residing on. – zuk1 Jul 29 '09 at 15:53
  • Sorry I just realised it was bleedingly obvious I could change your code to do what I need. It certainly works with example.domain.com but not example.domain.co.uk. I'll hold off unless I get a better answer but so far this does accomplish what I need, I'd just prefer to have more TLD compatibility. – zuk1 Jul 29 '09 at 16:10
3

There are two ways to extract the subdomain from a host:

  1. The first method, which is more accurate, is to use a database of TLDs (like public_suffix_list.dat) and match the domain against it. This is a little heavy in some cases. There are some PHP classes for this, like php-domain-parser and TLDExtract.

  2. The second way is not as accurate as the first one, but it is very fast and gives the correct answer in many cases. I wrote this function for it:

     function get_domaininfo($url) {
         // regex can be replaced with parse_url
         preg_match("/^(https|http|ftp):\/\/(.*?)\//", "$url/" , $matches);
         $parts = explode(".", $matches[2]);
         $tld = array_pop($parts);
         $host = array_pop($parts);
         if ( strlen($tld) == 2 && strlen($host) <= 3 ) {
             $tld = "$host.$tld";
             $host = array_pop($parts);
         }
    
         return array(
             'protocol' => $matches[1],
             'subdomain' => implode(".", $parts),
             'domain' => "$host.$tld",
             'host'=>$host,'tld'=>$tld
         );
     }
    

    Example:

     print_r(get_domaininfo('https://mysubdomain.domain.co.uk/index.php'));
    

    Returns:

     Array
     (
         [protocol] => https
         [subdomain] => mysubdomain
         [domain] => domain.co.uk
         [host] => domain
         [tld] => co.uk
     )
    
Ehsan Chavoshi
2

This is a short way of accomplishing that:

$host = $_SERVER['HTTP_HOST'];
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
echo "domain name is: {$matches[0]}\n";
Francisco Luz
2

Check out my simple solution!


function getHost($a){
    $tld = preg_replace('/.*\.([a-zA-Z]+)$/','$1',$a);
    return trim(preg_replace('/.*(([\.\/][a-zA-Z]{2,}){'.((substr_count($a, '.') <= 2 && mb_strlen( $tld) != 2) ? '2,3' : '3,4').'})/im','$1',$a),'./');
}


echo getHost('https://webmail.google.com.br')."<br>";
echo getHost('https://google.com.br')."<br>";
echo getHost('https://webmail.google.net.br')."<br>";
echo getHost('https://webmail.google.net')."<br>";
echo getHost('https://google.net')."<br>";
echo getHost('webmail.google.com.br')."<br>";

#output

google.com.br
google.com.br
google.net.br
google.net
google.net
google.com.br
Patrick Otto
1

This script generates a Perl file containing a single function, get_domain from the ETLD file. So say you have hostnames like img1, img2, img3, ... in .photobucket.com. For each of those get_domain $host would return photobucket.com. Note that this isn't the fastest function on earth, so in my main log parser that's using this, I keep a hash of host to domain mappings and only run this for hosts that aren't in the hash yet.

#!/bin/bash

cat << 'EOT' > suffixes.pl
#!/bin/perl

sub get_domain {
  $_ = shift;
EOT

wget -O - http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1 \
  | iconv -c -f UTF-8 -t ASCII//TRANSLIT \
  | egrep -v '/|^$' \
  | sed -e 's/^\!//' -e "s/\"/'/g" \
  | awk '{ print length($0),$0 | "sort -rn"}' | cut -d" " -f2- \
  | while read SUFF; do
      STAR=`echo $SUFF | cut -b1`
      if [ "$STAR" = '*' ]; then
        SUFF=`echo $SUFF | cut -b3-`
        echo "  return \"\$1\.\$2\.$SUFF\" if /([a-zA-Z0-9\-]+)\.([a-zA-Z0-9\-]+)\.$SUFF\$/;"
      else
        echo "  return \"\$1\.$SUFF\" if /([a-zA-Z0-9\-]+)\.$SUFF\$/;"
      fi
    done >> suffixes.pl

cat << 'EOT' >> suffixes.pl
}

1;
EOT
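
The host-to-domain hash mentioned above is plain memoization. A minimal PHP sketch of the idea, written in PHP for consistency with the rest of this page; cached_domain and the stand-in resolver are hypothetical names, not part of the Perl script:

```php
<?php
// Hypothetical memoizing wrapper: $resolver is any slow host-to-domain function.
function cached_domain($host, $resolver)
{
    static $cache = array();
    // array_key_exists also counts cached false/null results as hits.
    if (!array_key_exists($host, $cache)) {
        $cache[$host] = call_user_func($resolver, $host);
    }
    return $cache[$host];
}

// Stand-in resolver for illustration only: keeps the last two labels.
$calls = 0;
$resolver = function ($host) use (&$calls) {
    $calls++;
    return implode('.', array_slice(explode('.', $host), -2));
};

cached_domain('img1.photobucket.com', $resolver); // resolver runs
cached_domain('img1.photobucket.com', $resolver); // served from the cache
```

The cache is keyed on the host only, so this sketch assumes the same resolver is used for every call.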
Chad
1

As already said, the Public Suffix List is the only way to parse a domain correctly. I recommend the TLDExtract package; here is sample code:

$extract = new LayerShifter\TLDExtract\Extract();

$result = $extract->parse('here.example.com');
$result->getSubdomain(); // will return (string) 'here'
$result->getHostname(); // will return (string) 'example'
$result->getSuffix(); // will return (string) 'com'
Oleksandr Fediashov
1

This isn't foolproof and should only really be used if you know the domain isn't going to be anything obscure, but it's easier to read than most of the other options:

$justDomain = $_SERVER['SERVER_NAME'];
switch(substr_count($justDomain, '.')) {
    case 1:
        // 2 parts. Must not be a subdomain. Do nothing.
        break;

    case 2:
        // 3 parts. Either a subdomain or a 2-part suffix
        // If the 2nd part is over 3 chars, assume it to be the main domain part which means we have a subdomain.
        // This isn't foolproof, but should be ok for most domains.
        // Something like domainname.parliament.nz would cause problems, though. As would www.abc.com
        $parts = explode('.', $justDomain);
        if(strlen($parts[1]) > 3) {
            unset($parts[0]);
            $justDomain = implode('.', $parts);
        }
        break;

    default:
        // 4+ parts. Must be a subdomain.
        $parts = explode('.', $justDomain, 2);
        $justDomain = $parts[1];
        break;
}

// $justDomain should now exclude any subdomain part.
Ric
  • Seems this answer went unnoticed, but I found it to be the best solution. In addition to the minor issue you mention, there are two minor flaws: 3-letter primary domains with subdomains will fail, for example **sub.abc.com** > sub.abc.com. Also, if 4 parts (or more), the function should be re-run with first segment removed, because 4 parts could be **www.domain.co.uk** OR **www.sub.domain.com**. See fix for this in my answer, or check the GIST: https://gist.github.com/mjau-mjau/8a6395730c597f5e77007296f733d721 – suncat100 Mar 17 '18 at 14:58
  • suncat100, glad my crude approach gave you something to work from. Regex is brilliant but I wanted to provide something more bitesize than those answers. Your own answer does sound like a more reliable approach and additionally looks more elegant than mine. Nicely done! – Ric Mar 19 '18 at 11:44
1
// For short domains like t.co (Twitter), the function should be:

function get_domain($url)
{
  $pieces = parse_url($url);
  $domain = isset($pieces['host']) ? $pieces['host'] : '';
  if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{0,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
    return $regs['domain'];
  }
  return false;
}
PierreK
1

Based on http://www.cafewebmaster.com/find-top-level-domain-international-urls-php

function find_tld($url){

$purl  = parse_url($url);
$host  = strtolower($purl['host']);

$valid_tlds = ".ab.ca .bc.ca .mb.ca .nb.ca .nf.ca .nl.ca .ns.ca .nt.ca .nu.ca .on.ca .pe.ca .qc.ca .sk.ca .yk.ca .com.cd .net.cd .org.cd .com.ch .net.ch .org.ch .gov.ch .co.ck .ac.cn .com.cn .edu.cn .gov.cn .net.cn .org.cn .ah.cn .bj.cn .cq.cn .fj.cn .gd.cn .gs.cn .gz.cn .gx.cn .ha.cn .hb.cn .he.cn .hi.cn .hl.cn .hn.cn .jl.cn .js.cn .jx.cn .ln.cn .nm.cn .nx.cn .qh.cn .sc.cn .sd.cn .sh.cn .sn.cn .sx.cn .tj.cn .xj.cn .xz.cn .yn.cn .zj.cn .com.co .edu.co .org.co .gov.co .mil.co .net.co .nom.co .com.cu .edu.cu .org.cu .net.cu .gov.cu .inf.cu .gov.cx .edu.do .gov.do .gob.do .com.do .org.do .sld.do .web.do .net.do .mil.do .art.do .com.dz .org.dz .net.dz .gov.dz .edu.dz .asso.dz .pol.dz .art.dz .com.ec .info.ec .net.ec .fin.ec .med.ec .pro.ec .org.ec .edu.ec .gov.ec .mil.ec .com.ee .org.ee .fie.ee .pri.ee .eun.eg .edu.eg .sci.eg .gov.eg .com.eg .org.eg .net.eg .mil.eg .com.es .nom.es .org.es .gob.es .edu.es .com.et .gov.et .org.et .edu.et .net.et .biz.et .name.et .info.et .co.fk .org.fk .gov.fk .ac.fk .nom.fk .net.fk .tm.fr .asso.fr .nom.fr .prd.fr .presse.fr .com.fr .gouv.fr .com.ge .edu.ge .gov.ge .org.ge .mil.ge .net.ge .pvt.ge .co.gg .net.gg .org.gg .com.gi .ltd.gi .gov.gi .mod.gi .edu.gi .org.gi .com.gn .ac.gn .gov.gn .org.gn .net.gn .com.gr .edu.gr .net.gr .org.gr .gov.gr .com.hk .edu.hk .gov.hk .idv.hk .net.hk .org.hk .com.hn .edu.hn .org.hn .net.hn .mil.hn .gob.hn .iz.hr .from.hr .name.hr .com.hr .com.ht .net.ht .firm.ht .shop.ht .info.ht .pro.ht .adult.ht .org.ht .art.ht .pol.ht .rel.ht .asso.ht .perso.ht .coop.ht .med.ht .edu.ht .gouv.ht .gov.ie .co.in .firm.in .net.in .org.in .gen.in .ind.in .nic.in .ac.in .edu.in .res.in .gov.in .mil.in .ac.ir .co.ir .gov.ir .net.ir .org.ir .sch.ir .gov.it .co.je .net.je .org.je .edu.jm .gov.jm .com.jm .net.jm .com.jo .org.jo .net.jo .edu.jo .gov.jo .mil.jo .co.kr .or.kr .com.kw .edu.kw .gov.kw .net.kw .org.kw .mil.kw .edu.ky .gov.ky .com.ky .org.ky .net.ky .org.kz .edu.kz .net.kz .gov.kz .mil.kz .com.kz .com.li .net.li 
.org.li .gov.li .gov.lk .sch.lk .net.lk .int.lk .com.lk .org.lk .edu.lk .ngo.lk .soc.lk .web.lk .ltd.lk .assn.lk .grp.lk .hotel.lk .com.lr .edu.lr .gov.lr .org.lr .net.lr .org.ls .co.ls .gov.lt .mil.lt .gov.lu .mil.lu .org.lu .net.lu .com.lv .edu.lv .gov.lv .org.lv .mil.lv .id.lv .net.lv .asn.lv .conf.lv .com.ly .net.ly .gov.ly .plc.ly .edu.ly .sch.ly .med.ly .org.ly .id.ly .co.ma .net.ma .gov.ma .org.ma .tm.mc .asso.mc .org.mg .nom.mg .gov.mg .prd.mg .tm.mg .com.mg .edu.mg .mil.mg .com.mk .org.mk .com.mo .net.mo .org.mo .edu.mo .gov.mo .org.mt .com.mt .gov.mt .edu.mt .net.mt .com.mu .co.mu .aero.mv .biz.mv .com.mv .coop.mv .edu.mv .gov.mv .info.mv .int.mv .mil.mv .museum.mv .name.mv .net.mv .org.mv .pro.mv .com.mx .net.mx .org.mx .edu.mx .gob.mx .com.my .net.my .org.my .gov.my .edu.my .mil.my .name.my .edu.ng .com.ng .gov.ng .org.ng .net.ng .gob.ni .com.ni .edu.ni .org.ni .nom.ni .net.ni .gov.nr .edu.nr .biz.nr .info.nr .com.nr .net.nr .ac.nz .co.nz .cri.nz .gen.nz .geek.nz .govt.nz .iwi.nz .maori.nz .mil.nz .net.nz .org.nz .school.nz .com.pf .org.pf .edu.pf .com.pg .net.pg .com.ph .gov.ph .com.pk .net.pk .edu.pk .org.pk .fam.pk .biz.pk .web.pk .gov.pk .gob.pk .gok.pk .gon.pk .gop.pk .gos.pk .com.pl .biz.pl .net.pl .art.pl .edu.pl .org.pl .ngo.pl .gov.pl .info.pl .mil.pl .waw.pl .warszawa.pl .wroc.pl .wroclaw.pl .krakow.pl .poznan.pl .lodz.pl .gda.pl .gdansk.pl .slupsk.pl .szczecin.pl .lublin.pl .bialystok.pl .olsztyn.pl .torun.pl .biz.pr .com.pr .edu.pr .gov.pr .info.pr .isla.pr .name.pr .net.pr .org.pr .pro.pr .edu.ps .gov.ps .sec.ps .plo.ps .com.ps .org.ps .net.ps .com.pt .edu.pt .gov.pt .int.pt .net.pt .nome.pt .org.pt .publ.pt .net.py .org.py .gov.py .edu.py .com.py .com.ru .net.ru .org.ru .pp.ru .msk.ru .int.ru .ac.ru .gov.rw .net.rw .edu.rw .ac.rw .com.rw .co.rw .int.rw .mil.rw .gouv.rw .com.sa .edu.sa .sch.sa .med.sa .gov.sa .net.sa .org.sa .pub.sa .com.sb .gov.sb .net.sb .edu.sb .com.sc .gov.sc .net.sc .org.sc .edu.sc .com.sd .net.sd .org.sd .edu.sd 
.med.sd .tv.sd .gov.sd .info.sd .org.se .pp.se .tm.se .parti.se .press.se .ab.se .c.se .d.se .e.se .f.se .g.se .h.se .i.se .k.se .m.se .n.se .o.se .s.se .t.se .u.se .w.se .x.se .y.se .z.se .ac.se .bd.se .com.sg .net.sg .org.sg .gov.sg .edu.sg .per.sg .idn.sg .edu.sv .com.sv .gob.sv .org.sv .red.sv .gov.sy .com.sy .net.sy .ac.th .co.th .in.th .go.th .mi.th .or.th .net.th .ac.tj .biz.tj .com.tj .co.tj .edu.tj .int.tj .name.tj .net.tj .org.tj .web.tj .gov.tj .go.tj .mil.tj .com.tn .intl.tn .gov.tn .org.tn .ind.tn .nat.tn .tourism.tn .info.tn .ens.tn .fin.tn .net.tn .gov.to .gov.tp .com.tr .info.tr .biz.tr .net.tr .org.tr .web.tr .gen.tr .av.tr .dr.tr .bbs.tr .name.tr .tel.tr .gov.tr .bel.tr .pol.tr .mil.tr .k12.tr .edu.tr .co.tt .com.tt .org.tt .net.tt .biz.tt .info.tt .pro.tt .name.tt .edu.tt .gov.tt .gov.tv .edu.tw .gov.tw .mil.tw .com.tw .net.tw .org.tw .idv.tw .game.tw .ebiz.tw .club.tw .co.tz .ac.tz .go.tz .or.tz .ne.tz .com.ua .gov.ua .net.ua .edu.ua .org.ua .cherkassy.ua .ck.ua .chernigov.ua .cn.ua .chernovtsy.ua .cv.ua .crimea.ua .dnepropetrovsk.ua .dp.ua .donetsk.ua .dn.ua .if.ua .kharkov.ua .kh.ua .kherson.ua .ks.ua .khmelnitskiy.ua .km.ua .kiev.ua .kv.ua .kirovograd.ua .kr.ua .lugansk.ua .lg.ua .lutsk.ua .lviv.ua .nikolaev.ua .mk.ua .odessa.ua .od.ua .poltava.ua .pl.ua .rovno.ua .rv.ua .sebastopol.ua .sumy.ua .ternopil.ua .te.ua .uzhgorod.ua .vinnica.ua .vn.ua .zaporizhzhe.ua .zp.ua .zhitomir.ua .zt.ua .co.ug .ac.ug .sc.ug .go.ug .ne.ug .or.ug .ac.uk .co.uk .gov.uk .ltd.uk .me.uk .mil.uk .mod.uk .net.uk .nic.uk .nhs.uk .org.uk .plc.uk .police.uk .bl.uk .icnet.uk .jet.uk .nel.uk .nls.uk .parliament.uk .sch.uk .ak.us .al.us .ar.us .az.us .ca.us .co.us .ct.us .dc.us .de.us .dni.us .fed.us .fl.us .ga.us .hi.us .ia.us .id.us .il.us .in.us .isa.us .kids.us .ks.us .ky.us .la.us .ma.us .md.us .me.us .mi.us .mn.us .mo.us .ms.us .mt.us .nc.us .nd.us .ne.us .nh.us .nj.us .nm.us .nsn.us .nv.us .ny.us .oh.us .ok.us .or.us .pa.us .ri.us .sc.us .sd.us .tn.us .tx.us .ut.us 
.vt.us .va.us .wa.us .wi.us .wv.us .wy.us .edu.uy .gub.uy .org.uy .com.uy .net.uy .mil.uy .com.ve .net.ve .org.ve .info.ve .co.ve .web.ve .com.vi .org.vi .edu.vi .gov.vi .com.vn .net.vn .org.vn .edu.vn .gov.vn .int.vn .ac.vn .biz.vn .info.vn .name.vn .pro.vn .health.vn .com.ye .net.ye .ac.yu .co.yu .org.yu .edu.yu .ac.za .city.za .co.za .edu.za .gov.za .law.za .mil.za .nom.za .org.za .school.za .alt.za .net.za .ngo.za .tm.za .web.za .co.zm .org.zm .gov.zm .sch.zm .ac.zm .co.zw .org.zw .gov.zw .ac.zw .com.ac .edu.ac .gov.ac .net.ac .mil.ac .org.ac .nom.ad .net.ae .co.ae .gov.ae .ac.ae .sch.ae .org.ae .mil.ae .pro.ae .name.ae .com.ag .org.ag .net.ag .co.ag .nom.ag .off.ai .com.ai .net.ai .org.ai .gov.al .edu.al .org.al .com.al .net.al .com.am .net.am .org.am .com.ar .net.ar .org.ar .e164.arpa .ip6.arpa .uri.arpa .urn.arpa .gv.at .ac.at .co.at .or.at .com.au .net.au .asn.au .org.au .id.au .csiro.au .gov.au .edu.au .com.aw .com.az .net.az .org.az .com.bb .edu.bb .gov.bb .net.bb .org.bb .com.bd .edu.bd .net.bd .gov.bd .org.bd .mil.be .ac.be .gov.bf .com.bm .edu.bm .org.bm .gov.bm .net.bm .com.bn .edu.bn .org.bn .net.bn .com.bo .org.bo .net.bo .gov.bo .gob.bo .edu.bo .tv.bo .mil.bo .int.bo .agr.br .am.br .art.br .edu.br .com.br .coop.br .esp.br .far.br .fm.br .g12.br .gov.br .imb.br .ind.br .inf.br .mil.br .net.br .org.br .psi.br .rec.br .srv.br .tmp.br .tur.br .tv.br .etc.br .adm.br .adv.br .arq.br .ato.br .bio.br .bmd.br .cim.br .cng.br .cnt.br .ecn.br .eng.br .eti.br .fnd.br .fot.br .fst.br .ggf.br .jor.br .lel.br .mat.br .med.br .mus.br .not.br .ntr.br .odo.br .ppg.br .pro.br .psc.br .qsl.br .slg.br .trd.br .vet.br .zlg.br .dpn.br .nom.br .com.bs .net.bs .org.bs .com.bt .edu.bt .gov.bt .net.bt .org.bt .co.bw .org.bw .gov.by .mil.by .ac.cr .co.cr .ed.cr .fi.cr .go.cr .or.cr .sa.cr .com.cy .biz.cy .info.cy .ltd.cy .pro.cy .net.cy .org.cy .name.cy .tm.cy .ac.cy .ekloges.cy .press.cy .parliament.cy .com.dm .net.dm .org.dm .edu.dm .gov.dm .biz.fj .com.fj .info.fj .name.fj 
.net.fj .org.fj .pro.fj .ac.fj .gov.fj .mil.fj .school.fj .com.gh .edu.gh .gov.gh .org.gh .mil.gh .co.hu .info.hu .org.hu .priv.hu .sport.hu .tm.hu .2000.hu .agrar.hu .bolt.hu .casino.hu .city.hu .erotica.hu .erotika.hu .film.hu .forum.hu .games.hu .hotel.hu .ingatlan.hu .jogasz.hu .konyvelo.hu .lakas.hu .media.hu .news.hu .reklam.hu .sex.hu .shop.hu .suli.hu .szex.hu .tozsde.hu .utazas.hu .video.hu .ac.id .co.id .or.id .go.id .ac.il .co.il .org.il .net.il .k12.il .gov.il .muni.il .idf.il .co.im .net.im .gov.im .org.im .nic.im .ac.im .org.jm .ac.jp .ad.jp .co.jp .ed.jp .go.jp .gr.jp .lg.jp .ne.jp .or.jp .hokkaido.jp .aomori.jp .iwate.jp .miyagi.jp .akita.jp .yamagata.jp .fukushima.jp .ibaraki.jp .tochigi.jp .gunma.jp .saitama.jp .chiba.jp .tokyo.jp .kanagawa.jp .niigata.jp .toyama.jp .ishikawa.jp .fukui.jp .yamanashi.jp .nagano.jp .gifu.jp .shizuoka.jp .aichi.jp .mie.jp .shiga.jp .kyoto.jp .osaka.jp .hyogo.jp .nara.jp .wakayama.jp .tottori.jp .shimane.jp .okayama.jp .hiroshima.jp .yamaguchi.jp .tokushima.jp .kagawa.jp .ehime.jp .kochi.jp .fukuoka.jp .saga.jp .nagasaki.jp .kumamoto.jp .oita.jp .miyazaki.jp .kagoshima.jp .okinawa.jp .sapporo.jp .sendai.jp .yokohama.jp .kawasaki.jp .nagoya.jp .kobe.jp .kitakyushu.jp .per.kh .com.kh .edu.kh .gov.kh .mil.kh .net.kh .org.kh .net.lb .org.lb .gov.lb .edu.lb .com.lb .com.lc .org.lc .edu.lc .gov.lc .army.mil .navy.mil .weather.mobi .music.mobi .ac.mw .co.mw .com.mw .coop.mw .edu.mw .gov.mw .int.mw .museum.mw .net.mw .org.mw .mil.no .stat.no .kommune.no .herad.no .priv.no .vgs.no .fhs.no .museum.no .fylkesbibl.no .folkebibl.no .idrett.no .com.np .org.np .edu.np .net.np .gov.np .mil.np .org.nr .com.om .co.om .edu.om .ac.com .sch.om .gov.om .net.om .org.om .mil.om .museum.om .biz.om .pro.om .med.om .com.pa .ac.pa .sld.pa .gob.pa .edu.pa .org.pa .net.pa .abo.pa .ing.pa .med.pa .nom.pa .com.pe .org.pe .net.pe .edu.pe .mil.pe .gob.pe .nom.pe .law.pro .med.pro .cpa.pro .vatican.va .ac .ad .ae .aero .af .ag .ai .al .am .an .ao .aq 
.ar .arpa .as .at .au .aw .az .ba .bb .bd .be .bf .bg .bh .bi .biz .bj .bm .bn .bo .br .bs .bt .bv .bw .by .bz .ca .cat .cc .cd .cf .cg .ch .ci .ck .cl .cm .cn .co .com .coop .cr .cu .cv .cx .cy .cz .de .dj .dk .dm .do .dz .ec .edu .ee .eg .er .es .et .eu .fi .fj .fk .fm .fo .fr .ga .gb .gd .ge .gf .gg .gh .gi .gl .gm .gov .gp .gq .gr .gs .gt .gu .gw .gy .hk .hm .hn .hr .ht .hu .id .ie .il .im .in .info .int .io .iq .ir .is .it .je .jm .jo .jobs .jp .ke .kg .kh .ki .km .kn .kr .kw .ky .kz .la .lb .lc .li .lk .lr .ls .lt .lu .lv .ly .ma .mc .md .mg .mh .mil .mk .ml .mm .mn .mo .mobi .mp .mq .mr .ms .mt .mu .museum .mv .mw .na .name .nc .ne .net .nf .ng .ni .nl .no .np .nr .nu .nz .om .org .pa .pe .pf .pg .ph .pk .pl .pm .pn .post .pr .pro .ps .pt .pw .py .qa .re .ro .ru .rw .sa .sb .sc .sd .se .sg .sh .si .sj .sk .sl .sm .sn .so .sr .st .su .sv .sy .sz .tc .td .tf .tg .th .tj .tk .tl .tm .tn .to .tp .tr .travel .tt .tv .tw .tz .ua .ug .uk .um .us .uy .uz .va .vc .ve .vg .vi .vn .vuwf .ye .yt .yu .za .zm .zw .ca .cd .ch .cn .cu .cx .dm .dz .ec .ee .es .fr .ge .gg .gi .gr .hk .hn .hr .ht .hu .ie .in .ir .it .je .jo .jp .kr .ky .li .lk .lt .lu .lv .ly .ma .mc .mg .mk .mo .mt .mu .nl .no .nr .nr .pf .ph .pk .pl .pr .ps .pt .ro .ru .rw .sc .sd .se .sg .tj .to .to .tt .tv .tw .tw .tw .tw .ua .ug .us .vi .vn";



    $tld_regex = '#(.*?)([^.]+)('.str_replace(array('.',' '),array('\\.','|'),$valid_tlds).')$#';

    //remove the extension
    preg_match($tld_regex,$host,$matches);

    if(!empty($matches) && sizeof($matches) > 2){
        $extension = array_pop($matches);
        $tld = array_pop($matches);
        return $tld.$extension;

    }else{ //change to "false" if you prefer
        return $host;
    }



}
cwd
1

As a variant of Jonathan Sampson's answer:

function get_domain($url)   {   
    if ( !preg_match("/^http/", $url) )
        $url = 'http://' . $url;
    if ( $url[strlen($url)-1] != '/' )
        $url .= '/';
    $pieces = parse_url($url);
    $domain = isset($pieces['host']) ? $pieces['host'] : ''; 
    if ( preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs) ) { 
        $res = preg_replace('/^www\./', '', $regs['domain'] );
        return $res;
    }   
    return false;
}
tim4dev
0

Regex could help you out there. Try something like this:

([^.]+(\.com|\.co\.uk))$
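
As a sketch of how that pattern could be used from PHP, with the dots escaped so they match literally and the suffixes limited to the two named in the pattern (root_domain is a hypothetical name):

```php
<?php
// Hypothetical helper: returns the label before a known suffix plus the suffix.
function root_domain($host)
{
    // Only .com and .co.uk are recognised; extend the alternation as needed.
    if (preg_match('/([^.]+\.(?:com|co\.uk))$/i', $host, $m)) {
        return strtolower($m[1]);
    }
    return false;
}

echo root_domain('here.example.com'), "\n";   // example.com
echo root_domain('here.example.co.uk'), "\n"; // example.co.uk
var_dump(root_domain('example.org'));         // bool(false)
```

Any suffix missing from the alternation makes the function return false, so this only suits a short, known set of suffixes.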

0

I think your problem is that you haven't clearly defined what exactly you want the function to do. From your examples, you certainly don't want it to just blindly return the last two, or last three, components of the name, but just knowing what it shouldn't do isn't enough.

Here's my guess at what you really want: there are certain second-level domain names, like co.uk., that you'd like to be treated as a single TLD (top-level domain) for purposes of this function. In that case I'd suggest enumerating all such cases and putting them as keys into an associative array with dummy values, along with all the normal top-level domains like com., net., info., etc. Then whenever you get a new domain name, extract the last two components and see if the resulting string is in your array as a key. If not, extract just the last component and make sure that's in your array. (If even that isn't, it's not a valid domain name) Either way, whatever key you do find in the array, take that plus one more component off the end of the domain name, and you'll have your base domain.

You could, perhaps, make things a bit simpler by writing a function, instead of using an associative array, to tell whether the last two components should be treated as a single "effective TLD." The function would probably look at the next-to-last component and, if it's shorter than 3 characters, decide that it should be treated as part of the TLD.
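
The associative-array lookup described above can be sketched roughly as follows. The suffix table is a tiny, hand-picked sample and the name base_domain is hypothetical; a real table would need many more entries:

```php
<?php
// Hypothetical, hand-picked table of "effective TLDs" used as array keys.
$effectiveTlds = array(
    'com' => true, 'net' => true, 'org' => true, 'info' => true,
    'co.uk' => true, 'com.au' => true,
);

function base_domain($host, array $effectiveTlds)
{
    $parts = explode('.', strtolower(trim($host, '.')));
    $n = count($parts);
    // Try the last two components first, then just the last one.
    foreach (array(2, 1) as $len) {
        if ($n > $len) {
            $suffix = implode('.', array_slice($parts, -$len));
            if (isset($effectiveTlds[$suffix])) {
                // The suffix plus one more component is the base domain.
                return implode('.', array_slice($parts, -($len + 1)));
            }
        }
    }
    return false; // unknown suffix, or nothing in front of the suffix
}

echo base_domain('here.example.co.uk', $effectiveTlds), "\n"; // example.co.uk
echo base_domain('here.example.com', $effectiveTlds), "\n";   // example.com
```

Checking the longer suffix first is what keeps example.co.uk from being cut down to co.uk.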

David Z
  • You're right. Presume for this example that all TLD's going through this function have got 3 letters (org,net,com). I basically want to strip the subdomain if there is one and be left with 'domain.com/org/net'. – zuk1 Jul 29 '09 at 15:58
0

To do it well, you'll need a list of the second level domains and top level domains and build an appropriate regular expression list. A good list of second level domains is available at https://wiki.mozilla.org/TLD_List. Another test case apart from the aforementioned CentralNic .uk.com variants is The Vatican: their website is technically at http://va, and that's a difficult one to match on!

Richy B.
0

Ah - if you just want to handle three character top level domains - then this code works:

<?php 
// let's test the code works: these should all return
// example.com , example.net or example.org
$domains = array(
    'here.example.com',
    'example.com',
    'example.org',
    'here.example.org',
    'example.com/ignorethis',
    'example.net/',
    'http://here.example.org/longtest?string=here',
);
foreach ($domains as $domain) {
 testdomain($domain);
}

function testdomain($url) {
 if (preg_match('/^((.+)\.)?([A-Za-z][0-9A-Za-z\-]{1,63})\.([A-Za-z]{3})(\/.*)?$/',$url,$matches)) {
    print 'Domain is: '.$matches[3].'.'.$matches[4].'<br>'."\n";
 } else {
    print 'Domain not found in '.$url.'<br>'."\n";
 }
}
?>

$matches[1]/$matches[2] will contain any subdomain and/or protocol, $matches[3] contains the domain name, $matches[4] the top level domain and $matches[5] contains any other URL path information.

To match most common top level domains you could try changing it to:

if (preg_match('/^((.+)\.)?([A-Za-z][0-9A-Za-z\-]{1,63})\.([A-Za-z]{2,6})(\/.*)?$/',$url,$matches)) {

Or, to restrict it to an explicit list of suffixes (extend the alternation as needed):

if (preg_match('/^((.+)\.)?([A-Za-z][0-9A-Za-z\-]{1,63})\.(co\.uk|me\.uk|org\.uk|com|org|net|int|eu)(\/.*)?$/',$url,$matches)) {

etc etc

Richy B.
  • 1,619
  • 12
  • 20
0

Building on Jonathan's answer:

function main_domain($domain) {
  if (preg_match('/([a-z0-9][a-z0-9\-]{1,63})\.([a-z]{3}|[a-z]{2}\.[a-z]{2})$/i', $domain, $regs)) {
    return $regs;
  }

  return false;
}

His expression might be a bit better, but this interface seems more like what you're describing.

eswald
  • 8,368
  • 4
  • 28
  • 28
0

There is no need to list every country's TLD: they are all two letters long, apart from the special ones listed by IANA.

https://gist.github.com/pocesar/5366899

and the tests are here http://codepad.viper-7.com/QfueI0

Comprehensive test suite along with working code. The only caveat is that it won't work with Unicode domain names, but that's another level of data extraction.

From the list, I'm testing against:

$urls = array(
'www.example.com' => 'example.com',
'example.com' => 'example.com',
'example.com.br' => 'example.com.br',
'www.example.com.br' => 'example.com.br',
'www.example.gov.br' => 'example.gov.br',
'localhost' => 'localhost',
'www.localhost' => 'localhost',
'subdomain.localhost' => 'localhost',
'www.subdomain.example.com' => 'example.com',
'subdomain.example.com' => 'example.com',
'subdomain.example.com.br' => 'example.com.br',
'www.subdomain.example.com.br' => 'example.com.br',
'www.subdomain.example.biz.br' => 'example.biz.br',
'subdomain.example.biz.br' => 'example.biz.br',
'subdomain.example.net' => 'example.net',
'www.subdomain.example.net' => 'example.net',
'www.subdomain.example.co.kr' => 'example.co.kr',
'subdomain.example.co.kr' => 'example.co.kr',
'example.co.kr' => 'example.co.kr',
'example.jobs' => 'example.jobs',
'www.example.jobs' => 'example.jobs',
'subdomain.example.jobs' => 'example.jobs',
'insane.subdomain.example.jobs' => 'example.jobs',
'insane.subdomain.example.com.br' => 'example.com.br',
'www.doubleinsane.subdomain.example.com.br' => 'example.com.br',
'www.subdomain.example.jobs' => 'example.jobs',
'test' => 'test',
'www.test' => 'test',
'subdomain.test' => 'test',
'www.detran.sp.gov.br' => 'sp.gov.br',
'www.mp.sp.gov.br' => 'sp.gov.br',
'ny.library.museum' => 'library.museum',
'www.ny.library.museum' => 'library.museum',
'ny.ny.library.museum' => 'library.museum',
'www.library.museum' => 'library.museum',
'info.abril.com.br' => 'abril.com.br',
'127.0.0.1' => '127.0.0.1',
'::1' => '::1',
);
pocesar
  • 6,860
  • 6
  • 56
  • 88
  • 1
    You need to pass all the test cases in http://mxr.mozilla.org/mozilla-central/source/netwerk/test/unit/data/test_psl.txt?raw=1 – tripleee Aug 15 '13 at 08:34
  • @tripleee thanks, I'll make a test suite using the info you provided – pocesar Aug 15 '13 at 22:45
  • If I understand your code correctly, it basically assumes that ccTLDs are three-level and non-ccTLDs are two-level? That's a gross oversimplification of how the real world works. But maybe I miss some finer points of your code? Comments in the code would be useful, as would having a self-contained answer here on StackOverflow. – tripleee Aug 16 '13 at 07:28
  • the code cover most of the use cases when dealing with subdomains and IANA assigned names. oversimplification, IMHO, is to use a regex. The problem is that pasting that much code into SO is a PITA, because of 4 spaces instead of fenced blocks... – pocesar Aug 16 '13 at 10:17
  • 1
    More test cases: `'www.example.gov.fi' => 'gov.fi', '89.67.45.123.in-addr.arpa' => 'in-addr.arpa'` (not sure if the latter makes any sense?) – tripleee Aug 16 '13 at 10:37
  • it's already covered (all IANA special cases) and two letter TLDs (whatever they are) – pocesar Aug 16 '13 at 10:42
0

Here is how to extract the domain from any URL. I wrote this code for my own site,
http://internet-portal.me/, where it is in use as a working solution.

$host is the URL to be parsed. The code is simple and, compared with everything
else I have seen, reliable: it has worked on every URL I have tried.
You can see it parsing the page you are looking at right now:
http://internet-portal.me/domain/?dns=https://stackoverflow.com/questions/1201194/php-getting-domain-name-from-subdomain/6320437#6320437

================================================================================

$host = filter_var($_GET['dns']);
$host = $host . '/'; // needed if URL does not have trailing slash

// Strip www, http, https header ;

$host = str_replace( 'http://www.' , '' , $host );
$host = str_replace( 'https://www.' , '' , $host );

$host = str_replace( 'http://' , '' , $host );
$host = str_replace( 'https://' , '' , $host );
$pos = strpos($host, '/'); // find any sub directories
$host = substr( $host, 0, $pos );  //strip directories

$hostArray = explode (".", $host); // count parts of TLD
$size = count ($hostArray) -1; // really only need to know if not a single level TLD
$tld = $hostArray[$size]; // do we need to parse the TLD any further - 
                          // remove subdomains?

if ($size > 1) {
    if ($tld == "aero" or $tld == "asia" or $tld == "biz" or $tld == "cat" or
        $tld == "com" or $tld == "coop" or $tld == "edu" or $tld == "gov" or
        $tld == "info" or $tld == "int" or $tld == "jobs" or $tld == "me" or
        $tld == "mil" or $tld == "mobi" or $tld == "museum" or $tld == "name" or
        $tld == "net" or $tld == "org" or $tld == "pro" or $tld == "tel" or
        $tld == "travel" or $tld == "tv" or $tld == "ws" or $tld == "xxx") {

        $host = $hostArray[$size -1].".".$hostArray[$size]; // parse to 2 level TLD
    } else {
         // parse to 3 level TLD
        $host = $hostArray[$size -2].".".$hostArray[$size -1].".".$hostArray[$size] ;
    }
}
Community
  • 1
  • 1
  • 1
    Why do you single out `me`, `tv`, and `ws` but ignore all other country codes? There are several hundred, many of which have a significantly more complex domain name administration policy than your code assumes. As a simple example, Colombia allows both `example.co` (for foreign domain owners) and `example.com.co` (for domestic Colombian domains). – tripleee Aug 16 '13 at 07:35
0

This is to get domain.tld in any case

public static function get_domain_with_extension($url)
{
    $pieces = parse_url($url);
    $domain = isset($pieces['host']) ? $pieces['host'] : $pieces['path'];
    if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
        return $regs['domain'];
    }
    return false;
}
Ricky Riccs
  • 41
  • 1
  • 5
0

My version also returns the protocol

function host() {
  $protocol = !empty($_SERVER["HTTPS"]) && $_SERVER["HTTPS"] !== "off" ? "https://" : "http://";
  $domain_parts = explode(".", $_SERVER["HTTP_HOST"]);
  $num_parts = count($domain_parts);
  $main_domain = $domain_parts[$num_parts - 2] . '.' . $domain_parts[$num_parts - 1];

  return $protocol . $main_domain;
}
0

Here is what I am using. It works well without needing any arrays of TLDs:

$split = array_reverse(explode(".", $_SERVER['HTTP_HOST']));
$domain = $split[1].".".$split[0];

if(function_exists('gethostbyname'))
{
    if(gethostbyname($domain) != $_SERVER['SERVER_ADDR'] && isset($split[2]))
    {   
        $domain = $split[2].".".$split[1].".".$split[0];
    }
}
None
  • 1
  • 1
    gethostbyname is very slow and you will run into massive timeout problems! – mgutt Mar 08 '12 at 08:49
  • You could do it in one line: $domain = implode(".", array_slice(explode(".", $host), -2)); – rolandow Apr 24 '13 at 09:23
  • I like the idea of doing a live lookup, but in practice, the existence of an IP address for `x.example` does not imply that `domain.x.example` is not a "domain" in the sense intended here. – tripleee Aug 16 '13 at 07:37
0

It is not possible without a TLD list to compare against, as there exist many cases like http://www.db.de/ or http://bbc.co.uk/

But even with that you won't succeed in every case, because of SLDs like http://big.uk.com/ or http://www.uk.com/

If you need a complete list you can use the public suffix list:

http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1

Feel free to use my function. It doesn't use regex and it is fast:

http://www.programmierer-forum.de/domainnamen-ermitteln-t244185.htm#3471878
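
For illustration only (this is not the linked function), a list-based lookup without regex could be sketched as follows. The `$rules` array is a tiny made-up excerpt rather than the real public suffix list, and `domain_from_suffix_list` is a hypothetical name:

```php
<?php
// Tiny illustrative excerpt; the real public suffix list has thousands of rules,
// including wildcard (*.) and exception (!) rules that this sketch ignores.
$rules = array_flip(['com', 'org', 'uk', 'co.uk', 'de', 'uk.com']);

// Hypothetical helper: find the longest listed suffix and return it with one
// more label in front. Returns false if the host is itself a suffix or if
// no rule matches at all.
function domain_from_suffix_list(string $host, array $rules)
{
    $labels = explode('.', strtolower(trim($host, '.')));
    $count = count($labels);
    // Walk from the longest candidate suffix (the whole host) to the shortest.
    for ($i = 0; $i < $count; $i++) {
        $suffix = implode('.', array_slice($labels, $i));
        if (isset($rules[$suffix])) {
            return $i === 0 ? false : $labels[$i - 1] . '.' . $suffix;
        }
    }
    return false;
}

var_dump(domain_from_suffix_list('www.bbc.co.uk', $rules));
var_dump(domain_from_suffix_list('www.db.de', $rules));
var_dump(domain_from_suffix_list('big.uk.com', $rules));
```

Whether big.uk.com counts as a domain here depends entirely on whether uk.com is in the list, which is exactly the ambiguity mentioned above.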

mgutt
  • 5,867
  • 2
  • 50
  • 77
-3

NO NEED FOR REGEX. PHP has a native parse_url function:

echo  parse_url($your_url)['host'];
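
A small sketch of what that call gives you, using an example URL from earlier in the thread. Note that the host it returns still includes any subdomain, so a further step is needed to reduce it to the root domain:

```php
<?php
$url = 'http://here.example.org/longtest?string=here';

// PHP_URL_HOST asks parse_url() for the host component only.
$host = parse_url($url, PHP_URL_HOST);
echo $host, "\n"; // here.example.org

// Without a scheme the whole string is treated as a path,
// so there is no host component and null comes back.
$no_scheme = parse_url('here.example.org/longtest', PHP_URL_HOST);
var_dump($no_scheme); // NULL
```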
T.Todua
  • 53,146
  • 19
  • 236
  • 237