
Domains need to be returned as two separate parts: the actual domain name and the extension. For example:

http://www.something.com

should return: sld = something, tld = com

something.co.uk

should return: sld = something, tld = co.uk

I am not very familiar with regular expressions, so I really need some help handling this.

I suppose I can use parse_url() and check the host, but what then?

Bluemagica
  • In your second example `co` is the SLD and `uk` is the TLD. – Quentin Feb 07 '12 at 11:57
  • This may help you: http://stackoverflow.com/questions/1201194/php-getting-domain-name-from-subdomain – Feb 07 '12 at 11:59
  • You need a list of tlds. Without this list, domain names such as www.bbc.co.uk are ambiguous (www, bbc.co, uk or www, bbc, co.uk). – Salman A Feb 07 '12 at 12:07
  • You can use this API endpoint to get the correct TLD and all other details about the URL: https://www.geekystats.com/api/v1/urlDetails?url=https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/slice – Lukas Liesis Jun 18 '17 at 19:02

8 Answers


Just use PHP's explode() function with a limit of two.

Example 1:

var_dump(explode('.','example.com',2));

Example 1 Result:

array(2) { [0]=> string(7) "example" [1]=> string(3) "com" }

Example 2:

var_dump(explode('.','example.uk.com',2));

Example 2 Result:

array(2) { [0]=> string(7) "example" [1]=> string(6) "uk.com" }
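Since the question's first example is a full URL, the host has to be extracted before splitting. A minimal sketch combining parse_url() with explode(..., 2); the function name splitOnFirstDot() is hypothetical, and the approach assumes the host has no subdomain, because it always splits on the first dot:

```php
<?php
// Hypothetical helper: split a URL or bare host on its FIRST dot only.
// Assumption: the host has no subdomain; "www.something.com" would give
// sld "www" and tld "something.com".
function splitOnFirstDot(string $urlOrHost): array
{
    // parse_url() returns null when there is no host component (a bare
    // "something.co.uk" is parsed as a path) and false for a seriously
    // malformed URL, so fall back to the input in both cases.
    $host = parse_url($urlOrHost, PHP_URL_HOST) ?: $urlOrHost;

    // Limit of 2: everything after the first dot stays together.
    $parts = explode('.', $host, 2);

    return ['sld' => $parts[0], 'tld' => $parts[1] ?? ''];
}

$a = splitOnFirstDot('http://something.com'); // ['sld' => 'something', 'tld' => 'com']
$b = splitOnFirstDot('something.co.uk');      // ['sld' => 'something', 'tld' => 'co.uk']
```

For http://www.something.com the same function returns sld = www and tld = something.com, which is why other answers compare the labels against a list of known suffixes.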
Roy Shoa

Just as you said, you can use $urlComponents = parse_url($url) to get the host name. Then you could use explode(".", $urlComponents["host"]) to split the host name into its parts, e.g. array("example", "co", "uk"). You'll have to do the rest by comparing the parts against a list, because there is no fixed rule saying that, e.g., "uk" by itself is not considered a TLD but "co.uk" is. But you don't need any regular expressions here.
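The steps above can be sketched as follows. The function name splitDomain() and the short $multiPartSuffixes list are hypothetical illustrations, not a complete list; only the last two labels are checked, so deeper suffixes are not handled:

```php
<?php
// Sketch of the parse_url() + explode() approach. $multiPartSuffixes is a
// hypothetical, hand-maintained list of two-label suffixes, not a complete one.
function splitDomain(string $url, array $multiPartSuffixes = ['co.uk', 'org.uk', 'com.au']): array
{
    // Use the host from a full URL, or the input itself if it is a bare host.
    $host = parse_url($url, PHP_URL_HOST) ?: $url;
    $labels = explode('.', strtolower($host));
    $count = count($labels);

    // If the last two labels form a listed suffix (e.g. "co.uk"), they make up
    // the TLD; otherwise only the last label does.
    $tldLabels = 1;
    if ($count > 2 && in_array($labels[$count - 2] . '.' . $labels[$count - 1], $multiPartSuffixes, true)) {
        $tldLabels = 2;
    }

    $tld = implode('.', array_slice($labels, -$tldLabels));
    // The SLD is the label directly in front of the TLD (empty if there is none).
    $sld = $labels[$count - $tldLabels - 1] ?? '';

    return ['sld' => $sld, 'tld' => $tld];
}

$result = splitDomain('http://www.bbc.co.uk/news'); // ['sld' => 'bbc', 'tld' => 'co.uk']
```

Any subdomain labels (such as www) are simply ignored, since only the label in front of the TLD is returned.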

Simon

Here is what I use. Hope it helps.

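function extractTLD($domain)
{
    // Split the host on dots and drop the first label (e.g. "www").
    // Note: this assumes the host always starts with a subdomain.
    $parts = explode('.', $domain);
    array_shift($parts);
    // Rebuild the remaining labels with a leading dot, e.g. "www.example.com" gives ".example.com".
    return $parts ? '.' . implode('.', $parts) : '';
}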
  • This is badly written: by removing the first part of the domain, you are always expecting a www. or other subdomain, so this does not cater for domain.com formats. – on_ May 02 '14 at 13:20

Use parse_url($url,PHP_URL_HOST) to get the host name; then use the function below to split the domain into parts:

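function split_domain($host, $SLDs = 'co|com|edu|gov|mil|net|org')
{
    $parts = explode('.', $host);
    $index = count($parts) - 1;
    // If the label before the last one is a known second-level label
    // (e.g. "co" in "example.co.uk"), treat the last two labels as the TLD.
    if ($index > 0 && in_array($parts[$index - 1], explode('|', $SLDs))) $index--;
    // Always leave at least one label for the domain name itself.
    if ($index === 0) $index++;
    $subdomain = implode('.', array_slice($parts, 0, $index - 1));
    $domain = $parts[$index - 1];
    $tld = implode('.', array_slice($parts, $index));
    return array($subdomain, $domain, $tld);
}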
raugfer

The code below splits (explodes) the host string on the '.' character. It needs a simple array of exceptions for two-part TLDs; I have already put co.uk in it. Only for these exceptions does it use the last two chunks of the host name as the TLD.

$h='something.co.uk';
$x=array('uk'=>'co'); // exceptions of tld's with 2 parts
$r=explode('.',$h); // split host on dot
$t=array_pop($r); // create tld
if(isset($x[$t]) and end($r)==$x[$t]) $t=array_pop($r).'.'.$t; // add to tld for the exceptions
$d=implode('.',$r); // domain
echo "sld:$d, tld:$t";

The result is sld:something, tld:co.uk

Bob Siefkes
$host = 'domain.com';
$pos = strpos($host, '.');          // position of the first dot
$domain = substr($host, 0, $pos);   // "domain"
$tld = substr($host, $pos + 1);     // "com" (without the leading dot)
grasshopper

Just in case someone needs an up-to-date list of valid TLDs: http://data.iana.org/TLD/tlds-alpha-by-domain.txt
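A minimal sketch of loading that file into a lookup set, assuming its current format (comment lines starting with #, then one upper-case TLD per line). The function names parseIanaTldList() and isKnownTld() are hypothetical, and downloading and caching the file are left out. Note that the list only contains single-label TLDs such as UK, so on its own it cannot tell you that co.uk should be treated as one suffix:

```php
<?php
// Hypothetical helper: turn the contents of tlds-alpha-by-domain.txt into a set.
// Assumed format: "#" comment lines, then one upper-case TLD per line.
function parseIanaTldList(string $contents): array
{
    $tlds = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // skip blank lines and the version header
        }
        // Store lower-cased keys so lookups are case-insensitive.
        $tlds[strtolower($line)] = true;
    }
    return $tlds;
}

// Hypothetical helper: check a single label against the parsed set.
function isKnownTld(string $tld, array $tldSet): bool
{
    return isset($tldSet[strtolower($tld)]);
}

$set = parseIanaTldList("# Version header (sample)\nCOM\nUK\n");
$known = isKnownTld('com', $set);     // true
$twoPart = isKnownTld('co.uk', $set); // false: the list has no two-label entries
```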

Christian K.
  • It's ridiculous that the IANA allows an arbitrary number of nth-level domains, like example.co.uk, as well as any number of subdomains, like a.b.example: since you can't parse from either the left or the right, there's no simple, reliable way to ever know the actual "primary" portion of the domain. You could have "amazon.co, amazon.uk, amazon.co.uk, amazon.com.co.uk"... The only way is to have a script that always checks the live list of domains. :-/ – Beejor May 11 '19 at 00:05
  • That list is wrong; why not link the official list from publicsuffix.org? https://publicsuffix.org/list/public_suffix_list.dat – Shardj Jul 29 '19 at 15:33
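A simplified sketch of how Public Suffix List rules can be applied: a rule matches when its labels match the right-hand end of the host ("*" matching any single label), a matching exception rule (prefixed with "!") wins, otherwise the matching rule with the most labels wins, and with no match a default "*" rule applies. The function names and the five-rule $rules array are illustrative assumptions, not the real list:

```php
<?php
// Simplified sketch of Public Suffix List matching. $rules is a tiny
// illustrative subset; the real file must be downloaded, parsed and cached.

// A rule matches when each of its labels, compared right to left, equals the
// corresponding domain label; "*" matches any single label.
function ruleMatches(array $domainLabels, array $ruleLabels): bool
{
    if (count($ruleLabels) > count($domainLabels)) {
        return false;
    }
    $reversedDomain = array_reverse($domainLabels);
    foreach (array_reverse($ruleLabels) as $i => $label) {
        if ($label !== '*' && $label !== $reversedDomain[$i]) {
            return false;
        }
    }
    return true;
}

function publicSuffix(string $host, array $rules): string
{
    $labels = explode('.', strtolower($host));
    $prevailing = ['*'];   // implicit default rule when nothing else matches
    $isException = false;

    foreach ($rules as $rule) {
        $exception = $rule[0] === '!';
        $ruleLabels = explode('.', ltrim($rule, '!'));
        if (!ruleMatches($labels, $ruleLabels)) {
            continue;
        }
        if ($exception) {
            // A matching exception rule always prevails.
            $prevailing = $ruleLabels;
            $isException = true;
            break;
        }
        // Otherwise the matching rule with the most labels prevails.
        if (count($ruleLabels) > count($prevailing)) {
            $prevailing = $ruleLabels;
        }
    }

    if ($isException) {
        array_shift($prevailing); // an exception rule's suffix drops its leftmost label
    }
    return implode('.', array_slice($labels, -count($prevailing)));
}

// Hypothetical helper: the label in front of the public suffix is the "sld".
function splitWithPsl(string $host, array $rules): array
{
    $labels = explode('.', strtolower($host));
    $suffix = publicSuffix($host, $rules);
    $suffixLabels = substr_count($suffix, '.') + 1;
    $sld = $labels[count($labels) - $suffixLabels - 1] ?? '';
    return ['sld' => $sld, 'tld' => $suffix];
}

$rules = ['com', 'uk', 'co.uk', '*.ck', '!www.ck'];
$result = splitWithPsl('www.bbc.co.uk', $rules); // ['sld' => 'bbc', 'tld' => 'co.uk']
```

The real file also contains comment lines (starting with //) and internationalized entries, which this sketch does not parse.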

Split the string on . characters (no regex needed), then work through the resulting array from the end.

You'll need to manually keep an index of which SLDs are sold directly to end users as there is no simple pattern that describes them accurately.

Keep in mind that there is likely to be an influx of new TLDs.

Quentin