8

I have an array of domains like this:

domain.com
second.com
www.third.com
www.fourth.fifth.com
sixth.com
seventh.eigth.com

What I want is a function that returns only the host, without the subdomain.

This is the code I have so far for getting the hostname:

$parse = parse_url($url);
$domain = $parse['host'];

But this only returns:

domain.com
second.com
third.com
fourth.fifth.com
sixth.com
seventh.eigth.com

I would need this output though:

domain.com
second.com
third.com
fifth.com
sixth.com
eigth.com
ItsMeDom
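One detail worth noting about the snippet in the question: `parse_url()` only fills in the `host` component when the input carries a scheme, so a bare entry such as `domain.com` ends up in `path` instead. The sketch below shows one way to normalise both forms before stripping subdomains. The helper name `host_from_input` is made up for this example and is not part of the original question.

```php
<?php
// Hypothetical helper: returns the lower-cased host of either a full URL
// or a bare host name, or null when parse_url() cannot find a host.
function host_from_input(string $input): ?string
{
    $input = trim($input);
    // Without a scheme, parse_url() treats "domain.com" as a path,
    // so prepend a dummy scheme when none is present.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $input)) {
        $input = 'http://' . $input;
    }
    $host = parse_url($input, PHP_URL_HOST);
    return is_string($host) ? strtolower($host) : null;
}
```

The result still contains any subdomain; it only guarantees that the functions in the answers below receive a host rather than a whole URL.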

9 Answers

8

Try this code

 <?php
    /**
    * @param string $domain Pass $_SERVER['SERVER_NAME'] here
    * @param bool $debug Print an HTML trace of the parsing steps
    *
    * @return string
    */
    function get_domain($domain, $debug = false) {
        $original = $domain = strtolower($domain);     
        if (filter_var($domain, FILTER_VALIDATE_IP)) { return $domain; }    

        $debug ? print('<strong style="color:green">&raquo;</strong> Parsing: '.$original) : false; //DEBUG 

        $arr = array_slice(array_filter(explode('.', $domain, 4), function($value){
                            return $value !== 'www'; }), 0); //rebuild array indexes

        if (count($arr) > 2)    {
            $count = count($arr);
            $_sub = explode('.', $count === 4 ? $arr[3] : $arr[2]);

            $debug ? print(" (parts count: {$count})") : false; //DEBUG

            if (count($_sub) === 2)  { // two level TLD
                $removed = array_shift($arr);
                if ($count === 4) // got a subdomain acting as a domain
                    $removed = array_shift($arr);            
                $debug ? print("<br>\n" . '[*] Two level TLD: <strong>' . join('.', $_sub) . '</strong> ') : false; //DEBUG
            }
            elseif (count($_sub) === 1){ // one level TLD
                $removed = array_shift($arr); //remove the subdomain             
                if (strlen($_sub[0]) === 2 && $count === 3) // TLD domain must be 2 letters
                    array_unshift($arr, $removed);                
                else{
                    // non country TLD according to IANA
                    $tlds = array(    'aero',    'arpa',    'asia',    'biz',    'cat',    'com',    'coop',    'edu',    'gov',    'info',    'jobs',    'mil',    'mobi',    'museum',    'name',    'net',    'org',    'post',    'pro',    'tel',    'travel',    'xxx',    );             
                    if (count($arr) > 2 && in_array($_sub[0], $tlds) !== false) {//special TLD don't have a country
                        array_shift($arr);
                    }
                }
                $debug ? print("<br>\n" .'[*] One level TLD: <strong>'.join('.', $_sub).'</strong> ') : false; //DEBUG
            }
            else { // more than 3 levels, something is wrong
                for ($i = count($_sub); $i > 1; $i--) 
                    $removed = array_shift($arr);

                $debug ? print("<br>\n" . '[*] Three level TLD: <strong>' . join('.', $_sub) . '</strong> ') : false; //DEBUG
            }
        }
        elseif (count($arr) === 2) {
            $arr0 = array_shift($arr);     
            if (strpos(join('.', $arr), '.') === false
                        && in_array($arr[0], array('localhost','test','invalid')) === false) // not a reserved domain
            {
                $debug ? print("<br>\n" .'Seems invalid domain: <strong>'.join('.', $arr).'</strong> re-adding: <strong>'.$arr0.'</strong> ') : false; //DEBUG
                // seems invalid domain, restore it
                array_unshift($arr, $arr0);
            }
        }     

        $debug ? print("<br>\n".'<strong style="color:gray">&laquo;</strong> Done parsing: <span style="color:red">' . $original . '</span> as <span style="color:blue">'. join('.', $arr) ."</span><br>\n") : false; //DEBUG     
        return join('.', $arr);
    }


     //TEST
    $urls = array(
    'www.example.com' => 'example.com',
    'example.com' => 'example.com',
    'example.com.br' => 'example.com.br',
    'www.example.com.br' => 'example.com.br',
    'www.example.gov.br' => 'example.gov.br',
    'localhost' => 'localhost',
    'www.localhost' => 'localhost',
    'subdomain.localhost' => 'localhost',
    'www.subdomain.example.com' => 'example.com',
    'subdomain.example.com' => 'example.com',
    'subdomain.example.com.br' => 'example.com.br',
    'www.subdomain.example.com.br' => 'example.com.br',
    'www.subdomain.example.biz.br' => 'example.biz.br',
    'subdomain.example.biz.br' => 'example.biz.br',
    'subdomain.example.net' => 'example.net',
    'www.subdomain.example.net' => 'example.net',
    'www.subdomain.example.co.kr' => 'example.co.kr',
    'subdomain.example.co.kr' => 'example.co.kr',
    'example.co.kr' => 'example.co.kr',
    'example.jobs' => 'example.jobs',
    'www.example.jobs' => 'example.jobs',
    'subdomain.example.jobs' => 'example.jobs',
    'insane.subdomain.example.jobs' => 'example.jobs',
    'insane.subdomain.example.com.br' => 'example.com.br',
    'www.doubleinsane.subdomain.example.com.br' => 'example.com.br',
    'www.subdomain.example.jobs' => 'example.jobs',
    'test' => 'test',
    'www.test' => 'test',
    'subdomain.test' => 'test',
    'www.detran.sp.gov.br' => 'sp.gov.br',
    'www.mp.sp.gov.br' => 'sp.gov.br',
    'ny.library.museum' => 'library.museum',
    'www.ny.library.museum' => 'library.museum',
    'ny.ny.library.museum' => 'library.museum',
    'www.library.museum' => 'library.museum',
    'info.abril.com.br' => 'abril.com.br',
    '127.0.0.1' => '127.0.0.1',
    '::1' => '::1',
    );

    $failed = 0;
    $total = count($urls);

    foreach ($urls as $from => $expected){
        // Keep the input in $from so the failure message can show it
        $actual = get_domain($from, true);
        if ($actual !== $expected){
            $failed++;
            print("<div style='color:fuchsia;'>expected {$from} to resolve to {$expected}, got {$actual}</div>");
        }
    }    
    if ($failed)    
        print("{$failed} tests failed out of {$total}");    
    else    
        print("Success");   

All credit goes to pocesar.

Sanoob
  • Thanks shanoop, but this doesn't work for me because I check millions of URLs, so this version would waste a lot of my script's time. – ItsMeDom Feb 16 '14 at 23:54
  • @DoJoChi If you are very sure about the possible forms of your million URLs, then you can choose Baptiste Donaux's answer. That code will fail on domains with a country TLD. This function is only 46 lines if you remove all the testing and debug lines. – Sanoob Feb 17 '14 at 07:53
8
function giveHost($host_with_subdomain) {
    $array = explode(".", $host_with_subdomain);

    return (array_key_exists(count($array) - 2, $array) ? $array[count($array) - 2] : "").".".$array[count($array) - 1];
}
Baptiste Donaux
  • This works fine. Thanks Baptiste! I just don't know why. Could you comment the code a bit? Especially the "echo...." line .... – ItsMeDom Feb 16 '14 at 23:51
  • 8
    As Marc B pointed out above, this will return "co.uk" for "amazon.co.uk". If you KNOW your hosts will always have a one-part top-level domain name then you are fine; otherwise you will have to create a list of possible multi-part top-level domain names and make sure you are not stripping away something you shouldn't. – Tomas Jan 25 '16 at 14:37
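Since the first comment asks for an explanation, here is a commented sketch of the same "keep the last two labels" idea, written with `array_slice()`. The function name `last_two_labels` is invented for this example; it is not the answer's original code, and unlike `giveHost()` it returns a single-label host such as `localhost` unchanged rather than with a leading dot.

```php
<?php
// Sketch of the "last two labels" approach; the name is hypothetical.
function last_two_labels(string $host): string
{
    // "www.fourth.fifth.com" becomes ["www", "fourth", "fifth", "com"]
    $labels = explode('.', $host);
    // A negative offset counts from the end, so -2 keeps the last two labels.
    // With fewer than two labels, array_slice() returns whatever is there.
    return implode('.', array_slice($labels, -2));
}
```

For `www.fourth.fifth.com` this gives `fifth.com`. For `amazon.co.uk` it gives `co.uk`, which is exactly the limitation described in the comment above.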
8

I prefer to use a regex for this; it is easier for me to understand:

$url = "http://www.domain.com";
$parse = parse_url($url);
echo preg_replace("/^([a-zA-Z0-9].*\.)?([a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z.]{2,})$/", '$2', $parse['host']); 
Jordi Martín
  • Regex is awesome and a unitool! Thanks for sharing! This should be the accepted answer: no function, no pulling files from GitHub! Wouldn't `[a-zA-Z0-9][a-zA-Z0-9-]+` be more elegant than `[a-zA-Z0-9][a-zA-Z0-9-]{1,61}` though? – Alex.Default Aug 29 '18 at 11:56
  • 2
    This is by far the simplest method here. But there is one small bug. For example, your answer correctly parses urls with .co.uk at the end, but there are some with like .com.au at the end and this regex doesn't parse it correctly in that case. – AwesomeGuy Sep 17 '18 at 06:55
  • Kudos to this one, definitely the most efficient. – Matt Loye Apr 08 '20 at 09:03
  • Only one which works with `www.bbc.co.uk` – John Magnolia Nov 24 '22 at 15:15
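To make the behaviour discussed in these comments easier to check, the sketch below wraps the answer's pattern in a function. The pattern itself is unchanged; the wrapper name `strip_by_regex` is made up for this example. The pattern requires the kept label to be at least three characters long, which is why `co` is skipped in `bbc.co.uk` while `com` is accepted in `example.com.au`.

```php
<?php
// Hypothetical wrapper around the pattern from the answer above.
function strip_by_regex(string $host): string
{
    $pattern = '/^([a-zA-Z0-9].*\.)?([a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z.]{2,})$/';
    // preg_replace() returns null on a regex error, so cast to keep the return type.
    return (string) preg_replace($pattern, '$2', $host);
}
```

With this pattern, `www.bbc.co.uk` reduces to `bbc.co.uk`, but `www.example.com.au` reduces to `com.au`, matching the bug report in the comment above.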
5

Here's one that works for all domains, including those with second level domains like "co.uk"

function strip_subdomains($url){

    # credits to gavingmiller for maintaining this list
    $second_level_domains = file_get_contents("https://raw.githubusercontent.com/gavingmiller/second-level-domains/master/SLDs.csv");

    # presume sld first ...
    $possible_sld = implode('.', array_slice(explode('.', $url), -2));

    # and then verify it
    # strpos() returns 0 for a match at the very start, so compare against false explicitly
    if (strpos($second_level_domains, $possible_sld) !== false){
        return  implode('.', array_slice(explode('.', $url), -3));
    } else {
        return  implode('.', array_slice(explode('.', $url), -2));
    }
}
ShadowTrackr
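A possible variation on the list-based approach, sketched under some assumptions: the suffix list is held in memory instead of being downloaded on every call, and entries are compared exactly rather than by substring search. The four suffixes below are an illustrative subset only, and both `MULTI_PART_SUFFIXES` and `strip_subdomains_with_list` are names invented for this example.

```php
<?php
// Illustrative subset only; a real list (such as the CSV above) is much longer.
const MULTI_PART_SUFFIXES = ['co.uk', 'com.au', 'com.br', 'co.kr'];

// Hypothetical helper: keeps three labels when the last two form a known
// multi-part suffix, otherwise keeps two.
function strip_subdomains_with_list(string $host, array $suffixes = MULTI_PART_SUFFIXES): string
{
    $labels = explode('.', strtolower($host));
    $lastTwo = implode('.', array_slice($labels, -2));
    // Exact comparison against whole entries, so a partial string
    // such as "o.uk" cannot match "co.uk".
    $keep = in_array($lastTwo, $suffixes, true) ? 3 : 2;
    return implode('.', array_slice($labels, -$keep));
}
```

In a long-running script the full list could be loaded once at start-up and passed in as the second argument.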
1

I was using Baptiste Donaux's version until I needed to check 'localhost'. I think Shanoop's version is more reliable.

I've tested both versions in a test suite with 190 assertions and there's no huge impact on performance. If milliseconds are still a concern, you can just cache the results in production using Redis or something similar.
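As a rough illustration of the caching idea, here is a minimal in-process sketch that uses a static array rather than Redis. The wrapper name `cached_strip` and its callable parameter are assumptions made for this example; the cache is shared across calls, so it should be used with a single stripping function.

```php
<?php
// Hypothetical wrapper: remembers results per host for the lifetime of the
// process. $strip is whichever stripping function you settle on.
function cached_strip(string $host, callable $strip): string
{
    static $cache = [];
    $key = strtolower($host);
    if (!isset($cache[$key])) {
        // Only the first lookup for a given host pays the parsing cost.
        $cache[$key] = $strip($key);
    }
    return $cache[$key];
}
```

When the same hosts recur many times across millions of URLs, repeated lookups become a single array access.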

This is the same as Shanoop's answer, but without the debug lines and with a bit of cleanup:

function stripSubdomain($domain) 
{
    $domain = strtolower($domain);
    if (isIp($domain)) return $domain; // IP addresses have no subdomain to strip
    return stripArray( buildArray($domain) );
}

function isIp($domain)
{
    return (filter_var($domain, FILTER_VALIDATE_IP));
}

function buildArray($domain)
{
    return array_slice(array_filter(explode('.', $domain, 4), function($value){
                                                                  return $value !== 'www';
                                                              }), 0);
}

function stripArray($arr)
{
    // TLD Domains
    if (count($arr) > 2) {
        $count = count($arr);
        $_sub = retrieveSubdomain($arr); // plain function call; use $this-> only inside a class

        // two level TLD
        if (count($_sub) === 2)  {
            array_shift($arr);
            if ($count === 4) array_shift($arr);
        }

        // one level TLD
        elseif (count($_sub) === 1){
            $removed = array_shift($arr);
            if (strlen($_sub[0]) === 2 && $count === 3) array_unshift($arr, $removed);

            else {
                // non country TLD according to IANA
                $tlds = ['aero', 'arpa', 'asia', 'biz', 'cat', 'com', 'coop',
                         'edu', 'gov', 'info', 'jobs', 'mil', 'mobi', 'museum',
                         'name', 'net', 'org', 'post', 'pro', 'tel', 'travel', 'xxx'];

                if (count($arr) > 2 &&
                    in_array($_sub[0], $tlds) !== false) array_shift($arr);
            }

        }

        // more than 3 levels, something is wrong
        else
            for ($i = count($_sub); $i > 1; $i--) array_shift($arr);

    }

    // Special Domains
    elseif (count($arr) === 2) {
        $removed = array_shift($arr);
        $reserved = ['localhost','test','invalid'];
        if (strpos(join('.', $arr), '.') === false && in_array($arr[0], $reserved) === false)
            array_unshift($arr, $removed); // seems invalid domain, restore it
    }

    return join('.', $arr);
}

function retrieveSubdomain($arr)
{
    return explode('.', (count($arr) === 4 ? $arr[3] : $arr[2]) );
}
Rafael Beckel
  • I've actually implemented it inside a class. If you do so, make only the first method public (and call the other methods with $this->method). There are also possible improvements, like passing the domain to the class constructor and just calling $domain = new Domain('url.here'); and then $domain->stripSubdomain(); – Rafael Beckel Jul 21 '15 at 11:31
0

This one works well for the majority of domains, and it is elegant, if I do say so myself!

public static function StripSubdomain($Domain) {

    $domain_array = explode('.', $Domain);
    $domain_array = array_reverse($domain_array);

    return $domain_array[1] . '.' . $domain_array[0];
}
Jeffrey L. Roberts
0

Try this out. I like the simplicity and it's been working for me for most use-cases.

$domains = [
    "domain.com",
    "second.com",
    "www.third.com",
    "www.fourth.fifth.com",
    "openhours.colyn.dev"
];

$domains = array_map(function ($domain) {
    $parts = explode('.', $domain);
    return implode('.', array_slice($parts, count($parts)-2));
}, $domains);

/**
[
    "domain.com",
    "second.com",
    "third.com",
    "fifth.com",
    "colyn.dev",
]
*/

Not super robust but for me, it works fine.

Colyn Brown
0

I had problems with some of the other solutions as they didn't work with '.co.uk' or '.com.au'

I decided to explode the domain by . and reverse it, then keep building the domain as long as the part was no longer than 3 characters. Some TLDs have parts longer than 3 characters, such as mobi.ke, but for my use case these rare ones don't matter.

private function removeSubDomain(string $host): string
{
    $host_parts = array_reverse(explode('.', $host));

    $domain = '';
    foreach ($host_parts as $part) {
        $domain = '.' . $part . $domain;
        // Stop at the first part longer than 3 characters and treat it as the domain name
        if (strlen($part) > 3) {
            break;
        }
    }

    // Trim here so the leading dot is also removed when no part exceeded 3 characters
    return ltrim($domain, '.');
}
-4

Try with str_replace();

$parse = parse_url($url);
$domain = str_replace('www.','',$parse['host']);
Angel Politis
ZiupeX