185

I need to build a function which parses the domain from a URL.

So, with

http://google.com/dhasjkdas/sadsdds/sdda/sdads.html

or

http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html

it should return google.com

with

http://google.co.uk/dhasjkdas/sadsdds/sdda/sdads.html

it should return google.co.uk.

Giacomo1968
zuk1

20 Answers

382

Check out parse_url():

$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$parse = parse_url($url);
echo $parse['host']; // prints 'google.com'

parse_url doesn't handle really badly mangled urls very well, but is fine if you generally expect decent urls.
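
For scheme-less input such as www.google.com/path, parse_url() reports no host, because the whole string is treated as a path. A minimal sketch that works around this by prepending a scheme first; the helper name host_from_url and the choice of http:// as the default scheme are assumptions for illustration:

```php
// Hypothetical helper: returns the host part of $url, or null if none is found.
function host_from_url(string $url): ?string
{
    $url = trim($url);
    // parse_url() only recognises a host after "scheme://",
    // so assume http:// when no scheme is present.
    if (!preg_match('#^[a-z][a-z0-9+.\-]*://#i', $url)) {
        $url = 'http://' . ltrim($url, '/');
    }
    $host = parse_url($url, PHP_URL_HOST);
    // parse_url() yields null (component missing) or false (malformed URL) on failure.
    return is_string($host) ? strtolower($host) : null;
}

echo host_from_url('http://google.com/dhasjkdas/sadsdds/sdda/sdads.html'), "\n"; // google.com
echo host_from_url('www.google.com/dhasjkdas'), "\n";                            // www.google.com
```

This still returns the full host, including any subdomain; it only addresses the missing-scheme case.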

Owen
  • 40
    One thing parse_url() does not do is only return the domain. If you add www.google.com or www.google.co.uk, it will return the host as well. Any suggestions for that? – Gavin M. Roy Dec 30 '08 at 00:40
  • 1
    @Crad, http://stackoverflow.com/questions/8272805/how-to-handle-mozillas-top-domain-name-list-with-php – ilhan Nov 25 '11 at 21:22
  • 7
    `parse_url` does not handle subdomains, but Purl does: https://github.com/jwage/purl – Damien Jan 18 '13 at 11:48
  • 1
    [`parse_url()`](http://php.net/manual/en/function.parse-url.php) would possibly parse URLs with a domain that contains dashes wrongly. Could not find definite proof, but check out [this bug](https://bugs.php.net/bug.php?id=51192). `FILTER_VALIDATE_URL` uses `parse_url()` internally. – XedinUnknown Jul 01 '15 at 09:16
  • 11
    Or simply: `print parse_url($url, PHP_URL_HOST);` if you don't need the `$parse` array for anything else. – rybo111 Aug 24 '16 at 12:03
  • Does not handle `domain.eu` which is perfectly valid domain. – tftd Jun 20 '17 at 15:21
  • 1
    @tftd - That's because `parse_url` expects to be passed a [**url**](https://wikipedia.org/wiki/URL), but `domain.eu` is only the domain-name/hostname portion of the url. "`parse_url` parses a URL and returns an associative array containing any of the various components of the URL that are present. This function is **not meant to validate** the given URL..." – ashleedawg Dec 19 '18 at 21:42
  • Single line: `parse_url('https://google.com/cheezburger')['host'];` – Eric P Jun 29 '21 at 18:08
114
$domain = str_ireplace('www.', '', parse_url($url, PHP_URL_HOST));

This would return google.com for both http://google.com/... and http://www.google.com/...
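
Note that str_ireplace('www.', '', ...) removes that substring wherever it occurs in the host, not only at the start. A sketch that strips just a leading www. label; strip_www is a hypothetical helper name:

```php
// Hypothetical helper: remove a single leading "www." label, case-insensitively.
function strip_www(string $host): string
{
    return preg_replace('/^www\./i', '', $host);
}

echo strip_www('www.google.com'), "\n";                       // google.com
echo strip_www('files.www.example.com'), "\n";                // files.www.example.com
echo str_ireplace('www.', '', 'files.www.example.com'), "\n"; // files.example.com
```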

Alix Axel
  • 19
    'cause it will still return the server if you put in "server.google.com" or "www3.google.com"... – patrick Dec 13 '12 at 12:12
  • 2
    Not all subdomains are www, crawl-66-249-66-1.googlebot.com, myblog.blogspot.com are a few examples. – rafark Dec 15 '19 at 20:43
25

From http://us3.php.net/manual/en/function.parse-url.php#93983

for some odd reason, parse_url returns the host (ex. example.com) as the path when no scheme is provided in the input url. So I've written a quick function to get the real host:

function getHost($Address) { 
   $parseUrl = parse_url(trim($Address)); 
   return trim($parseUrl['host'] ? $parseUrl['host'] : array_shift(explode('/', $parseUrl['path'], 2))); 
} 

getHost("example.com"); // Gives example.com 
getHost("http://example.com"); // Gives example.com 
getHost("www.example.com"); // Gives www.example.com 
getHost("http://example.com/xyz"); // Gives example.com 
philfreo
17
function get_domain($url = SITE_URL)
{
    preg_match("/[a-z0-9\-]{1,63}\.[a-z\.]{2,6}$/", parse_url($url, PHP_URL_HOST), $_domain_tld);
    return $_domain_tld[0];
}

get_domain('http://www.cdl.gr'); //cdl.gr
get_domain('http://cdl.gr'); //cdl.gr
get_domain('http://www2.cdl.gr'); //cdl.gr
nikmauro
  • Not working for me either: example.com // Incorrect: empty string; http://example.com // Correct: example.com; www.example.com // Incorrect: empty string; http://example.com/xyz // Correct: example.com – jenlampton Nov 26 '16 at 21:18
  • 2
    This is a great answer and deserves more credit. Just add this line as the first line in the function and it also solves the problems of MangeshSathe and jenlampton: if((substr($url,0,strlen('http://')) <> 'http://') && (substr($url,0,strlen('https://')) <> 'https://')) $url = 'http://'.$url; – Rick Jul 07 '19 at 13:31
15

The code that was meant to work 100% didn't quite cut it for me. I patched the example a little, but found code that wasn't helping and further problems with it, so I split it into a couple of functions (to avoid requesting the list from Mozilla every time, and to remove the cache system). This has been tested against a set of 1000 URLs and seemed to work.

function domain($url)
{
    global $subtlds;
    $slds = "";
    $url = strtolower($url);

    $host = parse_url('http://'.$url,PHP_URL_HOST);

    preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
    foreach($subtlds as $sub){
        if (preg_match('/\.'.preg_quote($sub).'$/', $host, $xyz)){
            preg_match("/[^\.\/]+\.[^\.\/]+\.[^\.\/]+$/", $host, $matches);
        }
    }

    return @$matches[0];
}

function get_tlds() {
    $address = 'http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1';
    $content = file($address);
    foreach ($content as $num => $line) {
        $line = trim($line);
        if($line == '') continue;
        if(@substr($line[0], 0, 2) == '/') continue;
        $line = @preg_replace("/[^a-zA-Z0-9\.]/", '', $line);
        if($line == '') continue;  //$line = '.'.$line;
        if(@$line[0] == '.') $line = substr($line, 1);
        if(!strstr($line, '.')) continue;
        $subtlds[] = $line;
        //echo "{$num}: '{$line}'"; echo "<br>";
    }

    $subtlds = array_merge(array(
            'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk', 'ac.uk', 
            'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk', 'asn.au', 'com.au',
            'net.au', 'id.au', 'org.au', 'edu.au', 'gov.au', 'csiro.au'
        ), $subtlds);

    $subtlds = array_unique($subtlds);

    return $subtlds;    
}

Then use it like

$subtlds = get_tlds();
echo domain('www.example.com'); //outputs: example.com
echo domain('www.example.uk.com'); //outputs: example.uk.com
echo domain('www.example.fr'); //outputs: example.fr

I know I should have turned this into a class, but didn't have time.

7ochem
Shaun
11

Please consider replacing the accepted solution with the following:

parse_url() will always include any sub-domain(s), so this function doesn't parse domain names very well. Here are some examples:

$url = 'http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$parse = parse_url($url);
echo $parse['host']; // prints 'www.google.com'

echo parse_url('https://subdomain.example.com/foo/bar', PHP_URL_HOST);
// Output: subdomain.example.com

echo parse_url('https://subdomain.example.co.uk/foo/bar', PHP_URL_HOST);
// Output: subdomain.example.co.uk

Instead, you may consider this pragmatic solution. It will cover many, but not all domain names -- for instance, lower-level domains such as 'sos.state.oh.us' are not covered.

function getDomain($url) {
    $host = parse_url($url, PHP_URL_HOST);

    if(filter_var($host,FILTER_VALIDATE_IP)) {
        // IP address returned as domain
        return $host; //* or replace with null if you don't want an IP back
    }

    $domain_array = explode(".", str_replace('www.', '', $host));
    $count = count($domain_array);
    if( $count>=3 && strlen($domain_array[$count-2])==2 ) {
        // SLD (example.co.uk)
        return implode('.', array_splice($domain_array, $count-3,3));
    } else if( $count>=2 ) {
        // TLD (example.com)
        return implode('.', array_splice($domain_array, $count-2,2));
    }
}

// Your domains
    echo getDomain('http://google.com/dhasjkdas/sadsdds/sdda/sdads.html'); // google.com
    echo getDomain('http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html'); // google.com
    echo getDomain('http://google.co.uk/dhasjkdas/sadsdds/sdda/sdads.html'); // google.co.uk

// TLD
    echo getDomain('https://shop.example.com'); // example.com
    echo getDomain('https://foo.bar.example.com'); // example.com
    echo getDomain('https://www.example.com'); // example.com
    echo getDomain('https://example.com'); // example.com

// SLD
    echo getDomain('https://more.news.bbc.co.uk'); // bbc.co.uk
    echo getDomain('https://www.bbc.co.uk'); // bbc.co.uk
    echo getDomain('https://bbc.co.uk'); // bbc.co.uk

// IP
    echo getDomain('https://1.2.3.45');  // 1.2.3.45

Finally, Jeremy Kendall's PHP Domain Parser allows you to parse the domain name from a url. League URI Hostname Parser will also do the job.

patrick
Kristoffer Bohmann
6

If you want to extract the host from a string such as http://google.com/dhasjkdas/sadsdds/sdda/sdads.html, using parse_url() is an acceptable solution.

But if you want to extract the domain or its parts, you need a package that uses the Public Suffix List. Yes, you can use string functions around parse_url(), but that will sometimes produce incorrect results.
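
To illustrate the kind of incorrect result meant here, a sketch of the naive "keep the last two labels" approach; naive_domain is a hypothetical name used only for this example:

```php
// Hypothetical naive approach: keep only the last two dot-separated labels.
function naive_domain(string $url): string
{
    $host = (string) parse_url($url, PHP_URL_HOST);
    return implode('.', array_slice(explode('.', $host), -2));
}

echo naive_domain('http://www.google.com/path'), "\n"; // google.com
echo naive_domain('http://forums.bbc.co.uk/'), "\n";   // co.uk, not bbc.co.uk
```

The second result is a public suffix rather than a registrable domain, which is the case a Public Suffix List lookup is meant to handle.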

I recommend TLDExtract for domain parsing. Here is sample code that shows the difference:

$extract = new LayerShifter\TLDExtract\Extract();

# For 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html'

$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';

parse_url($url, PHP_URL_HOST); // will return google.com

$result = $extract->parse($url);
$result->getFullHost(); // will return 'google.com'
$result->getRegistrableDomain(); // will return 'google.com'
$result->getSuffix(); // will return 'com'

# For 'http://search.google.com/dhasjkdas/sadsdds/sdda/sdads.html'

$url = 'http://search.google.com/dhasjkdas/sadsdds/sdda/sdads.html';

parse_url($url, PHP_URL_HOST); // will return 'search.google.com'

$result = $extract->parse($url);
$result->getFullHost(); // will return 'search.google.com'
$result->getRegistrableDomain(); // will return 'google.com'
Oleksandr Fediashov
  • Thank you so much for this suggestion. I hate adding another library for what _appears_ to be a simple task, but then I saw this quote on their readme applied to me: "Everybody gets this wrong. Splitting on the '.' and taking the last 2 elements goes a long way only if you're thinking of simple e.g. .com domains. Think parsing http://forums.bbc.co.uk for example: the naive splitting method above will give you 'co' as the domain and 'uk' as the TLD, instead of 'bbc' and 'co.uk' respectively." – Demonslay335 Jan 01 '17 at 18:58
  • Splitting on dots, while not what we want for our beloved .co.uk domains, actually gives the correct result: co is the second level and uk is the top level. Webmasters often do not realise that. – CodingInTheUK Oct 27 '17 at 22:06
5

You can pass PHP_URL_HOST to the parse_url function as the second parameter:

$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$host = parse_url($url, PHP_URL_HOST);
print $host; // prints 'google.com'
Oleg Matei
  • 2
    This is essentially the same as the answer above, however, the question is requiring the _domain_, which isn't necessarily the same as the _host_. – MrWhite Apr 25 '16 at 14:32
  • see comment above about scheme: for some odd reason, parse_url returns the host (ex. example.com) as the path when no scheme is provided in the input url. So I've written a quick function to get the real host: – jenlampton Nov 26 '16 at 21:34
5

I've found that @philfreo's solution (referenced from php.net) gives good results, but in some cases it triggers PHP's "notice" and "Strict Standards" messages. Here is a fixed version of that code.

function getHost($url) { 
   $parseUrl = parse_url(trim($url)); 
   if(isset($parseUrl['host']))
   {
       $host = $parseUrl['host'];
   }
   else
   {
        $path = explode('/', $parseUrl['path']);
        $host = $path[0];
   }
   return trim($host); 
} 
  
echo getHost("http://example.com/anything.html");           // example.com
echo getHost("http://www.example.net/directory/post.php");  // www.example.net
echo getHost("https://example.co.uk");                      // example.co.uk
echo getHost("www.example.net");                            // www.example.net
echo getHost("subdomain.example.net/anything");             // subdomain.example.net
echo getHost("example.net");                                // example.net
I have provided updated code to answer the question more accurately, because the questioner also wanted to remove the 'www' part from the given URL.

[The solution below has been updated on July 29, 2023]

function getHost($url, $accept_www=false){ 
    $URIs = parse_url(trim($url)); 
    $host = !empty($URIs['host'])? $URIs['host'] : explode('/', $URIs['path'])[0];
    return $accept_www == false? str_ireplace('www.', '', $host) : $host;  
} 
Use examples:
echo getHost("http://example.com/anything.html", 1).'<br>';           // example.com
echo getHost("http://www.example.net/directory/post.php", 1).'<br>';  // www.example.net
echo getHost("https://example.co.uk", 1).'<br>';                      // example.co.uk
echo getHost("www.example.net", 1).'<br>';                            // www.example.net
echo getHost("subdomain.example.net/anything", 1).'<br>';             // subdomain.example.net
echo getHost("http://blog.example.net/anything").'<br>';              // blog.example.net
echo getHost("example.net", 1).'<br>';                                // example.net

echo '<br> ===== without "www" ===== <br><br>';

echo getHost("http://example.com/anything.html").'<br>';             // example.com
echo getHost("http://www.example.net/directory/post.php").'<br>';    // example.net
echo getHost("https://example.co.uk").'<br>';                        // example.co.uk
echo getHost("www.example.net").'<br>';                              // example.net
echo getHost("subdomain.example.net/anything").'<br>';               // subdomain.example.net
echo getHost("http://blog.example.net/anything").'<br>';             // blog.example.net
echo getHost("example.net").'<br>';                                  // example.net
fatih
4

I'm adding this answer late since this is the answer that pops up most on Google...

You can use PHP to...

$url = "http://www.google.co.uk"; // a scheme is required, otherwise parse_url() returns null for the host
$host = parse_url($url, PHP_URL_HOST);
// $host == "www.google.co.uk"

to grab the host, but not the private domain to which the host refers. (For example, www.google.co.uk is the host, but google.co.uk is the private domain.)

To grab the private domain, you need to know the list of public suffixes under which one can register a private domain. This list happens to be curated by Mozilla at https://publicsuffix.org/

The below code works when an array of public suffixes has been created already. Simply call

$domain = get_private_domain("www.google.co.uk");

with the remaining code...

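// find some way to parse the above list of public suffixes,
// then add them to a PHP array keyed by suffix (e.g. 'co.uk' => true),
// because is_public_suffix() below looks them up with isset()
$suffix = [... all valid public suffix ...];

// Returns the longest public suffix that $host ends with, or false if none matches
function get_public_suffix($host) {
  // explode() replaces split(), which was removed in PHP 7
  $parts = explode(".", $host);
  while (count($parts) > 0) {
    if (is_public_suffix(join(".", $parts)))
      return join(".", $parts);

    // drop the leftmost label and try the shorter candidate
    array_shift($parts);
  }

  return false;
}

function is_public_suffix($host) {
  global $suffix;
  return isset($suffix[$host]);
}

// Returns the public suffix plus one more label, e.g. google.co.uk
function get_private_domain($host) {
  $public = get_public_suffix($host);
  if ($public === false)
    return false; // no known public suffix, so no private domain can be derived

  $public_parts = explode(".", $public);
  $all_parts = explode(".", $host);

  $private = [];

  // take the suffix labels off the end of the host
  for ($x = 0; $x < count($public_parts); ++$x) 
    $private[] = array_pop($all_parts);

  // then take one more label: the registered name
  if (count($all_parts) > 0)
    $private[] = array_pop($all_parts);

  return join(".", array_reverse($private));
}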
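
The $suffix lookup above needs the suffixes as array keys, since is_public_suffix() tests them with isset(). A sketch of how such an array might be built from the text of the Public Suffix List; the function name parse_public_suffix_list is hypothetical, and wildcard (*.) and exception (!) rules are skipped here for brevity, which a complete implementation would have to handle:

```php
// Hypothetical helper: turn Public Suffix List text into ['suffix' => true, ...].
function parse_public_suffix_list(string $text): array
{
    $suffixes = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        // Skip blank lines and "//" comment lines.
        if ($line === '' || strpos($line, '//') === 0) {
            continue;
        }
        // Simplification (assumption): ignore wildcard and exception rules.
        if ($line[0] === '*' || $line[0] === '!') {
            continue;
        }
        $suffixes[strtolower($line)] = true;
    }
    return $suffixes;
}

// Small inline sample in the list's format; the real file is far longer.
$sample = "// ===BEGIN ICANN DOMAINS===\ncom\n\nuk\nco.uk\n*.ck\n!www.ck\n";
$suffix = parse_public_suffix_list($sample);

var_dump(isset($suffix['co.uk'])); // bool(true)
var_dump(isset($suffix['*.ck']));  // bool(false)
```

The full list is published at https://publicsuffix.org/list/public_suffix_list.dat; downloading it once and caching the parsed array avoids fetching it on every request.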
Andy Jones
  • As per my testing, parse_url needs a well-formed URL. If you just give 'www.someDomain.com/path' then it will return null. So it expects a protocol (like http or https) to be present. – Andy Feb 15 '18 at 00:28
4

Here is the code I made that 100% finds only the domain name, since it takes Mozilla's sub-TLDs into account. The only thing you have to check is how you cache that file, so you don't query Mozilla every time.

For some strange reason, domains like co.uk are not in the list, so you have to do some hacking and add them manually. It's not the cleanest solution, but I hope it helps someone.

//=====================================================
static function domain($url)
{
    $slds = "";
    $url = strtolower($url);

            $address = 'http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1';
    if(!$subtlds = @kohana::cache('subtlds', null, 60)) 
    {
        $content = file($address);
        foreach($content as $num => $line)
        {
            $line = trim($line);
            if($line == '') continue;
            if(@substr($line[0], 0, 2) == '/') continue;
            $line = @preg_replace("/[^a-zA-Z0-9\.]/", '', $line);
            if($line == '') continue;  //$line = '.'.$line;
            if(@$line[0] == '.') $line = substr($line, 1);
            if(!strstr($line, '.')) continue;
            $subtlds[] = $line;
            //echo "{$num}: '{$line}'"; echo "<br>";
        }
        $subtlds = array_merge(Array(
            'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk', 'ac.uk', 
            'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk', 'asn.au', 'com.au',
            'net.au', 'id.au', 'org.au', 'edu.au', 'gov.au', 'csiro.au',
            ),$subtlds);

        $subtlds = array_unique($subtlds);
        //echo var_dump($subtlds);
        @kohana::cache('subtlds', $subtlds);
    }


    // accept both http:// and https:// before the host
    preg_match('/^(https?:[\/]{2,})?([^\/]+)/i', $url, $matches);
    //preg_match("/^(http:\/\/|https:\/\/|)[a-zA-Z-]([^\/]+)/i", $url, $matches);
    $host = @$matches[2];
    //echo var_dump($matches);

    preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
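    foreach($subtlds as $sub) 
    {
        // escape the suffix and anchor it on a dot, so its dots are matched literally
        if (preg_match('/\.' . preg_quote($sub, '/') . '$/', $host, $xyz))
        preg_match("/[^\.\/]+\.[^\.\/]+\.[^\.\/]+$/", $host, $matches);
    }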

    return @$matches[0];
}
Community
Luka
  • The reason the domain `co.uk` was not on the list, was because it was a list of TLD's, not of domains. The ccTLD has changed a lot since this answer was written. Notably: "New registrations directly under .uk have been accepted by Nominet since 10 June 2014 08:00 BST, however there is a reservation period for existing customers who already have a .co.uk, .org.uk, .me.uk, .net.uk, .ltd.uk or .plc.uk domain to claim the corresponding .uk domain, which runs until 07:59 BST on ***10 June 2019***." (**[Source](https://wikipedia.org/wiki/.uk)**) – ashleedawg Dec 19 '18 at 22:06
3
function getTrimmedUrl($link)
{
    $str = str_replace(["www.","https://","http://"],[''],$link);
    $link = explode("/",$str);
    return strtolower($link[0]);                
}
rk3263025
2
$domain = parse_url($url, PHP_URL_HOST);
echo implode('.', array_slice(explode('.', $domain), -2, 2));
Michael
1

parse_url didn't work for me. It only returned the path. Switching to basics using php5.3+:

$url  = str_replace('http://', '', strtolower( $s->website));
if (strpos($url, '/'))  $url = strstr($url, '/', true);
Will
1

I have edited for you:

function getHost($Address) { 
    $parseUrl = parse_url(trim($Address));
    // Without a scheme, parse_url() puts the host in 'path', so fall back to its first segment
    $host = isset($parseUrl['host']) ? $parseUrl['host'] : explode('/', $parseUrl['path'], 2)[0];

    $parts = explode('.', trim($host));

    // Drop a leading "www" label only; any other subdomain labels are kept
    if ($parts[0] == "www") {
        array_shift($parts);
    }
    return implode('.', $parts);
}

For input such as www.domain.ltd this returns domain.ltd. Note that only a leading www is removed; other subdomains (sub1.subn.domain.ltd) are returned unchanged.

NotFound Life
1

None of these solutions worked for me when I used these test cases:

public function getTestCases(): array
{
    return [
        //input                              expected
        ['http://google.com/dhasjkdas',      'google.com'],
        ['https://google.com/dhasjkdas',     'google.com'],
        ['https://www.google.com/dhasjkdas', 'google.com'],
        ['http://www.google.com/dhasjkdas',  'google.com'],
        ['www.google.com/dhasjkdas',         'google.com'],
        ['google.com/dhasjkdas',             'google.com'],
    ];
}

but wrapping this answer into function worked in all cases: https://stackoverflow.com/a/65659814/5884988
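
The linked answer is not reproduced here. As a sketch only, one function that satisfies the six cases above could look like this; domain_without_www is a hypothetical name, and it deliberately does nothing about other subdomains or multi-label suffixes such as co.uk:

```php
// Hypothetical helper: host of $url with a leading "www." removed, or null on failure.
function domain_without_www(string $url): ?string
{
    $url = trim($url);
    // parse_url() needs a scheme to find the host, so assume http:// if none is given.
    if (!preg_match('#^[a-z][a-z0-9+.\-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null;
    }
    return preg_replace('/^www\./i', '', strtolower($host));
}

$cases = [
    ['http://google.com/dhasjkdas',      'google.com'],
    ['https://google.com/dhasjkdas',     'google.com'],
    ['https://www.google.com/dhasjkdas', 'google.com'],
    ['http://www.google.com/dhasjkdas',  'google.com'],
    ['www.google.com/dhasjkdas',         'google.com'],
    ['google.com/dhasjkdas',             'google.com'],
];

foreach ($cases as [$input, $expected]) {
    $status = domain_without_www($input) === $expected ? 'ok' : 'FAIL';
    printf("%-35s %s\n", $input, $status);
}
```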

Rawburner
0

This will generally work very well if the input URL is not total junk. It removes the subdomain.

$host = parse_url( $Row->url, PHP_URL_HOST );
$parts = explode( '.', $host );
$parts = array_reverse( $parts );
$domain = $parts[1].'.'.$parts[0];

Example

Input: http://www2.website.com:8080/some/file/structure?some=parameters

Output: website.com

T. Brian Jones
0

Combining the answers of worldofjr and Alix Axel into one small function that will handle most use-cases:

function get_url_hostname($url) {

    $parse = parse_url($url);
    return str_ireplace('www.', '', $parse['host']);

}

get_url_hostname('http://www.google.com/example/path/file.html'); // google.com
Michael Giovanni Pumo
0

Try using URI package from The PHP League: https://github.com/thephpleague/uri

use League\Uri\UriTemplate;

$template = 'https://api.twitter.com:443/{version}/search/{term:1}/{term}/{?q*,limit}#title';
$defaultVariables = ['version' => '1.1'];
$params = [
    'term' => 'john',
    'q' => ['a', 'b'],
    'limit' => '10',
];

$uriTemplate = new UriTemplate($template, $defaultVariables);
$uri = $uriTemplate->expand($params);
// $uri is a League\Uri\Uri object

echo $uri->getScheme();
echo $uri->getHost();
echo $uri->getAuthority();
echo $uri->getPath();
echo $uri->getQuery();
echo $uri->getFragment();
echo $uri;
zoltalar
-7

Just use the following:

<?php
   echo $_SERVER['SERVER_NAME'];
?>
  • 1
    This is assuming the server is the url you want to retrieve the domain from. That's not the case. – Overcode Jun 30 '15 at 20:42