12

I need to extract the domain name from a string that could be any URL, such as:

$sitelink="http://www.somewebsite.com/product/3749875/info/overview.html";

or

$sitelink="http://subdomain.somewebsite.com/blah/blah/whatever.php";

In any case, I'm looking to extract the 'somewebsite.com' portion (which could be anything), and discard the rest.

j0k
nooblag
  • What have you tried? A simple Google search will return many answers for a popular question. – Matt Clark Feb 07 '13 at 07:02
  • Duplicate of http://stackoverflow.com/questions/2527231/extract-domain-from-url-including-the-hard-ones – idmean Feb 07 '13 at 07:04
  • possible duplicate of [Parsing Domain From URL In PHP](http://stackoverflow.com/questions/276516/parsing-domain-from-url-in-php) – j0k Feb 07 '13 at 07:08
  • http://stackoverflow.com/questions/276516/parsing-domain-from-url-in-php This page was useful. Thanks – nooblag Feb 07 '13 at 07:57

6 Answers

23

Use parse_url($url):

<?php
$url = 'http://username:password@hostname/path?arg=value#anchor';

print_r(parse_url($url));
?>

The above example will output:

Array
(
    [scheme] => http
    [host] => hostname
    [user] => username
    [pass] => password
    [path] => /path
    [query] => arg=value
    [fragment] => anchor
)

To use those values:

echo parse_url($url, PHP_URL_HOST); //hostname

or

$url_info = parse_url($url);
echo $url_info['host'];//hostname
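
One thing to watch when the input "could be anything": parse_url() only reports a host when the string has a scheme (or starts with "//"). The following is a minimal sketch, not a definitive implementation; host_from_string() is a hypothetical helper name, and prepending "http://" to scheme-less input is an assumption about what the input means.

```php
<?php
// Hypothetical helper (not part of PHP): returns the host of a URL-like string, or null.
function host_from_string(string $input): ?string
{
    $input = trim($input);
    // parse_url() only fills in the host when a scheme is present,
    // so assume "http://" when the input has none.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $input)) {
        $input = 'http://' . ltrim($input, '/');
    }
    $host = parse_url($input, PHP_URL_HOST);
    // parse_url() can return false (malformed URL) or null (no host component).
    return is_string($host) && $host !== '' ? strtolower($host) : null;
}

echo host_from_string('http://www.somewebsite.com/product/3749875/info/overview.html'), "\n";
echo host_from_string('subdomain.somewebsite.com/blah/blah/whatever.php'), "\n";
```

This still returns the full host, subdomain included; trimming it down to the registrable domain is a separate step.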
Lawrence Cherone
  • I think OP wants only the second-level domain –  Feb 07 '13 at 07:07
  • I'll leave the OP to figure out how to use the array. – Lawrence Cherone Feb 07 '13 at 07:08
  • Ok, this looks promising. How do I use just the host portion of the array to turn the 'host' part into a string and ignore the rest? Thanks – nooblag Feb 07 '13 at 07:52
  • A link to the manual is in the post, but I've added how you access those values. – Lawrence Cherone Feb 07 '13 at 08:00
  • This was a breeze to use. I appreciate that! Matches well in my preg_replace_callback where I am parsing image urls to display HTML image tags, and leaving the domain as text under the matched photo (image url). Thanks! – WiiLF Nov 25 '22 at 20:06
5

Here it is:

<?php

$sitelink="http://www.somewebsite.com/product/3749875/info/overview.html";

$domain_pieces = explode(".", parse_url($sitelink, PHP_URL_HOST));

$l = sizeof($domain_pieces);

$secondleveldomain = $domain_pieces[$l-2] . "." . $domain_pieces[$l-1];

echo $secondleveldomain;

Note that this is probably not the behavior you are looking for, because for hosts like

stackoverflow.co.uk

it will echo "co.uk".

See:

http://publicsuffix.org/learn/

http://www.dkim-reputation.org/regdom-libs/

http://www.dkim-reputation.org/regdom-lib-downloads/ <-- downloads here, php included
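
To illustrate the idea behind the Public Suffix List, here is a rough sketch under stated assumptions: registrable_domain() is a hypothetical helper, and the three-entry $suffixes array only stands in for the real list, which is far larger and also has wildcard and exception rules that this sketch ignores.

```php
<?php
// Hypothetical helper: registrable domain of a host, given a list of known suffixes.
// ASSUMPTION: $suffixes is a tiny stand-in for the real Public Suffix List.
function registrable_domain(string $host, array $suffixes): ?string
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $count = count($labels);
    // Try the longest candidate suffix first so "co.uk" wins over "uk".
    for ($i = 0; $i < $count; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array($candidate, $suffixes, true)) {
            // The whole host is itself a suffix: nothing registrable in front of it.
            if ($i === 0) {
                return null;
            }
            // Registrable domain = matched suffix plus the one label before it.
            return $labels[$i - 1] . '.' . $candidate;
        }
    }
    return null; // suffix not in the list
}

$suffixes = ['com', 'uk', 'co.uk']; // illustrative only
echo registrable_domain('www.somewebsite.com', $suffixes), "\n";
echo registrable_domain('www.stackoverflow.co.uk', $suffixes), "\n";
```

For real use, a maintained library that loads the full list is the safer choice.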

  • Hi thanks for the message. You're right, echoing .co.uk in cases like this wouldn't be helpful. It needs to be able to handle any domains (and strip subdomain) if possible.. Thanks anyway tho – nooblag Feb 07 '13 at 07:56
  • you can use this table as a reference https://wiki.mozilla.org/TLD_List to build what you need –  Feb 09 '13 at 06:04
4

Two complex URLs:

$url="https://www.example.co.uk/page/section/younameit";
or
$url="https://example.co.uk/page/section/younameit";

To get "www.example.co.uk":

$host=parse_url($url, PHP_URL_HOST);

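To get "example.co.uk" only, strip a leading "www." if it is present:

// The second argument of ltrim() is a character list, not a prefix, so use a regex anchored at the start instead
$domain = preg_replace('/^www\./', '', $host);

Whether or not your URL includes "www.", you get the same end result, i.e. "example.co.uk". Note that this removes only "www."; any other subdomain is left in place.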

Voilà!
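
A caveat worth checking before relying on string functions for the "www." prefix: ltrim() treats its second argument as a set of characters, and explode('www.', $host) has no element at index 1 when the host lacks "www.". A small sketch of the difference follows; strip_www() is a hypothetical helper name.

```php
<?php
// ltrim() strips any leading characters found in the mask "w" and ".".
echo ltrim('www.example.co.uk', 'www.'), "\n"; // example.co.uk
echo ltrim('web.example.co.uk', 'www.'), "\n"; // eb.example.co.uk (leading "w" lost)

// Hypothetical helper: remove one literal "www." prefix and nothing else.
function strip_www(string $host): string
{
    return strncasecmp($host, 'www.', 4) === 0 ? substr($host, 4) : $host;
}

echo strip_www('www.example.co.uk'), "\n";
echo strip_www('web.example.co.uk'), "\n";
```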

Jabari
user3251285
2

You need a package that uses the Public Suffix List. Only this way can you correctly extract domains with two- and three-level suffixes (co.uk, a.bg, b.bg, etc.) and multilevel subdomains. Regex, parse_url() or string functions alone cannot give a correct result in every case.

I recommend TLD Extract. Here is an example:

$extract = new LayerShifter\TLDExtract\Extract();

$result = $extract->parse('http://www.somewebsite.com/product/3749875/info/overview.html');
$result->getSubdomain(); // will return (string) 'www'
$result->getHostname(); // will return (string) 'somewebsite'
$result->getSuffix(); // will return (string) 'com'
$result->getRegistrableDomain(); // will return (string) 'somewebsite.com'
Oleksandr Fediashov
0

For a string that could be anything, here is another approach:

function extract_plain_domain($text) {

    $text=trim($text,"/");
    $text=strtolower($text);

    $parts=explode("/",$text);
    if (substr_count($parts[0],"http")) {
        $parts[0]="";
    }
    // each() was removed in PHP 8, so use foreach to find the first non-empty segment (the host)
    foreach ($parts as $val) {
        if (!empty($val)) { $text = $val; break; }
    }

    $parts=explode(".",$text);
    if (empty($parts[2])) {
        return $parts[0].".".$parts[1];
        } else {
        $num_parts=count($parts);
        return $parts[$num_parts-2].".".$parts[$num_parts-1];
        }

} // end function extract_plain_domain
Rafa
0

You can use the Utopia Domains library (https://github.com/utopia-php/domains). It returns the domain's TLD and public suffix based on Mozilla's Public Suffix List (https://publicsuffix.org), and can serve as an alternative to the TLDExtract package, which is now archived.

Use the parse_url() function to get the hostname from your URL, then use the Utopia Domains parser to get the correct suffix and join it with the domain name:

<?php

require_once './vendor/autoload.php';

use Utopia\Domains\Domain;

$url = 'http://demo.example.co.uk/site';

$domain = new Domain(parse_url($url, PHP_URL_HOST)); // demo.example.co.uk

var_dump($domain->get()); // demo.example.co.uk
var_dump($domain->getTLD()); // uk
var_dump($domain->getSuffix()); // co.uk
var_dump($domain->getName()); // example
var_dump($domain->getSub()); // demo
var_dump($domain->isKnown()); // true
var_dump($domain->isICANN()); // true
var_dump($domain->isPrivate()); // false
var_dump($domain->isTest()); // false

var_dump($domain->getName().'.'.$domain->getSuffix()); // example.co.uk
eldadfux