8

Say I have just-a.domain.com, just-a-domain.info and just.a-domain.net. How can I remove the extension (.com, .net, .info, ...)? I need the result in two variables: one with the domain name and another with the extension.

I tried str_replace, but it doesn't work. I guess this can only be done with a regex.

hakre
Uffo

5 Answers

12
  preg_match('/(.*?)((?:\.co)?\.[a-z]{2,4})$/i', $domain, $matches);

$matches[1] will contain the domain and $matches[2] the extension. Note that this pattern only recognises extensions of two to four letters, optionally preceded by ".co".

<?php

$domains = array("google.com", "google.in", "google.co.in", "google.info", "analytics.google.com");

foreach($domains as $domain){
  // The dot before the extension is escaped so that it only matches a literal "."
  preg_match('/(.*?)((?:\.co)?\.[a-z]{2,4})$/i', $domain, $matches);
  print_r($matches);
}
?>

This will produce the following output:

Array
(
    [0] => google.com
    [1] => google
    [2] => .com
)
Array
(
    [0] => google.in
    [1] => google
    [2] => .in
)
Array
(
    [0] => google.co.in
    [1] => google
    [2] => .co.in
)
Array
(
    [0] => google.info
    [1] => google
    [2] => .info
)
Array
(
    [0] => analytics.google.com
    [1] => analytics.google
    [2] => .com
)
Joyce Babu
10
$subject = 'just-a.domain.com';
$result = preg_split('/(?=\.[^.]+$)/', $subject);

This produces the following array:

$result[0] == 'just-a.domain';
$result[1] == '.com';
HamZa
splash
8

If you want to remove the part of the domain that is administered by domain name registrars, you will need a list of such suffixes, like the Public Suffix List.

Walking through this list and testing each suffix against the domain name is not very efficient, so use the list only to build an index like this:

$tlds = array(
    // ac : http://en.wikipedia.org/wiki/.ac
    'ac',
    'com.ac',
    'edu.ac',
    'gov.ac',
    'net.ac',
    'mil.ac',
    'org.ac',
    // ad : http://en.wikipedia.org/wiki/.ad
    'ad',
    'nom.ad',
    // …
);
$tldIndex = array_flip($tlds);

Searching for the best match would then go like this:

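$levels = explode('.', $domain);
$n = count($levels);
// Try ever longer suffixes; stop at the first one that is not in the index.
for ($length = 1; $length <= $n; ++$length) {
    $suffix = implode('.', array_slice($levels, -$length));
    if (!isset($tldIndex[$suffix])) {
        break;
    }
}
$length--; // number of labels in the longest matching suffix (0 if none matched)
// Use explicit offsets: array_slice($levels, -0) would return the whole array.
$suffix = implode('.', array_slice($levels, $n - $length));
$prefix = implode('.', array_slice($levels, 0, $n - $length));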
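To see the index lookup run end to end, here is a minimal sketch wrapped in a function. The name splitBySuffixIndex is hypothetical, and the index is a tiny hand-written sample standing in for a real index built from the Public Suffix List.

```php
<?php
// Hypothetical helper; not part of the original answer.
// Returns [prefix, suffix], where suffix is the longest run of trailing
// labels found in $tldIndex (an empty string if none match).
function splitBySuffixIndex(string $domain, array $tldIndex): array
{
    $levels = explode('.', $domain);
    $n = count($levels);
    $length = 0;
    // Extend the candidate suffix by one label while it is still in the index.
    while ($length < $n
        && isset($tldIndex[implode('.', array_slice($levels, $n - $length - 1))])) {
        $length++;
    }
    return [
        implode('.', array_slice($levels, 0, $n - $length)),
        implode('.', array_slice($levels, $n - $length)),
    ];
}

// Sample data only: a real index would be generated from the full suffix list.
$tldIndex = array_flip(['com', 'net', 'info', 'in', 'co.in']);

[$name, $suffix] = splitBySuffixIndex('google.co.in', $tldIndex);
echo $name, ' | ', $suffix, PHP_EOL; // google | co.in
```

Because the slice offsets are computed from the label count, a name with no known suffix (for example localhost) comes back with an empty suffix rather than being treated as a suffix itself.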

Or build a tree that represents the hierarchy of the domain name levels as follows:

$tldTree = array(
    // ac : http://en.wikipedia.org/wiki/.ac
    'ac' => array(
        'com' => true,
        'edu' => true,
        'gov' => true,
        'net' => true,
        'mil' => true,
        'org' => true,
    ),
    // ad : http://en.wikipedia.org/wiki/.ad
    'ad' => array(
        'nom' => true,
    ),
    // …
);

Then you can use the following to find the match:

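$levels = explode('.', $domain);
$n = count($levels);
$r = $tldTree; // current node; a plain copy suffices because the tree is only read
$length = 0;
// Walk the labels from right to left, descending one tree level per match.
foreach (array_reverse($levels) as $level) {
    if (is_array($r) && isset($r[$level])) {
        $r = $r[$level];
        $length++;
    } else {
        break;
    }
}
// Use explicit offsets: array_slice($levels, -0) would return the whole array.
$suffix = implode('.', array_slice($levels, $n - $length));
$prefix = implode('.', array_slice($levels, 0, $n - $length));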
Gumbo
2

Regex and parse_url() aren't the right solution here.

You need a package that uses the Public Suffix List; only that way can you correctly extract domains with second- and third-level TLDs (co.uk, a.bg, b.bg, etc.). I recommend using TLD Extract.

Here is an example:

$extract = new LayerShifter\TLDExtract\Extract();

$result = $extract->parse('just.a-domain.net');
$result->getSubdomain(); // will return (string) 'just'
$result->getHostname(); // will return (string) 'a-domain'
$result->getSuffix(); // will return (string) 'net'
$result->getRegistrableDomain(); // will return (string) 'a-domain.net'
Muhammad Hassaan
Oleksandr Fediashov
-1
strrpos($str, ".")

This gives you the index of the last period in the string. You can then pass that index to substr() to cut out the part you need.
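A minimal sketch of that idea, applied to the domains from the question. The function name splitAtLastDot is hypothetical, and it treats only the last label as the extension, so a name like google.co.in would be split into google.co and .in.

```php
<?php
// Hypothetical helper (not from the original answer): split a host name at its last dot.
function splitAtLastDot(string $host): array
{
    $pos = strrpos($host, '.');
    if ($pos === false) {
        // No dot at all: treat the whole string as the name, with no extension.
        return [$host, ''];
    }
    // Everything before the last dot, then the last dot and everything after it.
    return [substr($host, 0, $pos), substr($host, $pos)];
}

foreach (['just-a.domain.com', 'just-a-domain.info', 'just.a-domain.net'] as $host) {
    [$name, $extension] = splitAtLastDot($host);
    echo $name, ' | ', $extension, PHP_EOL;
}
// Prints:
// just-a.domain | .com
// just-a-domain | .info
// just.a-domain | .net
```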

Ólafur Waage