3

I know there is a LOT of info on the web about this subject, but I can't seem to get it working the way I want.

I'm trying to build a function that extracts the bare domain name from a URL:

http://blabla.com     ->  blabla
www.blabla.net        ->  blabla
http://www.blabla.eu  ->  blabla

Only the plain name of the domain is needed.

With parse_url I can isolate the host, but that is not enough. I have three functions that strip the domain, but I still get some wrong output:

function prepare_array($domains)
{
    $prep_domains = explode("\n", str_replace("\r", "", $domains)); 
    $domain_array = array_map('trim', $prep_domains); 

    return $domain_array;
}

function test($domain) 
{
    $domain = explode(".", $domain);
    return $domain[1];
}

function strip($url) 
{ 
   $url = trim($url);
   $url = preg_replace("/^(http:\/\/)*(www.)*/is", "", $url); 
   $url = preg_replace("/\/.*$/is" , "" ,$url); 
   return $url; 
}

Every possible domain, URL and extension is allowed. Once the function has finished, it must return an array containing only the domain names themselves.

UPDATE: Thanks for all the suggestions!

I figured it out with the help from you all.

function test($url) 
{   
    // Strip a leading http:// or https:// and/or www., if present.
    // The dot in "www." is escaped so that it only matches a literal dot.
    // preg_replace returns the input unchanged when nothing matches.
    $domain = preg_replace("/^(https?:\/\/)?(www\.)?/i", "", $url);

    // Now all that's left is the domain and the extension
    // Only return the needed first part without the extension    
    $domain = explode(".", $domain);

    return $domain[0];
}
Rob

5 Answers

3

How about

$wsArray = explode(".",$domain); //Break it up into an array. 
$extension = array_pop($wsArray); //Get the Extension (last entry)
$domain = array_pop($wsArray); // Get the domain

http://php.net/manual/en/function.array-pop.php
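
The snippet above assumes that $domain is already a bare host name. A minimal sketch of one way to get there first, assuming a hypothetical helper name `second_level_label` and assuming that inputs without a scheme should be treated as `http://`. It still returns the wrong label for multi-part extensions such as `.co.uk`:

```php
<?php
// Hypothetical helper (not from the original answer): reduce a URL to its
// host, then take the label just before the last dot.
function second_level_label(string $url): ?string
{
    $url = trim($url);

    // parse_url() only reports a host when a scheme is present, so add one
    // for inputs such as "www.blabla.net" (an assumption about the input).
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host could be found
    }

    $labels = explode('.', $host);
    array_pop($labels);        // drop the extension (last entry)
    return array_pop($labels); // the label before it, or null if there is none
}

echo second_level_label('http://www.blabla.eu'), PHP_EOL; // prints "blabla"
```

Because it only looks at the last two labels, `http://www.bbc.co.uk` yields `co` rather than `bbc`.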

Jim_M
  • Actually ChoiZ comment above is probably the better answer. – Jim_M Sep 17 '15 at 19:22
  • [This answer does not work on `.co.uk` and similar domains.](https://ideone.com/uyrYdv) – Zsw Sep 17 '15 at 19:35
  • Unfortunately this doesn't do the job. When I input http://google.com, it returns http://google – Rob Sep 18 '15 at 17:49
  • I gave a new solution last night. I just want to be sure I understand your comment, though. It originally looked like you were trying to extract just the name "blablabla" from any TLD. But now you're saying that if you input "google.com" it just gives you google. Is that not the result you're looking for? – Jim_M Sep 18 '15 at 18:20
  • Stackoverflow changed the original content which was: "http://google.com" "google.com", still not the output I typed – Rob Sep 18 '15 at 19:14
2

Ah, your problem lies in the fact that a TLD can have one part or two, e.g. .com vs .co.uk.

What I would do is maintain a list of TLDs. Take the host returned by parse_url, go over the list and look for a match. Strip out the TLD, explode on '.', and the last element will be the name you want.

This is not as efficient as it could be but, with TLDs being added all the time, I cannot see any other deterministic way.
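
The lookup described above can be sketched as follows. This is only an illustration: `domain_label` is a hypothetical name, and the `$suffixes` array is a tiny hand-picked sample, not a real TLD list (a maintained source such as the Public Suffix List would be needed in practice). It expects a bare host, such as the one parse_url returns.

```php
<?php
// Hypothetical helper: find the label that sits directly before a known
// suffix. The suffix list passed in is an assumption supplied by the caller.
function domain_label(string $host, array $suffixes): ?string
{
    $host = strtolower(trim($host));

    // Try longer suffixes first so that "co.uk" wins over "uk".
    usort($suffixes, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    foreach ($suffixes as $suffix) {
        $tail = '.' . $suffix;
        // Does the host end with ".<suffix>" and have something before it?
        if (strlen($host) > strlen($tail) && substr($host, -strlen($tail)) === $tail) {
            $rest   = substr($host, 0, -strlen($tail)); // e.g. "www.blabla"
            $labels = explode('.', $rest);
            return end($labels); // last label before the suffix
        }
    }

    return null; // no known suffix matched
}

// Illustrative sample only.
$suffixes = ['com', 'net', 'eu', 'uk', 'co.uk'];
echo domain_label('www.blabla.co.uk', $suffixes), PHP_EOL; // prints "blabla"
```

Sorting by length on every call keeps the sketch short; a real implementation would sort or index the list once.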

James Dunne
2

OK... this is messy, and you should spend some time optimizing it and caching previously derived domains. You also need a friendly name server, and the last catch is that the domain must have an "A" record in its DNS.

This attempts to assemble the domain name in reverse order until it resolves to a DNS "A" record.

At any rate, this was bugging me, so I hope this answer helps:

<?php
$wsHostNames = array(
    "test.com",
    "http://www.bbc.com/news/uk-34276525",
    "google.uk.co"
);
foreach ($wsHostNames as $hostName) {
    echo "checking $hostName" . PHP_EOL;
    $wsWork = $hostName;
    //attempt to strip out full paths to just host
    $wsWork = parse_url($hostName, PHP_URL_HOST);
    if ($wsWork != "") {
        echo "Was able to cleanup $wsWork" . PHP_EOL;
        $hostName = $wsWork;
    } else {
        //Probably had no path info or malformed URL
        //Try to check it anyway
        echo "No path to strip from $hostName" . PHP_EOL;
    }

    $wsArray = explode(".", $hostName); //Break it up into an array.

    $wsHostName = "";
    //Build the domain one segment at a time
    //Code should be modified not to check for the first segment (.com)
    while (!empty($wsArray)) {
        $newSegment = array_pop($wsArray);
        $wsHostName = $newSegment . $wsHostName;
        echo "Checking $wsHostName" . PHP_EOL;
        if (checkdnsrr($wsHostName, "A")) {
            echo "host found $wsHostName" . PHP_EOL;
            echo "Domain is $newSegment" . PHP_EOL;
            continue 2; //move on to the next host name
        } else {
            //This segment didn't resolve - keep building
            echo "No Valid A Record for $wsHostName" . PHP_EOL;
            $wsHostName = "." . $wsHostName;
        }
    }
    //if you get to here in the loop it could not resolve the host name

}
?>
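
The caching recommended above could look roughly like this. It is a sketch only: `registered_label` and its `$hasARecord` parameter are hypothetical names, the cache is a static array that lasts for one script run, and, unlike the code above, the bare last segment is never looked up. The check is injectable so the logic can be tried without network access; by default it calls checkdnsrr.

```php
<?php
// Hypothetical helper: build the name from right to left and stop at the
// first candidate that has an "A" record, remembering the result per host.
function registered_label(string $host, ?callable $hasARecord = null): ?string
{
    static $cache = []; // host => label (or null), kept for this script run

    if (array_key_exists($host, $cache)) {
        return $cache[$host];
    }

    if ($hasARecord === null) {
        $hasARecord = function (string $name): bool {
            return checkdnsrr($name, 'A');
        };
    }

    $labels    = explode('.', $host);
    $candidate = array_pop($labels); // start with the last segment, e.g. "com"
    $found     = null;

    // Prepend one segment at a time and test the growing name.
    while (!empty($labels)) {
        $segment   = array_pop($labels);
        $candidate = $segment . '.' . $candidate;
        if ($hasARecord($candidate)) {
            $found = $segment;
            break;
        }
    }

    return $cache[$host] = $found;
}
```

Calling it twice with the same host performs the DNS lookups only once; the second call is answered from the static array.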
Jim_M
1

Try preg_replace.

Something like $domain = preg_replace($regex, '$1', $url);

regex
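
For '$1' to pick out the name, the pattern needs a capturing group. A minimal sketch, assuming an illustrative pattern of my own (not the regex this answer linked to): an optional scheme, an optional "www.", then a capture of the first label. Like any first-label approach, it returns the subdomain for hosts such as mail.example.com.

```php
<?php
// Illustrative pattern (an assumption): optional scheme, optional "www.",
// then capture everything up to the next dot, slash, colon, "?" or "#".
$regex = '~^(?:[a-z][a-z0-9+.-]*://)?(?:www\.)?([^./:?#]+).*$~i';

$url    = 'http://www.blabla.eu/some/path';
$domain = preg_replace($regex, '$1', $url);

echo $domain, PHP_EOL; // prints "blabla"
```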

luis martinez
  • This doesn't answer the question because the regex provided in the link does not have any capturing groups. – Zsw Sep 17 '15 at 19:55
1
function test($url) 
{   
    // Strip a leading http:// or https:// and/or www., if present.
    // The dot in "www." is escaped so that it only matches a literal dot.
    // preg_replace returns the input unchanged when nothing matches.
    $domain = preg_replace("/^(https?:\/\/)?(www\.)?/i", "", $url);

    // Now all that's left is the domain and the extension
    // Only return the needed first part without the extension    
    $domain = explode(".", $domain);

    return $domain[0];
}
Rob