
I need a PHP function that produces a pure domain name from a URL. The function must remove the http://, www and / (slash) parts from the URL if they exist. Here are example inputs and outputs:

Input: http://www.google.com/ | Output: google.com
Input: http://google.com/ | Output: google.com
Input: www.google.com/ | Output: google.com
Input: google.com/ | Output: google.com
Input: google.com | Output: google.com

I checked the parse_url function, but it doesn't return what I need. Since I'm a beginner in PHP, it was difficult for me. If you have any ideas, please answer.
Thanks in advance.

hakre
JohnUS

9 Answers

$input = 'www.google.co.uk/';

// in case scheme relative URI is passed, e.g., //www.google.com/
$input = trim($input, '/');

// If scheme not included, prepend it
if (!preg_match('#^http(s)?://#', $input)) {
    $input = 'http://' . $input;
}

$urlParts = parse_url($input);

// remove www
$domain = preg_replace('/^www\./', '', $urlParts['host']);

echo $domain;

// output: google.co.uk

Works correctly with all your example inputs.
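Since the question asks for a function, the code above can be wrapped in one. This is a minimal sketch: the name pureDomain is hypothetical, and the scheme check is made case-insensitive here, which the original does not do.

```php
<?php
// Hypothetical wrapper around the answer's code; the name pureDomain() is illustrative.
function pureDomain(string $input): string
{
    // Strip leading/trailing slashes (covers "//www.google.com/" and "google.com/").
    $input = trim($input, '/');

    // parse_url() only finds the host when a scheme is present, so add one if missing.
    if (!preg_match('#^https?://#i', $input)) {
        $input = 'http://' . $input;
    }

    $host = parse_url($input, PHP_URL_HOST);

    // Drop a leading "www." only; other subdomains are kept.
    return preg_replace('/^www\./i', '', (string) $host);
}

$examples = [
    'http://www.google.com/',
    'http://google.com/',
    'www.google.com/',
    'google.com/',
    'google.com',
];

foreach ($examples as $url) {
    echo $url, ' -> ', pureDomain($url), PHP_EOL;
}
```

Each of the five example inputs from the question should print google.com.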

webbiedave
    @Gordon: Question stated `must be remove http://, www and /`. Arbitrary subdomains were not part of it and wouldn't work anyways if he needed it to. – webbiedave Feb 20 '12 at 16:24
$str = 'http://www.google.com/';
$str = preg_replace('#^https?://#', '', rtrim($str,'/'));
echo $str; // www.google.com
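As its output comment shows, this keeps the www. prefix, which the question asks to remove. One possible extension, sketched under the assumption that only a leading www. should go (the function name stripUrlDecorations is hypothetical), makes both the scheme and www. optional:

```php
<?php
// Hypothetical variant of the regex above: the scheme and a leading "www." are both
// optional, so inputs without "http://" are handled too.
function stripUrlDecorations(string $str): string
{
    return preg_replace('#^(https?://)?(www\.)?#i', '', rtrim($str, '/'));
}

echo stripUrlDecorations('http://www.google.com/'), PHP_EOL; // google.com
echo stripUrlDecorations('www.google.com/'), PHP_EOL;        // google.com
echo stripUrlDecorations('google.com'), PHP_EOL;             // google.com
```

Like the original, this only trims a trailing slash, so any path after the domain is kept.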
Mahdi

There are lots of ways to grab the domain out of a URL. I've posted four ways below, from shortest to longest.

#1

function urlToDomain($url) {
   return implode(array_slice(explode('/', preg_replace('/https?:\/\/(www\.)?/', '', $url)), 0, 1));
}
echo urlToDomain('http://www.example.com/directory/index.php?query=true');

#2

function urlToDomain($url) {
   $domain = explode('/', preg_replace('/https?:\/\/(www\.)?/', '', $url));
   return $domain['0'];
}
echo urlToDomain('http://www.example.com/directory/index.php?query=true');

#3

function urlToDomain($url) {
   $domain = preg_replace('/https?:\/\/(www\.)?/', '', $url);
   if ( strpos($domain, '/') !== false ) {
      $explode = explode('/', $domain);
      $domain  = $explode['0'];
   }
   return $domain;
}
echo urlToDomain('http://www.example.com/directory/index.php?query=true');

#4

function urlToDomain($url) {
   if ( substr($url, 0, 8) == 'https://' ) {
      $url = substr($url, 8);
   }
   if ( substr($url, 0, 7) == 'http://' ) {
      $url = substr($url, 7);
   }
   if ( substr($url, 0, 4) == 'www.' ) {
      $url = substr($url, 4);
   }
   if ( strpos($url, '/') !== false ) {
      $explode = explode('/', $url);
      $url     = $explode['0'];
   }
   return $url;
}
echo urlToDomain('http://www.example.com/directory/index.php?query=true');

All of the functions above return the same response: example.com
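Note that the regex in #1 to #3 is not anchored and only strips www. when a scheme is present, so an input like www.example.com/directory/ keeps its www. A sketch of an anchored variant, assuming the domain ends at the first /, ? or # (the name urlToDomainAnchored is hypothetical):

```php
<?php
// Hypothetical variant: anchor the prefix match at the start of the string and make the
// scheme optional, then cut the remainder at the first "/", "?" or "#".
function urlToDomainAnchored(string $url): string
{
    $rest = preg_replace('#^(https?://)?(www\.)?#i', '', trim($url));
    return substr($rest, 0, strcspn($rest, '/?#'));
}

echo urlToDomainAnchored('http://www.example.com/directory/index.php?query=true'), PHP_EOL; // example.com
echo urlToDomainAnchored('www.example.com/directory/'), PHP_EOL;                          // example.com
echo urlToDomainAnchored('example.com?query=true'), PHP_EOL;                              // example.com
```

A port such as :8080 would stay attached to the result.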

TURTLE

Try this. It removes what you wanted (http://, www and the trailing slash) but retains other subdomains such as example.google.com.

$host = parse_url('http://www.google.com', PHP_URL_HOST);
$host = preg_replace('/^(www\.)/i', '', $host);

Or as a one-liner:

$host = preg_replace('/^(www\.)/i', '', parse_url('http://www.google.com', PHP_URL_HOST));
Pikamander2
h00ligan
The OP specifically asked to have http://, www and trailing slashes removed, so my solution only removes these. Other solutions could be a lot trickier and would probably need a database of exceptions; .uk and .tw domains, for example, would cause problems. – h00ligan Feb 20 '12 at 16:22
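if (!preg_match('/^http(s)?:\/\//', $url))
    $url = 'http://' . $url;

$host = parse_url($url, PHP_URL_HOST);
// reverse the host so the TLD comes first, then split on dots
$host = explode('.', strrev($host));
// reverse the first two pieces back and join them, e.g. "google" . "." . "com"
$host = strrev($host[1]) . '.' . strrev($host[0]);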

This returns the second-level domain, though it would be wrong for, say, .co.uk domains, so you might want to do some more checking and include additional parts if strrev($host[0]) is uk, au, etc.
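One way to do that extra checking is a list of two-part suffixes. The sketch below uses a tiny hard-coded list that is purely illustrative (as is the name registrableDomain); a real implementation would need something like the full Public Suffix List.

```php
<?php
// Sketch only: the default suffix list is a small, hypothetical sample, not a complete
// set. A real solution would need the full Public Suffix List.
function registrableDomain(string $host, array $twoPartSuffixes = ['co.uk', 'org.uk', 'com.au']): string
{
    $labels = explode('.', strtolower($host));
    $count  = count($labels);

    if ($count <= 2) {
        return implode('.', $labels);
    }

    // If the last two labels form a known two-part suffix, keep three labels instead of two.
    $lastTwo = $labels[$count - 2] . '.' . $labels[$count - 1];
    $keep    = in_array($lastTwo, $twoPartSuffixes, true) ? 3 : 2;

    return implode('.', array_slice($labels, -$keep));
}

echo registrableDomain('www.google.com'), PHP_EOL;   // google.com
echo registrableDomain('www.google.co.uk'), PHP_EOL; // google.co.uk
```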

gintas
$value = 'https://google.ca';
$result = str_ireplace('www.', '', parse_url($value, PHP_URL_HOST));
// google.ca
stardust4891

This accounts for "http"/"https", "www" and the trailing slash:

$str = 'https://www.google.com/';
$str = preg_replace('#(^https?:\/\/(w{3}\.)?)|(\/$)#', '', $str);
echo $str; // google.com

Just ask if you need help understanding the regex.
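For readers who want the regex explained, here is the same pattern rewritten with the x (extended) modifier so each part can carry a comment. The ~ delimiter is used because # starts a comment in extended mode; behavior should otherwise match the one-line version.

```php
<?php
// Same pattern as above in extended (x) mode: whitespace is ignored and "#" starts
// a comment, which is why "~" is used as the delimiter here.
$pattern = '~
    (^https?://(w{3}\.)?)   # "http://" or "https://" at the start, optionally followed by "www."
    |                       # or
    (/$)                    # a single slash at the very end
~x';

$str = 'https://www.google.com/';
echo preg_replace($pattern, '', $str), PHP_EOL; // google.com
```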

Valeri
A. Dady

The first way is to use a single regular expression to trim the unnecessary parts of the URL, such as the protocol, www and the trailing slash:

function trimUrlProtocol($url) {
    // "^" anchors both optional prefixes, so "www." is only removed at the very start
    return preg_replace('/^(https?:\/\/)?(www\.)?|\/$/', '', trim($url));
}

echo trimUrlProtocol('http://sandbox.onlinephpfunctions.com/') . PHP_EOL;
echo trimUrlProtocol('https://sandbox.onlinephpfunctions.com/') . PHP_EOL;
echo trimUrlProtocol('http://www.sandbox.onlinephpfunctions.com/') . PHP_EOL;
echo trimUrlProtocol('https://www.sandbox.onlinephpfunctions.com/') . PHP_EOL;
echo trimUrlProtocol('http://sandbox.onlinephpfunctions.com') . PHP_EOL;
echo trimUrlProtocol('https://sandbox.onlinephpfunctions.com') . PHP_EOL;
echo trimUrlProtocol('http://www.sandbox.onlinephpfunctions.com') . PHP_EOL;
echo trimUrlProtocol('https://www.sandbox.onlinephpfunctions.com') . PHP_EOL;
echo trimUrlProtocol('sandbox.onlinephpfunctions.com') . PHP_EOL;

Alternatively, you can use parse_url, but then you have to add checks that the host part exists and still use a regular expression to trim www. Just use the first way; it is simple and lazy.
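A sketch of that parse_url alternative, with the host check it mentions. Assumptions: the function name domainFromUrl is hypothetical, and inputs without a scheme get a // prefix, which parse_url treats as a scheme-relative URL so the first segment becomes the host.

```php
<?php
// Sketch of the parse_url() alternative; the function name is hypothetical.
function domainFromUrl(string $url): ?string
{
    $url = trim($url);

    // No scheme (e.g. "www.google.com/"): prefix "//" so parse_url() sees a host.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = '//' . ltrim($url, '/');
    }

    $host = parse_url($url, PHP_URL_HOST);

    // parse_url() returns null when there is no host and false for badly malformed URLs.
    if (!is_string($host) || $host === '') {
        return null;
    }

    return preg_replace('/^www\./i', '', $host);
}

echo domainFromUrl('https://www.google.com/') ?? 'no host', PHP_EOL; // google.com
echo domainFromUrl('google.com') ?? 'no host', PHP_EOL;              // google.com
```

Returning null instead of an empty string lets the caller tell "no host found" apart from a real result.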

Profesor08

Use parse_url

http://www.php.net/manual/en/function.parse-url.php

matzino