0

I've already seen a bunch of questions on this exact subject, but none seem to solve my problem. I want to create a function that will remove everything from a website address, except for the domain name. For example if the user inputs: http://www.stackoverflow.com/blahblahblah I want to get stackoverflow, and the same way if the user inputs facebook.com/user/bacon I want to get facebook.

Does anyone know of a function or a way to remove certain parts of a string? Maybe it could search for http and, if found, remove everything up to and including the //. Then it could search for www and, if found, remove everything up to and including the following dot. Then it keeps everything until the next dot and removes everything after it. Looking at it now, this might cause problems with sites such as http://www.en.wikipedia.org, because I'll be left with only en.

Any ideas (preferably in PHP, but JavaScript is also welcome)?

EDIT 1: Thanks to great feedback I think I've been able to work out a function that does what I want:

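 function getdomain($url) {
    // Prepend a scheme only when none is present, because parse_url()
    // reports a host only for URLs that have one.
    $parts = parse_url($url);
    if (empty($parts['scheme'])) {
       $url = 'http://'.$url;
    }
    $parts2 = parse_url($url);

    $host = $parts2['host'];
    $remove = explode('.', $host);

    // Use the first label, skipping a leading "www".
    $result = $remove[0];
    if($result == 'www') {
       $result = $remove[1];
    }

    return $result;
 }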

It's not perfect, at least where subdomains are concerned, but I think it's possible to do something about that. Maybe I could add a second if statement at the end to check the length of the array: if it's longer than two, choose item 1 instead of item 0. This obviously gives me trouble with any domain using .co.uk (because that'll be three items long, but I don't want to return co). I'll work on it a little more and see what I come up with. I'd be glad if some of you PHP gurus out there could take a look as well. I'm not as skilled or as experienced as any of you... :P
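One way to handle both subdomains and multi-part endings such as .co.uk is to strip a known suffix first and then take the last remaining label. The following is only a minimal sketch under stated assumptions: the function name `getDomainLabel` and the `SUFFIXES` constant are hypothetical, and the hard-coded list is far too small for real use (a complete list would come from something like the Public Suffix List). It assumes PHP 7.4 or later.

```php
<?php
// Hypothetical, deliberately tiny suffix list for illustration only.
const SUFFIXES = ['co.uk', 'com', 'net', 'org', 'uk'];

function getDomainLabel(string $url): ?string
{
    // parse_url() only reports a host when a scheme is present,
    // so prepend one if the input has none.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    $host = strtolower($host);

    // Try longer suffixes first so that "co.uk" wins over "uk".
    $suffixes = SUFFIXES;
    usort($suffixes, fn($a, $b) => strlen($b) <=> strlen($a));

    foreach ($suffixes as $suffix) {
        $tail = '.' . $suffix;
        if (strlen($host) > strlen($tail) && substr($host, -strlen($tail)) === $tail) {
            // Whatever precedes the suffix may still contain subdomains;
            // the label directly before the suffix is the one we want.
            $labels = explode('.', substr($host, 0, -strlen($tail)));
            return end($labels);
        }
    }

    // Host does not end in any suffix from the list.
    return null;
}

echo getDomainLabel('http://www.stackoverflow.com/blahblahblah'), "\n";
echo getDomainLabel('www.mysubdomain.mydomain.co.uk'), "\n";
```

With this list, the two calls print stackoverflow and mydomain. Hosts whose ending is not in the list, such as localhost, yield null, which is the main limitation of a hand-written list.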

Aleksander
  • Without answering your question directly (because it looks like an [X-Y Problem](http://www.perlmonks.org/index.pl?node_id=542341)), why don't you use the [parse_url function](http://php.net/manual/en/function.parse-url.php)? – kojiro Feb 13 '13 at 18:17
  • Okay thanks. I tried looking for answers, but only found people wanting to split let's say a string that contains something preset. I didn't know there was a function in php that did what I wanted. Thanks very much! :) – Aleksander Feb 13 '13 at 18:19
  • It'll have to be a routine that has access to a list of current [tld](https://en.wikipedia.org/wiki/Top-level_domain)'s, or [public suffix list](https://en.wikipedia.org/wiki/Public_Suffix_List), to properly analyse where the actual domain name part, you are interested in, begins. – Decent Dabbler Feb 13 '13 at 18:21

6 Answers

1

Use parse_url to split the URL into the different parts. What you need is the hostname. Then you will want to split it by the dot and get the first part:

$url    = 'http://facebook.com/blahblah';
$parts  = parse_url($url);
$host   = $parts['host']; // facebook.com
$foo    = explode('.', $host);
$result = $foo[0]; // facebook
Gargron
  • What about `mydomain.co.uk`? – Decent Dabbler Feb 13 '13 at 18:22
  • @fireeyedboy It will return "mydomain" correctly because we get the first item of the resulting (in that case 3-elements) array – Gargron Feb 13 '13 at 18:23
  • Eh, yeah, you are right, but what about `www.mydomain.co.uk`, or `mysubdomain.mydomain.co.uk` or `www.mysubdomain.mydomain.co.uk`? – Decent Dabbler Feb 13 '13 at 18:25
  • 1
    and what about subdomains? – Philipp Feb 13 '13 at 18:25
  • @fireeyedboy You are correct. You could specifically filter out "www." from the hostname, but that wouldn't work for non-standard subdomains. The only solution I can think of in that case is having a full list of possible TLDs and that way filtering out the ending of the hostname. – Gargron Feb 13 '13 at 18:28
  • Hey, I just noticed something. It works like a charm if I have http://facebook.com, but it returns www if I have **http://www.facebook.com** (www after http://) and nothing if I have www.facebook.com. Is it the parse_url function that's the problem? – Aleksander Feb 13 '13 at 18:40
  • @Alekplay Yes, parse_url() expects a valid URL. This is not an ideal solution but the most flexible one among the answers. – Gargron Feb 13 '13 at 18:42
  • And in the case of my `mydomain.co.uk` as fireeyedboy pointed out, it doesn't return anything – Aleksander Feb 13 '13 at 18:47
  • I think I have an answer to my problem and it's based on your response :) – Aleksander Feb 13 '13 at 19:10
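The behaviour discussed in the comments above comes from how parse_url treats input without a scheme: the whole string ends up in the path, and no host is reported. A small sketch to illustrate (the variable names are arbitrary):

```php
<?php
// With a scheme, parse_url() reports the host separately.
$withScheme = parse_url('http://www.facebook.com/user/bacon');
// scheme => http, host => www.facebook.com, path => /user/bacon

// Without a scheme, the whole string is treated as a path; no host key is set.
$withoutScheme = parse_url('www.facebook.com/user/bacon');
// path => www.facebook.com/user/bacon

var_dump(isset($withScheme['host']));    // bool(true)
var_dump(isset($withoutScheme['host'])); // bool(false)
```

This is why reading `$parts['host']` yields nothing for www.facebook.com, and why prepending a scheme before parsing makes the host available.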
0

Javascript:

document.domain.replace(".com","")

PHP:

$url = 'http://google.com/something/something';
$parse = parse_url($url);
echo str_replace(".com","", $parse['host']); //returns google
Zero Piraeus
jacob
0

You can use the parse_url function from PHP which returns exactly what you want - see

Philipp
0

Use the parse_url function in PHP to get the host name (for example domain.com), then replace the ending such as .com with an empty string. I am a little rusty on my regular expressions, but this should work.

$url='http://www.en.wikipedia.org';
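$domain = parse_url($url, PHP_URL_HOST); // Returns www.en.wikipedia.org
// The pattern needs delimiters; anchor it so only a trailing .com or .org is removed.
$domain = preg_replace('/\.(com|org)$/', '', $domain); // www.en.wikipedia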

http://php.net/manual/en/function.parse-url.php

PHP REGEX: Get domain from URL

http://rubular.com/r/MvyPO9ijnQ //Check regular expressions

Community
0

You're looking for info on regular expressions. The topic is a bit complicated, so be prepared to read up. In your case, you'll best utilize preg_match and preg_replace: preg_match searches for a match based on your pattern, and preg_replace replaces the matches with your replacement.

preg_match preg_replace

I'd start with a pattern like this: find .com, .net or .org and delete it and everything after it. Then find the last . and delete it and everything in front of it. Finally, if // exists, delete it and everything in front of it.

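// preg_replace returns the new string, so assign the result back to $url.
if (preg_match("/^http:\/\//i",$url))
    $url = preg_replace("/^http:\/\//i","",$url);

// Escape the dot so it matches a literal "." rather than any character.
if (preg_match("/www\./i",$url))
    $url = preg_replace("/www\./i","",$url);

if (preg_match("/\.com/i",$url))
    $url = preg_replace("/\.com/i","",$url);

if (preg_match("/\/*$/",$url))
    $url = preg_replace("/\/*$/","",$url);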

^ = the start of the string; i = case insensitive; \ = escape character; $ = the end of the string

This will have to be played around with and tweaked, but it should get you pointed in the right direction.
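The three steps described in this answer (drop .com, .net or .org and everything after it; drop the last dot and everything in front of it; drop // and everything in front of it) can be sketched as one function. This is only an illustration under stated assumptions: the name `domainBySteps` is hypothetical, and step 1 uses a regular expression while steps 2 and 3 use plain string functions instead.

```php
<?php
// Hypothetical helper name; follows the three steps described in the answer.
function domainBySteps(string $url): string
{
    // Step 1: drop .com, .net or .org and everything after it.
    $url = preg_replace('/\.(com|net|org)\b.*$/i', '', $url);

    // Step 2: drop the last remaining dot and everything in front of it.
    $dot = strrpos($url, '.');
    if ($dot !== false) {
        $url = substr($url, $dot + 1);
    }

    // Step 3: if "//" is still present, drop it and everything in front of it.
    $slashes = strpos($url, '//');
    if ($slashes !== false) {
        $url = substr($url, $slashes + 2);
    }

    return $url;
}

echo domainBySteps('http://www.stackoverflow.com/blahblahblah'), "\n";
echo domainBySteps('facebook.com/user/bacon'), "\n";
```

The two calls print stackoverflow and facebook. Because only three endings are recognised, an input such as http://mydomain.co.uk is reduced to uk, so the approach needs a fuller list of endings to be generally useful.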

Solid I
-2

This is quite a quick method but should do what you want in PHP:

function getDomain( $URL ) {
    return explode('.',$URL)[1];
}

I will update it when I get a chance, but basically it splits the URL into pieces at each full stop and returns the second item, which should be the domain when the URL starts with www. A bit more logic would be required for longer domains such as www.abc.xyz.com, but for normal URLs it would suffice.

gimg1