
I know that there is parse_url and that you can then read the ['host'] element, but that returns the full www.example.com. What I want is the following:

https://stackoverflow.com/questions/ask turns to stackoverflow

https://console.aws.amazon.com/s3/home?region=us-west-2 turns to amazon

https://www.google.com/ turns to google

Any suggestions on how to do that?

David Mckee
  • I can't think of anything that would *reliably* do that. The examples you mentioned can be retrieved if you get the host name, explode it by `.`, and get the second last element. – Jorg Mar 12 '14 at 05:43
  • @Jorg, that won't work with country domains: `www.google.co.uk`. Second to last would be `.co` – Andy Mar 12 '14 at 05:49
  • @Andy That's what Jorg was saying, it doesn't work reliably. – Joachim Isaksson Mar 12 '14 at 05:50
  • Yeah, it did say it was only for those examples and it was unreliable. That's why it wasn't an answer but a comment :) – Jorg Mar 12 '14 at 05:50
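
The idea from the comments above (split the host on `.` and step back past the suffix) can be sketched roughly as follows. This is only a minimal illustration, not a reliable solution: the function name `registrable_label` and the tiny `$twoLabelSuffixes` list are assumptions made up for this example, and a real suffix list is far longer and changes over time.

```php
<?php
// Hypothetical helper (not from the thread): returns the label that sits
// just before the domain suffix, or null if the URL has no usable host.
function registrable_label(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }

    $labels = explode('.', strtolower($host));

    // Assumption: a handful of two-label suffixes, purely for illustration.
    $twoLabelSuffixes = ['co.uk', 'com.au', 'co.jp'];

    // By default the suffix is the last label; widen it to two labels
    // when the last two labels form one of the known suffixes.
    $suffixLength = 1;
    if (count($labels) >= 3
        && in_array(implode('.', array_slice($labels, -2)), $twoLabelSuffixes, true)) {
        $suffixLength = 2;
    }

    // The wanted label is the one immediately before the suffix.
    $index = count($labels) - $suffixLength - 1;
    return $index >= 0 ? $labels[$index] : null;
}

echo registrable_label('https://stackoverflow.com/questions/ask'), "\n";
echo registrable_label('https://console.aws.amazon.com/s3/home?region=us-west-2'), "\n";
echo registrable_label('https://www.google.co.uk/'), "\n";
```

Any suffix missing from the hard-coded list still falls back to the plain second-to-last label, so the unreliability raised in the comments remains for those hosts.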

2 Answers


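Still use parse_url(), but after that, use explode() and take the second-to-last element.

// Get the host from the URL ($url is assumed to hold the input)
$host = parse_url($url, PHP_URL_HOST);
// Split the host on "."
$arr_host = explode('.', $host);
// Count the labels
$count = count($arr_host);
// Take the second-to-last label (or the only label if there is just one)
$domain = $count >= 2 ? $arr_host[$count - 2] : $arr_host[0];

echo $domain;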
Ronald Borla

Try this:

$pieces = parse_url($url);
$domain = isset($pieces['host']) ? $pieces['host'] : '';
if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
   echo strstr($regs['domain'], '.', true);
}
Javad
  • With top-level domains available for purchase, this is going to fail on a whole host of them: https://gtldresult.icann.org/application-result/applicationstatus/viewstatus – Jorg Mar 12 '14 at 05:57
  • @Jorg In this URL example which you provided, isn't *gtldresult* a subdomain? – Javad Mar 12 '14 at 06:00
  • yeah, but the link I posted was actually for you to follow :) It has all the current in-progress requests for top level domains at ICANN, most of which are Chinese, or have accented characters in them that are not captured in `[a-z0-9]` – Jorg Mar 12 '14 at 06:01
  • Yup, you're right. In that case I think the regExp should match UTF-8 characters, and \p or \X can be used: [regexpUnicode](http://www.php.net/manual/en/regexp.reference.unicode.php) – Javad Mar 12 '14 at 06:15