42

How to get the domain name without subdomains?

For example, if the URL is "http://one.two.roothost.co.uk/page.html", how do I get "roothost.co.uk"?

Maximillian Laumeister
pmcilreavy
  • That’s not what you would call the [canonical name](http://en.wikipedia.org/wiki/CNAME_record). – Gumbo Mar 17 '12 at 19:26
  • @Gumbo http://wiki.apache.org/httpd/CanonicalHostNames – Dagg Nabbit Mar 17 '12 at 19:28
  • Just to clarify, by "canonical hostname" you mean that "one.two.roothost.co.uk" redirects to "roothost.co.uk," as described in the link above, correct? – Dagg Nabbit Mar 17 '12 at 19:36
  • now you're going to have to define "root host name" for us – dldnh Mar 17 '12 at 19:41
  • 1
    by root host name i mean the registered domain name. the root host with all the sub domains removed as i described in the question. i.e. one.two.roothost.co.uk => roothost.co.uk – pmcilreavy Mar 17 '12 at 19:44
  • So if I host my site at bobsfreehost.com, and Bob gives me the subdomain "mystuff," do you want mystuff.bobsfreehost.com, or bobsfreehost.com? See where I'm going with this? – Dagg Nabbit Mar 17 '12 at 19:54
  • I think I've now made the question as clear as i can make it. Don't concern yourself with where Bob is hosting his website. No matter where it's hosted. I need a sure way of finding the A Record host name of a given url. – pmcilreavy Aug 21 '15 at 02:15
  • 1
    Have a look at this question and answer. http://stackoverflow.com/questions/736513/how-do-i-parse-a-url-into-hostname-and-path-in-javascript – SergeyAn Aug 28 '15 at 03:42
  • I added the bounty as I'd like to know the host without the subdomains, the question you linked to doesn't help with that. – Alexis Tyler Aug 28 '15 at 05:16
  • @XO - I posted an answer below with working JSFiddle link. Can you please confirm if it helps you ? – Dinesh Chitlangia Aug 28 '15 at 05:30
  • 1
    So far the only answer I can see that's "correct" would be the one @MaximillianLaumeister posted but I can't accept the bounty for another 21 hours so I'll wait till then and see if anyone else comes up with anything. Personally I'm looking for a way to get ANY domain without the subdomain, your answer looks like it only works for some types of domains and doesn't use a proven library or method to get the result unlike his. – Alexis Tyler Aug 28 '15 at 05:33

11 Answers

40

Following is a solution to extract a domain name without any subdomains. This solution doesn't make any assumptions about the URL format, so it should work for any URL. Since some domain names have one suffix (.com), and some have two or more (.co.uk), to get an accurate result in all cases, we need to parse the hostname using the Public Suffix List, which contains a list of all public domain name suffixes.
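To make the idea concrete, here is a minimal sketch of the lookup a suffix-list library performs: find the longest known suffix, then keep one more label to its left. The `SAMPLE_SUFFIXES` set and the `registrableDomain` name are made up for illustration. The real list has thousands of entries plus wildcard and exception rules, which this sketch ignores, so treat it as an explanation rather than a replacement for a library.

```javascript
// Illustrative sample only: the real Public Suffix List has thousands of
// entries plus wildcard and exception rules, which this sketch ignores.
const SAMPLE_SUFFIXES = new Set(['com', 'org', 'uk', 'co.uk']);

// Hypothetical helper: returns the public suffix plus one label, or null.
function registrableDomain(hostname) {
  const labels = hostname.toLowerCase().split('.');
  // Candidates run from longest to shortest, so the first hit is the
  // longest matching suffix.
  for (let i = 0; i < labels.length; i++) {
    if (SAMPLE_SUFFIXES.has(labels.slice(i).join('.'))) {
      // A bare suffix such as "co.uk" has no registrable domain.
      return i === 0 ? null : labels.slice(i - 1).join('.');
    }
  }
  return null; // no known suffix
}

console.log(registrableDomain('one.two.roothost.co.uk')); // "roothost.co.uk"
console.log(registrableDomain('mail.google.com'));        // "google.com"
```

Because "co.uk" is tried before "uk", the two-part suffix wins, which is exactly what a fixed "last two labels" rule cannot do.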


Solution

First, include the psl (Public Suffix List) JavaScript library in a script tag in your HTML. Then, to get the domain, you can call:

var parsed = psl.parse('one.two.roothost.co.uk');
console.log(parsed.domain);

...which logs "roothost.co.uk". To get the name from the current page, you can use location.hostname instead of a static string:

var parsed = psl.parse(location.hostname);
console.log(parsed.domain);

Finally, if you need to parse a domain name directly out of a full URL string, you can use the following:

var url = "http://one.two.roothost.co.uk/page.html";
url = url.split("/")[2]; // Get the hostname
var parsed = psl.parse(url); // Parse the domain
document.getElementById("output").textContent = parsed.domain;

JSFiddle Example (it includes the entire minified library in the jsFiddle, so scroll down!): https://jsfiddle.net/6aqdbL71/2/
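One caveat about the `url.split("/")[2]` step: it returns the whole authority part, so a port (or credentials) stays attached. A possible alternative, assuming the standard `URL` constructor is available in your environment, is sketched below. `hostnameFromUrl` is a hypothetical helper name, not part of psl.

```javascript
// Hypothetical helper: returns the hostname, or null if the string is not
// an absolute URL that the URL constructor can parse.
function hostnameFromUrl(urlString) {
  try {
    return new URL(urlString).hostname;
  } catch (e) {
    return null;
  }
}

const withPort = 'http://one.two.roothost.co.uk:8080/page.html';
console.log(withPort.split('/')[2]);    // "one.two.roothost.co.uk:8080"
console.log(hostnameFromUrl(withPort)); // "one.two.roothost.co.uk"
```

The result can then be passed to psl.parse() as in the examples above.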

Maximillian Laumeister
  • Wouldn't it be even more powerful using [the a.href tip](http://stackoverflow.com/questions/736513/how-do-i-parse-a-url-into-hostname-and-path-in-javascript) as proposed by user1551066? – Kaiido Aug 28 '15 at 06:43
  • 1
    It's worth noting that although the TLDs were fairly static for a long time, [they're currently changing](http://newgtlds.icann.org/en/program-status/delegated-strings). This will muddle your problem somewhat. –  Aug 28 '15 at 19:55
  • @jedd.ahyoung The psl script is updated fairly regularly AFAIK. As a solution to the changing TLDs, you could have the script auto-update itself server-side from GitHub on a schedule. So for instance, every week your server runs a job that downloads the new psl script so that it's fresh for the client-side. For bonus points you could even set up a git hook instead of a scheduled job. – Maximillian Laumeister Aug 28 '15 at 20:00
  • 1
    Well - if it does indeed update regularly - it would be simpler to simply pull the script from a CDN, instead of resorting to auto-updates and other such things. The main problem here for me is the external dependency, but I suppose if it works, it works. –  Aug 28 '15 at 20:02
  • @jedd.ahyoung Unfortunately I looked for a CDN for the psl script and couldn't find any. If there were a reliable CDN for the script available, it would definitely be the best solution. – Maximillian Laumeister Aug 28 '15 at 20:04
  • 12
    This solution is insane to me. 122KB of minified code added to your library and if you dig in to the code you get very static definitions: "hgtv","hiphop","hisamitsu","hitachi","hiv","hkt","hockey","holdings","holiday","homedepot","homegoods","homes","homesense","honda".... I'm glad this solution is available but really wish subdomain fishing was built in to the browser. – Scott L Feb 26 '19 at 16:11
  • Following solution of @MaximillianLaumeister, I reached [here](https://publicsuffix.org/learn/), where I could find [publicsuffixlist.js](https://github.com/gorhill/publicsuffixlist.js) and [tld.js](https://github.com/oncletom/tld.js) recommended for JavaScript, and [tldts](https://github.com/remusao/tldts) for Typescript. – Anand Shankar Jan 01 '22 at 04:50
0

This works for me:

const firstTLDs = "ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|be|bf|bg|bh|bi|bj|bm|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|cl|cm|cn|co|cr|cu|cv|cw|cx|cz|de|dj|dk|dm|do|dz|ec|ee|eg|es|et|eu|fi|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jo|jp|kg|ki|km|kn|kp|kr|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|na|nc|ne|nf|ng|nl|no|nr|nu|nz|om|pa|pe|pf|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|st|su|sv|sx|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|yt".split('|');

const secondTLDs = "com|edu|gov|net|mil|org|nom|sch|caa|res|off|gob|int|tur|ip6|uri|urn|asn|act|nsw|qld|tas|vic|pro|biz|adm|adv|agr|arq|art|ato|bio|bmd|cim|cng|cnt|ecn|eco|emp|eng|esp|etc|eti|far|fnd|fot|fst|g12|ggf|imb|ind|inf|jor|jus|leg|lel|mat|med|mus|not|ntr|odo|ppg|psc|psi|qsl|rec|slg|srv|teo|tmp|trd|vet|zlg|web|ltd|sld|pol|fin|k12|lib|pri|aip|fie|eun|sci|prd|cci|pvt|mod|idv|rel|sex|gen|nic|abr|bas|cal|cam|emr|fvg|laz|lig|lom|mar|mol|pmn|pug|sar|sic|taa|tos|umb|vao|vda|ven|mie|北海道|和歌山|神奈川|鹿児島|ass|rep|tra|per|ngo|soc|grp|plc|its|air|and|bus|can|ddr|jfk|mad|nrw|nyc|ski|spy|tcm|ulm|usa|war|fhs|vgs|dep|eid|fet|fla|flå|gol|hof|hol|sel|vik|cri|iwi|ing|abo|fam|gok|gon|gop|gos|aid|atm|gsm|sos|elk|waw|est|aca|bar|cpa|jur|law|sec|plo|www|bir|cbg|jar|khv|msk|nov|nsk|ptz|rnd|spb|stv|tom|tsk|udm|vrn|cmw|kms|nkz|snz|pub|fhv|red|ens|nat|rns|rnu|bbs|tel|bel|kep|nhs|dni|fed|isa|nsn|gub|e12|tec|орг|обр|упр|alt|nis|jpn|mex|ath|iki|nid|gda|inc".split('|');

const knownSubdomains = "www|studio|mail|remote|blog|webmail|server|ns1|ns2|smtp|secure|vpn|m|shop|ftp|mail2|test|portal|ns|ww1|host|support|dev|web|bbs|ww42|squatter|mx|email|1|mail1|2|forum|owa|www2|gw|admin|store|mx1|cdn|api|exchange|app|gov|2tty|vps|govyty|hgfgdf|news|1rer|lkjkui";

function removeSubdomain(s) {
    const knownSubdomainsRegExp = new RegExp(`^(${knownSubdomains})\\.`, 'i'); // doubled backslash so the dot stays literal
    s = s.replace(knownSubdomainsRegExp, '');

    const parts = s.split('.');

    while (parts.length > 3) {
        parts.shift();
    }

    if (parts.length === 3 && ((parts[1].length > 2 && parts[2].length > 2) || (secondTLDs.indexOf(parts[1]) === -1) && firstTLDs.indexOf(parts[2]) === -1)) {
        parts.shift();
    }

    return parts.join('.');
};

var tests = {
  'www.sidanmor.com':             'sidanmor.com',
  'exemple.com':                  'exemple.com',
  'argos.co.uk':                  'argos.co.uk',
  'www.civilwar.museum':          'civilwar.museum',
  'www.sub.civilwar.museum':      'civilwar.museum',
  'www.xxx.sub.civilwar.museum':  'civilwar.museum',
  'www.exemple.com':              'exemple.com',
  'main.testsite.com':            'testsite.com',
  'www.ex-emple.com.ar':          'ex-emple.com.ar',
  'main.test-site.co.uk':         'test-site.co.uk',
  'en.tour.mysite.nl':            'tour.mysite.nl',
  'www.one.lv':                   'one.lv',
  'www.onfdsadfsafde.lv':         'onfdsadfsafde.lv',
  'aaa.onfdsadfsafde.aa':         'onfdsadfsafde.aa',
};

for (var test in tests) {
  if (tests.hasOwnProperty(test)) {
    var t = test;
    var e = tests[test];
    var r = removeSubdomain(test);
    var s = e === r;
    if (s) {
      console.log('OK: "' + t + '" should be "' + e + '" and it is really "' + r + '"');
    } else {
      console.log('Fail: "' + t + '" should be "' + e + '" but it is NOT "' + r + '"');
    }
  }
}
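One detail worth checking whenever a RegExp is built from a template literal, as in removeSubdomain: inside the literal, `\.` collapses to a plain `.`, which then matches any character. The sketch below uses a shortened, made-up subdomain list (`names`) to show the difference between a single and a doubled backslash.

```javascript
// Shortened list for illustration only.
const names = 'www|m';

// `\.` inside a template literal is just "." by the time RegExp sees it.
const singleBackslash = new RegExp(`^(${names})\.`, 'i');
// `\\.` reaches RegExp as "\." and matches a literal dot.
const doubleBackslash = new RegExp(`^(${names})\\.`, 'i');

console.log(singleBackslash.source); // ^(www|m).
console.log(doubleBackslash.source); // ^(www|m)\.

console.log('main.testsite.com'.replace(singleBackslash, '')); // "in.testsite.com"
console.log('main.testsite.com'.replace(doubleBackslash, '')); // "main.testsite.com"
console.log('m.testsite.com'.replace(doubleBackslash, ''));    // "testsite.com"
```

With the single backslash, "ma" is stripped from "main" because "m" followed by any character matches.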

References:

psl.min.js file

Maximillian Laumeister Answer to this question

The most popular subdomains on the internet

sidanmor
  • I'm currently trying to figure out a way to optimize your solution, since it doesn't pass the test with `www.d7143.test.me`. It spits out `d7143.test.me` instead of `test.me`. – John the User Aug 07 '17 at 09:21
  • "me" is in my firstTLDs list. If you want to get "test.me" you can remove it from the list... But I don't think it is right to do so, because there are sites with "me" as firstTLD... Good luck! – sidanmor Aug 07 '17 at 12:58
  • yeah this is useless, www.d7143.test.me should return test.me without changes – Ivan Castellanos Nov 29 '18 at 16:13
  • 3
    Your list is not up to date. As of today there are 1584 top level domains. https://www.iana.org/domains/root/db – Boris Verkhovskiy Dec 03 '19 at 20:50
  • this script doesn't work for many cases, i.e. `removeSubdomain("main.test-site.fr")` will return `main.test-site.fr` – Eugen Jun 17 '20 at 20:55
  • 2
    It is not a good idea to hard-code TLDs because new TLDs are introduced much more frequently than you think, and the rules might change too. – Flimm Aug 19 '20 at 19:18
0

You can use parse-domain to do the heavy lifting for you. This package considers the public suffix list and returns an easy to work with object breaking up the domain.

Here is an example from their readme:

  npm install parse-domain

  import { parseDomain, ParseResultType } from 'parse-domain';

  const parseResult = parseDomain(
    // should be a string with basic latin characters only. more details in the readme
    'www.some.example.co.uk',
  );

  // check if the domain is listed in the public suffix list
  if (parseResult.type === ParseResultType.Listed) {
    const { subDomains, domain, topLevelDomains } = parseResult;

    console.log(subDomains); // ["www", "some"]
    console.log(domain); // "example"
    console.log(topLevelDomains); // ["co", "uk"]
  } else {
    // more about other parseResult types in the readme
  }
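
To get the "example.co.uk" form the question asks for, the `domain` and `topLevelDomains` fields can be joined back together. The sketch below is not from the readme: `toRegistrableDomain` is a hypothetical helper, and the `sample` object is a plain stand-in with the same fields as the listed result shown above, so that the snippet runs without installing the package.

```javascript
// Hypothetical helper; `result` is assumed to have the `domain` and
// `topLevelDomains` fields of a listed parse result, as in the example above.
function toRegistrableDomain(result) {
  return [result.domain, ...result.topLevelDomains].join('.');
}

// Plain object standing in for a real parseDomain() result.
const sample = {
  subDomains: ['www', 'some'],
  domain: 'example',
  topLevelDomains: ['co', 'uk'],
};

console.log(toRegistrableDomain(sample)); // "example.co.uk"
```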
Ulad Kasach
0

What about this?

function getCanonicalHost(hostname) {
  const MAX_TLD_LENGTH = 3;
  
  function isNotTLD(_) { return _.length > MAX_TLD_LENGTH; };

  hostname = hostname.split('.');
  hostname = hostname.slice(Math.max(0, hostname.findLastIndex(isNotTLD)));
  hostname = hostname.join('.');

  return hostname;
}

console.log(getCanonicalHost('mail.google.com'));
console.log(getCanonicalHost('some.google.com.ar'));
console.log(getCanonicalHost('some.another.google.com.ar'));
console.log(getCanonicalHost('foo.bar.google.com'));
console.log(getCanonicalHost('foo.bar.google.com.ar'));
console.log(getCanonicalHost('bar.google.ar'));

It works because https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_is_a_domain_name says:

TLDs can contain special as well as latin characters. A TLD's maximum length is 63 characters, although most are around 2–3.

https://data.iana.org/TLD/tlds-alpha-by-domain.txt lists 1481 TLDs; 466 of them are 2–3 characters long, and the most commonly used TLDs are no longer than 3 characters.
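
To see where the length rule holds and where it does not, here is a small sketch of the same heuristic. It is restated with a plain loop under the hypothetical name `byLengthHeuristic` so it also runs on engines without `findLastIndex`; the hostnames are made-up test inputs.

```javascript
const MAX_TLD_LENGTH = 3;

// Same idea as the length-based getCanonicalHost: keep everything from the
// last label that is longer than MAX_TLD_LENGTH.
function byLengthHeuristic(hostname) {
  const labels = hostname.split('.');
  let start = 0; // if no label is long enough, keep the whole hostname
  for (let i = labels.length - 1; i >= 0; i--) {
    if (labels[i].length > MAX_TLD_LENGTH) {
      start = i;
      break;
    }
  }
  return labels.slice(start).join('.');
}

console.log(byLengthHeuristic('some.google.com.ar')); // "google.com.ar"
console.log(byLengthHeuristic('www.bbc.co.uk'));      // "www.bbc.co.uk" (name is only 3 letters)
console.log(byLengthHeuristic('www.example.info'));   // "info" (TLD is longer than 3 letters)
```

So the shortcut depends on the registered name being longer than three characters and the TLD being three or fewer.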

If you need a solution that works with all TLDs, here is a more complex approach:

function getCanonicalHost(hostname) {
  return getCanonicalHost.tlds.then(function(tlds) {   
    function isNotTLD(_) { return tlds.indexOf(_) === -1; };

    hostname = hostname.toLowerCase();
    hostname = hostname.split('.');
    hostname = hostname.slice(Math.max(0, hostname.findLastIndex(isNotTLD)));
    hostname = hostname.join('.');

    return hostname; 
  });
}

getCanonicalHost.tlds = new Promise(function(res, rej) {
  const TLD_LIST_URL= 'https://data.iana.org/TLD/tlds-alpha-by-domain.txt';

  const xhr = new XMLHttpRequest();

  xhr.addEventListener('error', rej);
  xhr.addEventListener('load', function() { 
    const MAX_TLD_LENGTH = 63;

    var tlds = xhr.responseText.split('\n');
    tlds = tlds.map(function(_) { return _.trim().toLowerCase(); });
    tlds = tlds.filter(Boolean);
    tlds = tlds.filter(function(_) { return _[0] !== '#' && _.length <= MAX_TLD_LENGTH; }); // skip the header comment line

    res(tlds);
  });

  xhr.open('GET', TLD_LIST_URL);
  xhr.send();
})

getCanonicalHost('mail.google.com').then(console.log);
getCanonicalHost('some.google.com.ar').then(console.log);
getCanonicalHost('some.another.google.com.ar').then(console.log);
getCanonicalHost('foo.bar.google.com').then(console.log);
getCanonicalHost('foo.bar.google.com.ar').then(console.log);
getCanonicalHost('bar.google.ar').then(console.log);
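
For trying the list-based variant without a network request, the parsing and lookup steps can be separated out, as in the sketch below. `SAMPLE_LIST`, `parseTldList` and `stripNonTldPrefix` are hypothetical names, and the sample text only imitates the assumed format of the IANA file (a `#` comment line followed by one upper-case TLD per line).

```javascript
// Hypothetical sample imitating the assumed file format.
const SAMPLE_LIST = '# sample header line\nAR\nCOM\nINFO\nUK\n';

// Turns the list text into a Set of lower-case TLDs, skipping blank
// lines and the "#" comment line.
function parseTldList(text) {
  return new Set(
    text
      .split('\n')
      .map(line => line.trim().toLowerCase())
      .filter(line => line !== '' && !line.startsWith('#'))
  );
}

// Keeps everything from the last label that is not itself a TLD.
function stripNonTldPrefix(hostname, tlds) {
  const labels = hostname.toLowerCase().split('.');
  let start = 0;
  for (let i = labels.length - 1; i >= 0; i--) {
    if (!tlds.has(labels[i])) {
      start = i;
      break;
    }
  }
  return labels.slice(start).join('.');
}

const tlds = parseTldList(SAMPLE_LIST);
console.log(stripNonTldPrefix('foo.bar.google.com.ar', tlds)); // "google.com.ar"
console.log(stripNonTldPrefix('www.example.info', tlds));      // "example.info"
```

Using a Set keeps each lookup cheap compared with indexOf on an array of more than a thousand entries.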
NSD
0

Just use the code snippet below.

/(?<=\.).+/.exec(location.hostname)[0]
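
For reference, this pattern matches everything after the first dot, so it drops exactly one leading label. The sketch below wraps it in a hypothetical `dropFirstLabel` function that takes the hostname as a parameter (so it runs outside a browser) and guards against hostnames with no dot, where exec() returns null.

```javascript
// Hypothetical wrapper around the one-liner above.
function dropFirstLabel(hostname) {
  const match = /(?<=\.).+/.exec(hostname);
  return match ? match[0] : null; // null when there is no dot at all
}

console.log(dropFirstLabel('www.example.com'));        // "example.com"
console.log(dropFirstLabel('one.two.roothost.co.uk')); // "two.roothost.co.uk"
console.log(dropFirstLabel('example.com'));            // "com"
console.log(dropFirstLabel('localhost'));              // null
```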
-1

Simplest solution:

var domain='https://'+window.location.hostname.split('.')[window.location.hostname.split('.').length-2]+'.'+window.location.hostname.split('.')[window.location.hostname.split('.').length-1];
alert(domain);
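
The same "last two labels" rule can be written with a single split. This is only a sketch: `lastTwoLabels` is a hypothetical name, it takes the hostname as a parameter instead of reading window.location, and it leaves out the 'https://' prefix that the line above prepends.

```javascript
// Hypothetical helper: keeps the last two dot-separated labels.
function lastTwoLabels(hostname) {
  return hostname.split('.').slice(-2).join('.');
}

console.log(lastTwoLabels('shop.gocustom.com')); // "gocustom.com"
console.log(lastTwoLabels('www.bbc.co.uk'));     // "co.uk"
```

The second call shows the limit of the rule: for two-part suffixes such as .co.uk it returns the suffix rather than the domain.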
-1

I created this function, which uses URL to parse. It cheats by assuming every hostname has four or fewer parts.

const getDomainWithoutSubdomain = url => {
  const urlParts = new URL(url).hostname.split('.')

  return urlParts
    .slice(0)
    .slice(-(urlParts.length === 4 ? 3 : 2))
    .join('.')
}

[
  'https://www.google.com',
  'https://www.google.co.uk',
  'https://mail.google.com',
  'https://www.bbc.co.uk/news',
  'https://github.com',
].forEach(url => {
  console.log(getDomainWithoutSubdomain(url))
})
devmatic
-3

Here is a working JSFiddle

My solution works with the assumption that the root hostname you are looking for is of the type "abc.xyz.pp".

extractDomain() returns the hostname with all the subdomains. getRootHostName() splits the hostname by . and then based on the assumption mentioned above, it uses the shift() to remove each subdomain name. Finally, whatever remains in parts[], it joins them by . to form the root hostname.

Javascript

var urlInput = "http://one.two.roothost.co.uk/page.html";

function extractDomain(url) {
    var domain;
    //find & remove protocol (http, ftp, etc.) and get domain
    if (url.indexOf("://") > -1) {
        domain = url.split('/')[2];
    } else {
        domain = url.split('/')[0];
    }

    //find & remove port number
    domain = domain.split(':')[0];

    return domain;
}

function getRootHostName(url) {
    var parts = extractDomain(url).split('.');
    var partsLength = parts.length - 3;

    //parts.length-3 assuming root hostname is of type abc.xyz.pp
    for (var i = 0; i < partsLength; i++) { // declare i so it does not leak as an implicit global
        parts.shift(); //remove sub-domains one by one
    }
    var rootDomain = parts.join('.');

    return rootDomain;
}

document.getElementById("result").innerHTML = getRootHostName(urlInput);

HTML

<div id="result"></div>

EDIT 1: Updated the JSFiddle link. It was reflecting the incorrect code.

-3

What about...

    function getDomain(){
        if(document.domain.length){
            var parts = document.domain.replace(/^(www\.)/,"").split('.');

            //is there a subdomain? 
            while(parts.length > 2){
                //removing it from our array 
                var subdomain = parts.shift();
            }

            //getting the remaining 2 elements
            var domain = parts.join('.');

            return domain.replace(/(^\.*)|(\.*$)/g, "");
        }
        return '';
    }
Alvaro
  • 2
    It all depends on what you consider a domain or a subdomain. In that case this script considers `google` the subdomain and `co.in` the domain :) Although, taking into account that domains should be able to exist without subdomains, `co.uk` is not a domain. So, yes, this script will fail for those cases. – Alvaro Mar 27 '18 at 11:28
-4

My solution worked for me: Get "gocustom.com" from "shop.gocustom.com"

var site_domain_name = 'shop.gocustom.com';
alert(site_domain_name);
var strsArray = site_domain_name.split('.');
var strsArrayLen = strsArray.length;
alert(strsArray[strsArrayLen - 2] + '.' + strsArray[strsArrayLen - 1]); // plain arithmetic; eval() is not needed here
-17

You can try this in JavaScript:

alert(window.location.hostname);

It will return the hostname.

Angel Politis
Nikhil.nj