Issue while capturing Top-Level Domain from URL

Question

I want to capture the top-level domain (TLD) from a URL, but I haven't had any success. The problem is that the URL format varies: a user might enter www.google.com, m.google.com, m.google.uk, google.uk, or www.m.google.com.

I tried using slice, but it didn't work because the TLD can be 2 or 3 characters long. Splitting on "." doesn't obviously help either, since I might get 2, 3, or 4 parts. Is there a single-line JavaScript function I can use, or an easy custom function?

All the posts I found show how to get the host name, but I want to extract just the last 2 or 3 characters of the URL (com, uk, cn, etc.). I could use multiple if-else statements, but I want to avoid that and check whether there is a simpler solution.

I am looking for output such as 'com', 'uk', or 'cn', depending on the top-level domain of my URL. The URL is entered by the user, which is why it is difficult to predict whether they will enter m.google.com, www.m.google.com, www.google.com, or simply google.com.

@Paul Roub This is not a duplicate. I want to extract domain name (com, cn, uk) here and not host name like how it is in answer pointed by you. — NewWorld, Nov 04 '16 at 17:43
Then please edit your question to clarify that you're looking for Top Level Domain (TLD), *not* the domain name (which would be google.com, etc.) Your question shows no expected output (which would have let us know you were using the wrong term). As it is, it will be closed again as Too Broad or Off Topic due to No [mcve]. — Paul Roub, Nov 04 '16 at 17:46
Where are you getting the URL? If it is the current `window.location` there is an easy solution. If you have some arbitrary string that came from who-knows-where it's a little more difficult problem. In what form is the URL? Is it a complete, always starts with `http`|`s` or is it some fragment? — Stephen P, Nov 04 '16 at 18:01
@StephenP I edited my question. I hope it will help you to understand. It is entered by user so it can or can not start with `http|s` — NewWorld, Nov 04 '16 at 18:06
I commented on Timo's answer. Once you've got the hostname, there are many ways that you can get the last part of it -- the top-level domain. — Stephen P, Nov 04 '16 at 18:11

TimoStaudinger · Answer 1 · 2016-11-04T18:36:56.963

2

One possible approach:

var parser = document.createElement('a');

parser.href = "http://www.google.com/path/";
console.log(parser.hostname); // "www.google.com"

parser.href = "http://m.google.com/path/";
console.log(parser.hostname); // "m.google.com"

parser.href = "http://www.m.google.com/path/";
console.log(parser.hostname); // "www.m.google.com"

edited Nov 04 '16 at 18:36

answered Nov 04 '16 at 17:37

TimoStaudinger

I want to capture domain name (com, uk, cn) and not hostname. – NewWorld Nov 04 '16 at 17:39
`var host = parser.hostname;` `console.log( host.slice( host.lastIndexOf('.') + 1 ) );` – Stephen P Nov 04 '16 at 18:06
`var hostParts = parser.hostname.split('.');` `console.log( hostParts[hostParts.length - 1] );` – Stephen P Nov 04 '16 at 18:09
@NewWorld you have to declare and initialize "parser" `var parser = document.createElement('a')` as Timo has. You can name it something other than parser if that makes more sense to you. – Stephen P Nov 04 '16 at 18:35
@StephenP I realized that and did it, but it throws an error stating that document is not defined. Maybe document is not defined in Protractor. I used the code below and it worked fine, so I think we are good. I have yet to check whether it fails in some cases. `var parser = TextBox.siteName;//get input of site from user in parser variable. var hostParts = parser.split('.'); var URLdomain = hostParts[hostParts.length - 1];` – NewWorld Nov 04 '16 at 18:44
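
For a context without `document`, the hostname step from this answer and the last-label step from the comments can be combined without a DOM element. The following is only a sketch, assuming a runtime that provides the global `URL` class (modern browsers, Node.js 10 or later); `tldFromInput` is a hypothetical name, not something from the thread. It prepends `http://` when the user left the scheme out, because the `URL` constructor rejects input without one. IP addresses and hostnames with a trailing dot are not handled.

```javascript
// Hypothetical helper (not from the thread): hostname via the URL class,
// then everything after the last dot.
function tldFromInput(input) {
  var text = String(input).trim();

  // The URL constructor needs a scheme, so add one if the user omitted it.
  if (!/^[a-z][a-z0-9+.-]*:\/\//i.test(text)) {
    text = 'http://' + text;
  }

  var hostname;
  try {
    hostname = new URL(text).hostname;
  } catch (e) {
    return undefined; // input could not be parsed as a URL
  }

  // No dot means a single-label host such as "localhost": no TLD to report.
  var lastDot = hostname.lastIndexOf('.');
  return lastDot === -1 ? undefined : hostname.slice(lastDot + 1);
}

console.log(tldFromInput('www.google.com'));           // "com"
console.log(tldFromInput('https://m.google.uk/path')); // "uk"
console.log(tldFromInput('www.m.google.com'));         // "com"
console.log(tldFromInput('localhost'));                // undefined
```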

score 1 · Accepted Answer · answered Nov 04 '16 at 18:46

The code below works for me. Thanks @StephenP for your help. Thanks @Timo as well, but it seems `document` is not defined in Protractor.

var parser = TextBox.siteName; // get input of site from user in parser variable
var hostParts = parser.split('.');
var URLdomain = hostParts[hostParts.length - 1];

NewWorld

ah ha... I think we all assumed this was running in a browser context. `document` (lowercase `d`) aka `window.document` will exist if your javascript is running in the browser. – Stephen P Nov 04 '16 at 18:52
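
The plain split in the accepted answer works for bare host names like the ones in the question (www.google.com, m.google.uk). As a rough sketch of where it stops working, the wrapper below adds two small guards and then shows one input it still gets wrong. `lastLabel` is a hypothetical name introduced here for illustration; it is not part of the answer.

```javascript
// Hypothetical wrapper around the two-line split from the accepted answer.
function lastLabel(siteName) {
  // Trim stray whitespace and normalize case before splitting on dots.
  var hostParts = String(siteName).trim().toLowerCase().split('.');

  // A single part means the input had no dot, so there is no TLD to report.
  if (hostParts.length < 2) {
    return undefined;
  }
  return hostParts[hostParts.length - 1];
}

console.log(lastLabel('www.m.google.com'));  // "com"
console.log(lastLabel(' Google.UK '));       // "uk"
console.log(lastLabel('localhost'));         // undefined
console.log(lastLabel('google.com/search')); // "com/search" (a path still leaks through)
```

If users may type a path or a scheme, the input needs to be reduced to the host name before splitting.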

score 0 · Answer 3 · answered Nov 04 '16 at 19:27

If you can isolate the domain, the text after its last period (.) is the TLD.

Test it out here: https://jsfiddle.net/ubb61wam/2/

var addresses = [
  'google.com',             // should return 'com'
  'https://google.com.uk',  // should return 'uk'
  'yahoo.cn/foo/bar.foo',   // should return 'cn'
  'file:///usr/local'       // should fail
];

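// Use an indexed loop: for...in iterates property names, not array elements.
for (var index = 0; index < addresses.length; index++) {
    console.log(tld(addresses[index]));
}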

function tld(address) {
    // handle edge-cases
    if (typeof address == 'undefined' || address.indexOf('file:///') != -1)
        return undefined;

    var part = address;

    //remove http://
    if (part.indexOf('//') != -1)
        part = part.split('//')[1];

    //isolate domain
    if (part.indexOf('/') != -1)
        part = part.split('/')[0];  

    //get tld
    if (part.indexOf('.') != -1) {
        var all = part.split('.');
        part = all[all.length - 1]; 
    }
    return part;
}

As I mentioned in my question, I could use multiple if-else statements, but I wanted to avoid that. Is there anything wrong with the solution I posted, or with the 2-line one Stephen P recommended? – NewWorld, Nov 04 '16 at 19:40
@NewWorld your solution is probably ideal. The `protractor` tag was the only indication of the library you are using prior to your self-answer (I didn't notice it; Stephen P hints at this), hence the varying solutions you're seeing. You should consider amending the question to reflect the technology that this solution requires, then mark your own answer as accepted tomorrow when the site allows. – tresf, Nov 04 '16 at 19:58
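
For anyone who wants the behaviour of the `tld` function above without the chain of if statements, one option is a single regular expression. This is only a sketch and has been checked only against the four test addresses from this answer plus a couple of extras. `tldByRegex` is a hypothetical name, and inputs with user info (`user@host`) or IPv6 literals are deliberately not matched.

```javascript
// Hypothetical alternative to tld(): one pattern instead of several ifs.
//   ^(?:scheme://)?     optional scheme such as http:// or https://
//   [^\/?#:@]*\.        host text up to its last dot
//   ([^\/?#:@.]+)       the last label, captured
//   (?=[\/?#:]|$)       which must end at a port, path, query, fragment, or the end
var TLD_PATTERN = /^(?:[a-z][a-z0-9+.-]*:\/\/)?[^\/?#:@]*\.([^\/?#:@.]+)(?=[\/?#:]|$)/i;

function tldByRegex(address) {
  var match = TLD_PATTERN.exec(String(address).trim());
  return match ? match[1].toLowerCase() : undefined;
}

console.log(tldByRegex('google.com'));               // "com"
console.log(tldByRegex('https://google.com.uk'));    // "uk"
console.log(tldByRegex('yahoo.cn/foo/bar.foo'));     // "cn"
console.log(tldByRegex('file:///usr/local'));        // undefined
console.log(tldByRegex('www.google.com:8080/path')); // "com"
```

Whether this is easier to maintain than the explicit steps is a matter of taste; the pattern is compact but harder to read.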
