I need a general script or pattern to extract the main domain name from URLs. My attempt below fails in some cases.

Let us say I have the link in link1 below and need to extract the main domain name (google.co.uk) without the sub-domain (mail). I wrote a script that works fine with .co.uk but fails for websites that have a one-part top-level domain name such as .com.

Is there a better way to extract the main domain name from ANY URL? The URL is constructed as follows:

https://(optional sub-domain)*(domain name with a one- or two-part top-level domain name)(optional forward slash followed by text)*

The * means zero or more times.

var link1="https://mail.google.co.uk/link/link/link";
var url = new URL(link1);
var domain = url.hostname.split('.').slice(-3).join('.');
console.log("The domain name is: "+ domain);

In the above code, I expect: google.co.uk

This works because the link has two parts in the top-level domain name (.co.uk), so slice(-3) keeps exactly the right labels. But I need the code to work with this link as well:

var link1="https://mail.google.com/link/link/link";

And I need the output to be: google.com

But the problem is that the code produces:

mail.google.com

And I only want the main domain name: google.com

EDIT: Some of the expected output examples are here:

1) In mail.google.co.uk it should be: google.co.uk

2) In mail.google.com it should be: google.com

3) In link.mail.google.com/link/link it should be: google.com

4) In link.link2.mail.google.com it should be: google.com

That is, I want just the main domain name, without sub-domains or the path after the domain name. The top-level domain name can be in the form of (.com, .net, .org, etc.) or in the form of (.co.uk, .co.us, etc.). It should be captured whether it has one part or two parts (my code handles only the two-part case).

None
  • what is the expected output of `domain` from link1? – gurvinder372 Mar 29 '18 at 12:39
  • @Nikola Lukic that link is to extract the top-level domain name. I am asking about the main domain name in addition to the top-level domain name. e.g. `google.com`, `google.co.uk`. – None Mar 29 '18 at 12:51
  • The parsing problem I see is with one dot versus two dots. You need some validation object with concrete rules, for example treating ".co.uk" as an exception case. The program must know when one dot or two dots is the valid result. – Nikola Lukic Mar 29 '18 at 13:00
  • @Nikola Lukic it is for any URL. I cannot make exceptions. It is not only `.co.uk` but can be anything. For example: `.co.us` or any other type. – None Mar 29 '18 at 13:02
  • Possible duplicate of [Issue while capturing Top-Level Domain from URL](https://stackoverflow.com/questions/40428687/issue-while-capturing-top-level-domain-from-url) – Pradeep Pati Mar 29 '18 at 14:00
  • Possible duplicate of [Get part of the url pathname via JavaScript regex](https://stackoverflow.com/questions/21481096/get-part-of-the-url-pathname-via-javascript-regex) – toesslab Mar 29 '18 at 14:01
  • If you don't have a rule, you can't write a working script. The question is not bad, but extra handling is needed. – Nikola Lukic Mar 29 '18 at 15:08
  • I gave you an upvote to keep this question alive. Suggestion: make a list of two-part suffixes like 'co.uk'; 'co.' is one of those cases. – Nikola Lukic Mar 30 '18 at 08:32
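
The suggestion in the comments (keep a list of two-part suffixes) could be sketched roughly as follows. This is only a minimal illustration: the `TWO_PART_SUFFIXES` set and the `mainDomain` helper are hypothetical names, and the list is deliberately incomplete. A real solution would need a complete suffix list, such as the community-maintained Public Suffix List.

```javascript
// Hypothetical, deliberately incomplete list of two-part suffixes (an assumption).
const TWO_PART_SUFFIXES = new Set(["co.uk", "org.uk", "com.au", "co.jp"]);

// Hypothetical helper: returns the main domain of a URL string.
function mainDomain(link) {
  // Split the hostname into labels, e.g. ["mail", "google", "co", "uk"].
  const labels = new URL(link).hostname.split(".");
  // If the last two labels form a known two-part suffix, keep three labels;
  // otherwise assume a one-part top-level domain and keep two.
  const lastTwo = labels.slice(-2).join(".");
  const keep = TWO_PART_SUFFIXES.has(lastTwo) ? 3 : 2;
  return labels.slice(-keep).join(".");
}

console.log(mainDomain("https://mail.google.co.uk/link/link/link")); // google.co.uk
console.log(mainDomain("https://mail.google.com/link/link/link"));   // google.com
console.log(mainDomain("https://link.link2.mail.google.com"));       // google.com
```

Any two-part suffix missing from the set (for example `.co.us`) would be treated as a one-part top-level domain, which is why the completeness of the list matters.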

1 Answer

Sure, if you want the full host name, including sub-domains,

"mail.google.co.uk"

you can just use

url.host

or, if you want it with the scheme (https://) included, use

url.origin
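
For reference, here is a small sketch of what these properties return for the link from the question. Note that none of them strips the sub-domain.

```javascript
// The link from the question, parsed with the standard URL API.
const url = new URL("https://mail.google.co.uk/link/link/link");

console.log(url.hostname); // mail.google.co.uk
console.log(url.host);     // mail.google.co.uk (would also include a non-default port)
console.log(url.origin);   // https://mail.google.co.uk
```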

cheers!

JosephStevens