I need a general script/pattern to extract the main domain name from URLs. I have the following attempt that failed.
Let us say I have the link below (link1) and need to extract the main domain name (google.co.uk) without the sub-domain (mail). The script below works fine with .co.uk but fails for websites that have a single-part top-level domain name such as .com.
Is there a better way to extract the main domain name from ANY URL? The URL is constructed as follows:
https://(optional sub-domain)*(domain name followed by a one- or two-part top-level domain name)(optional forward slash followed by text)*
The * means zero or more times.
var link1="https://mail.google.co.uk/link/link/link";
var url = new URL(link1);
var domain = url.hostname.split('.').slice(-3).join('.');
console.log("The domain name is: "+ domain);
For the code above, I expect: google.co.uk
It works because the link has two parts in the top-level domain name (.co.uk), so slicing with -3 keeps the right labels. But I need the code to work with this link as well:
var link1="https://mail.google.com/link/link/link";
And I need the output to be: google.com
But the problem is that the code produces:
mail.google.com
And I only want the main domain name: google.com
EDIT: Some of the expected output examples are here:
1) In mail.google.co.uk it should be: google.co.uk
2) In mail.google.com it should be: google.com
3) In link.mail.google.com/link/link it should be: google.com
4) In link.link2.mail.google.com it should be: google.com
i.e. just the main domain name, without sub-domains or the path after the domain name. The top-level domain name can be in the form of (.com, .net, .org, etc.) or in the form of (.co.uk, .co.us, etc.). The top-level domain name should be captured whether it has one part or two parts (my code only handles the two-part case).
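To make the requirement concrete, here is a minimal sketch of the behaviour described above. It assumes a small hand-written set of two-part suffixes; the names TWO_PART_SUFFIXES and getMainDomain are hypothetical, and the set is deliberately incomplete. The hostname alone cannot tell you whether a suffix has one or two parts, so a general solution would probably need a full suffix list such as the Public Suffix List.

```javascript
// Hypothetical, incomplete set of two-part suffixes (an assumption for
// illustration only; a real solution needs a complete suffix list).
const TWO_PART_SUFFIXES = new Set(["co.uk", "co.us", "org.uk", "com.au"]);

function getMainDomain(link) {
  // Split the hostname into its dot-separated labels.
  const labels = new URL(link).hostname.split(".");
  // If the last two labels form a known two-part suffix, keep three
  // labels (domain + suffix); otherwise keep two (domain + TLD).
  const lastTwo = labels.slice(-2).join(".");
  const keep = TWO_PART_SUFFIXES.has(lastTwo) ? 3 : 2;
  return labels.slice(-keep).join(".");
}

console.log(getMainDomain("https://mail.google.co.uk/link/link/link"));
console.log(getMainDomain("https://link.link2.mail.google.com"));
```

With this set, the four examples above should come out as google.co.uk, google.com, google.com and google.com. Any two-part suffix missing from the set would be treated as a one-part suffix, which is the main limitation of this approach.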