Is there some algorithm for this? For example, twitter.com and zamg.ac.at are top level domains, and pic.twitter.com is a second level domain.
Definition:
No, AFAIK it is like this:
Example: pic.twitter.com
Top level domain: com
Second level domain: twitter
Subdomain: pic (and every other potential part before pic)
In other terms the schema would be (subdomain.)*secondlevel.toplevel
Thus, zamg.ac.at would not be a top level domain, but rather a subdomain of ac.at, with the TLD being at.
Algorithm:
You could split on the dots and use the last part as the tld, the second-to-last part as the second level and the rest as subdomain(s).
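A minimal sketch of that naive split in Java (the class and method names are made up for illustration, and the host is assumed to be a plain dotted name without a trailing dot):

```java
import java.util.Arrays;

// Hypothetical helper illustrating the naive "split on dots" approach.
class NaiveDomainSplitter {

    // Returns { subdomains, secondLevel, topLevel }; subdomains may be "".
    static String[] split(String host) {
        String[] labels = host.split("\\.");
        if (labels.length < 2) {
            throw new IllegalArgumentException("expected at least two labels: " + host);
        }
        String topLevel = labels[labels.length - 1];
        String secondLevel = labels[labels.length - 2];
        // Everything left of the second level is treated as subdomain(s).
        String subdomains = String.join(".",
                Arrays.copyOfRange(labels, 0, labels.length - 2));
        return new String[] { subdomains, secondLevel, topLevel };
    }

    public static void main(String[] args) {
        for (String host : new String[] { "pic.twitter.com", "zamg.ac.at" }) {
            System.out.println(host + " -> " + Arrays.toString(split(host)));
        }
    }
}
```

For zamg.ac.at this reports ac as the second level and zamg as a subdomain, which matches the definition above but not the semantics asked for in the question.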
However, if you want to define zamg.ac.at and twitter.com as being top level in the context of your application semantics (not to be confused with the general understanding of top level), then you'd need some mapping, because there is no apparent pattern.
Why you'd need a mapping:
Take .co.uk as an example: currently there is google.co.uk, which in your semantics would be top level, but AFAIK it is now possible to register google.uk as well (and I'd say it's only a matter of time until this is done), so both domains would be on the same level (at least as I understand your question) but have a different number of parts.
As for .uk, you couldn't use the pattern [^\.]+(\.\w{2})?\.\w{2}$ to find the "top level" part of a domain, since there might be "top level" domains using longer actual second level parts, e.g. .ltd.uk or .police.uk. And that's only for the .uk TLD; there are a multitude of others as well.
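To see where the pattern breaks, here is a small sketch that runs it against a few hosts. The class name is made up, and www.example.ltd.uk is just an invented host for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regular expression comes from the text.
class UkPatternDemo {

    // The pattern from above, escaped for a Java string literal.
    static final Pattern PATTERN =
            Pattern.compile("[^\\.]+(\\.\\w{2})?\\.\\w{2}$");

    // Returns the part of the host the pattern matches, or null if none.
    static String match(String host) {
        Matcher m = PATTERN.matcher(host);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] hosts = { "www.google.co.uk", "www.google.uk", "www.example.ltd.uk" };
        for (String host : hosts) {
            System.out.println(host + " -> " + match(host));
        }
    }
}
```

The first two hosts give google.co.uk and google.uk as intended, but the third gives ltd.uk, because the three-letter ltd does not fit the optional two-character group and the name in front of it is dropped.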
As mentioned in the previous answer, you need a mapping - the best source for this is publicsuffix.org which maintains a list of detailed rules for this purpose. There is a range of libraries in many programming languages (a comprehensive list is here) which can determine the TLD (more precisely: the registrable domain part) given a URL.
The following is an example of getting a registrable domain from a host using the whois-server-list public-suffix-list library (https://github.com/whois-server-list/public-suffix-list). The host String variable is the host part of a valid URL (e.g. www.publicsuffix.org). PublicSuffixList and PublicSuffixListFactory are classes of the de.malkusch.whoisServerList.publicSuffixList package.
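import de.malkusch.whoisServerList.publicSuffixList.PublicSuffixList;
import de.malkusch.whoisServerList.publicSuffixList.PublicSuffixListFactory;
// Build the resolver once, then look up the registrable domain of the host.
PublicSuffixListFactory factory = new PublicSuffixListFactory();
PublicSuffixList suffixListResolver = factory.build();
String registrableDomain = suffixListResolver.getRegistrableDomain(host);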
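For intuition about what such a mapping does, here is a rough sketch with a tiny hand-picked set of suffixes. This is not how the library above is implemented: the class name, method name and suffix set are assumptions for illustration, and the real list at publicsuffix.org is far larger and also has wildcard and exception rules that this sketch ignores.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified lookup; not a replacement for a real PSL library.
class TinySuffixLookup {

    // Incomplete sample of public suffixes, chosen to cover the examples above.
    static final Set<String> SUFFIXES = new HashSet<>(Arrays.asList(
            "com", "at", "ac.at", "uk", "co.uk", "ltd.uk", "police.uk"));

    // Returns the longest known suffix plus one label to its left, or null
    // if no suffix is known or the host is itself a suffix.
    static String registrableDomain(String host) {
        String normalized = host.toLowerCase(Locale.ROOT);
        if (SUFFIXES.contains(normalized)) {
            return null;
        }
        String[] labels = normalized.split("\\.");
        // Candidates shrink from the left, so the longest suffix is tried first.
        for (int i = 1; i < labels.length; i++) {
            String candidate = String.join(".",
                    Arrays.copyOfRange(labels, i, labels.length));
            if (SUFFIXES.contains(candidate)) {
                return labels[i - 1] + "." + candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] hosts = { "pic.twitter.com", "zamg.ac.at", "www.google.co.uk", "google.uk" };
        for (String host : hosts) {
            System.out.println(host + " -> " + registrableDomain(host));
        }
    }
}
```

With this sample set, twitter.com and zamg.ac.at both come out as registrable domains, which is the grouping the question asks for, even though they have a different number of parts.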