
Is there an algorithm for this? For example,

 twitter.com
 zamg.ac.at

are top-level domains, and

pic.twitter.com

is a second-level domain.

curiousity
  • I'm guessing you don't mean [TLD](http://en.wikipedia.org/wiki/Top-level_domain) as in DNS. Then in your case the algorithm would be to split on `.` and take the second-rightmost value? (not very sophisticated I know) – keyser Aug 14 '14 at 10:42

2 Answers


Definition:

No, AFAIK it is like this:

Example: pic.twitter.com

  • Top level domain: com
  • Second level domain: twitter
  • Subdomain: pic (and every other potential part before pic)

In other words, the schema would be `(subdomain.)*secondlevel.toplevel`.

Thus zamg.ac.at would not be a top level domain, but rather a subdomain of ac.at, with the TLD being `at`.

Algorithm:

You could split on the dots and use the last part as the tld, the second-to-last part as the second level and the rest as subdomain(s).
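
A minimal Java sketch of that split, assuming the input is already a bare host name with at least two labels (no scheme, port or trailing dot). The class and method names (`NaiveDomainParts`, `parse`) are made up for illustration:

```java
import java.util.Arrays;
import java.util.List;

/** Naive split of a host name: last label = TLD, second-to-last = second level, rest = subdomains. */
final class NaiveDomainParts {
    final List<String> subdomains;
    final String secondLevel;
    final String topLevel;

    private NaiveDomainParts(List<String> subdomains, String secondLevel, String topLevel) {
        this.subdomains = subdomains;
        this.secondLevel = secondLevel;
        this.topLevel = topLevel;
    }

    static NaiveDomainParts parse(String host) {
        String[] labels = host.split("\\.");
        if (labels.length < 2) {
            throw new IllegalArgumentException("expected at least two labels: " + host);
        }
        int n = labels.length;
        // Everything left of the second-level label counts as subdomain(s).
        return new NaiveDomainParts(
                Arrays.asList(labels).subList(0, n - 2),
                labels[n - 2],
                labels[n - 1]);
    }
}
```

With this, `NaiveDomainParts.parse("pic.twitter.com")` yields subdomains `[pic]`, second level `twitter` and TLD `com`, while `zamg.ac.at` comes out as `zamg` / `ac` / `at`.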

However, if you want to define zamg.ac.at and twitter.com as being top level in the context of your application semantics (don't mix it up with the general understanding of toplevel) then you'd need some mapping because there is no apparent pattern.
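
One way such a mapping could look, as a rough sketch: keep a set of suffixes under which names get registered, find the longest one the host ends with, and keep one more label to its left. The suffix set below is a hand-picked assumption for illustration (not a complete or authoritative list), and `SuffixMapping` / `registrableDomain` are hypothetical names:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class SuffixMapping {
    // Assumption: a tiny hand-picked subset, for illustration only.
    private static final Set<String> SUFFIXES = new HashSet<>(Arrays.asList(
            "com", "at", "ac.at", "uk", "co.uk", "ltd.uk", "police.uk"));

    /** Returns the longest known suffix plus one label to its left, or null if none applies. */
    static String registrableDomain(String host) {
        String normalized = host.toLowerCase(Locale.ROOT);
        if (SUFFIXES.contains(normalized)) {
            return null; // the host is itself a suffix, nothing is registered under it here
        }
        String[] labels = normalized.split("\\.");
        // Try candidate suffixes from longest to shortest, always leaving
        // at least one label in front of the suffix.
        for (int start = 1; start < labels.length; start++) {
            String suffix = String.join(".", Arrays.copyOfRange(labels, start, labels.length));
            if (SUFFIXES.contains(suffix)) {
                return labels[start - 1] + "." + suffix;
            }
        }
        return null;
    }
}
```

Under these assumptions both `pic.twitter.com` and `twitter.com` map to `twitter.com`, and `zamg.ac.at` maps to itself, which matches the grouping asked for in the question.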

Why you'd need a mapping:

Take .co.uk as an example: currently there is google.co.uk, which in your semantics would be top level. AFAIK it is now possible to register google.uk as well (and I'd say it's only a matter of time until that is done), so both domains would be on the same level (at least as I understand your question) but have a different number of parts.

As for .uk, you couldn't use the pattern `[^\.]+(\.\w{2})?\.\w{2}$` to find the "top level" part of a domain, since there might be "top level" domains using longer actual second level parts, e.g. .ltd.uk or .police.uk. And that's only for the .uk TLD; there is a multitude of others as well.
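
To see the problem concretely, here is a small sketch that runs the pattern above through `java.util.regex`. The class name `UkPatternDemo` and the sample host names are only illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UkPatternDemo {
    // The pattern from the text, escaped for a Java string literal.
    private static final Pattern TWO_LETTER_PARTS =
            Pattern.compile("[^\\.]+(\\.\\w{2})?\\.\\w{2}$");

    /** Returns the part of the host matched by the pattern, or null if nothing matches. */
    static String match(String host) {
        Matcher m = TWO_LETTER_PARTS.matcher(host);
        return m.find() ? m.group() : null;
    }
}
```

For `www.google.co.uk` and `google.uk` this returns `google.co.uk` and `google.uk` as hoped, but for `example.ltd.uk` it returns just `ltd.uk`, and for `www.example.police.uk` just `police.uk`, dropping the label that was actually registered.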

Thomas

As mentioned in the previous answer, you need a mapping. The best source for this is publicsuffix.org, which maintains a list of detailed rules for this purpose. There is a range of libraries in many programming languages (a comprehensive list is here) which can determine the TLD (more precisely: the registrable domain part) given a URL.

The following is an example of getting a registrable domain from a host using the [whois-server-list](https://github.com/whois-server-list/public-suffix-list) library. The `host` String variable is the host part of a valid URL (e.g. www.publicsuffix.org). `PublicSuffixList` and `PublicSuffixListFactory` are classes of the `de.malkusch.whoisServerList.publicSuffixList` package.

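    import de.malkusch.whoisServerList.publicSuffixList.PublicSuffixList;
    import de.malkusch.whoisServerList.publicSuffixList.PublicSuffixListFactory;

    PublicSuffixListFactory factory = new PublicSuffixListFactory();
    PublicSuffixList suffixListResolver = factory.build();
    String registrableDomain = suffixListResolver.getRegistrableDomain(host);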
  • Whilst this may theoretically answer the question, [it would be preferable](//meta.stackoverflow.com/q/8259) to include the essential parts of the answer here, and provide the link for reference. – Draken Oct 03 '16 at 15:13
  • @Draken Good point, I have edited the response to comply with the respective guidelines. – Nikos Houssos Jun 20 '17 at 19:22