I am looking for a way to extract what I call the "hostname root" from a given hostname, i.e.:

f('stackoverflow.com') -> 'stackoverflow.com'
f('www.stackoverflow.com') -> 'stackoverflow.com'
f('www.stackoverflow.co.uk') -> 'stackoverflow.co.uk'

My first approach was (of course) a regular expression, but second-level domains (SLDs) such as `co.uk` are a problem because there are a considerable number of them.

Maybe an SLD database would be a good approach.

EDIT

I am working with node.js and, for now, I am using the tldjs module.

PauloASilva
  • Specific to PHP but related: http://stackoverflow.com/q/2527231/187606 – Pekka Dec 04 '14 at 20:50
  • You don't say what platform or language you are using, but there are libraries to do this for most of them. – Vorsprung Dec 04 '14 at 20:51
  • @Pekka웃 My question is different from the one you pointed to (but thanks): I'm not working with URLs, and PHP's `parse_url` function is not able (and does not aim) to provide the "hostname root". – PauloASilva Dec 04 '14 at 21:05
  • See my answer, the bit after `As to extracting the "right" domain in uncertain cases` - it doesn't provide much more information than David's answer though. There simply is no way without having a list of TLDs. – Pekka Dec 04 '14 at 21:06

1 Answer

1

You need the entire SLD/TLD database to do this. There is no other general-purpose way, especially because in some edge cases the registrable name sits under a third- or fourth-level domain.
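
As a rough illustration of the database approach, here is a minimal sketch in plain JavaScript. It assumes a tiny hand-written `SUFFIXES` set standing in for a complete list (for example the Public Suffix List); `SUFFIXES` and `hostnameRoot` are hypothetical names made up for this example, and the wildcard and exception rules that a real list contains are not handled.

```javascript
// Illustrative subset only. A real implementation would load a complete,
// regularly updated suffix list instead of hard-coding a few entries.
const SUFFIXES = new Set(['com', 'uk', 'co.uk', 'org.uk', 'jp', 'co.jp']);

// Hypothetical helper: returns the matched suffix plus one label in front
// of it, or null when no root can be determined.
function hostnameRoot(hostname) {
  // Normalise case and drop a trailing dot (fully qualified form).
  const labels = hostname.toLowerCase().replace(/\.$/, '').split('.');

  // Walk from the longest candidate suffix to the shortest, so that
  // 'co.uk' is matched before 'uk'.
  for (let i = 0; i < labels.length; i++) {
    const suffix = labels.slice(i).join('.');
    if (SUFFIXES.has(suffix)) {
      // i === 0 means the whole hostname is itself a suffix.
      return i === 0 ? null : labels.slice(i - 1).join('.');
    }
  }
  return null; // suffix not present in the list
}

console.log(hostnameRoot('stackoverflow.com'));
console.log(hostnameRoot('www.stackoverflow.com'));
console.log(hostnameRoot('www.stackoverflow.co.uk'));
```

Trying the longest suffix first is what lets a plain set lookup cope with multi-label suffixes; a regular expression alone cannot know that `co.uk` is a suffix while `stackoverflow.com` is not.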

David Pfeffer