
I want to extract the top-level domain from a URL. The logs look like this:

<182>Jul 28 13:52:34 PROXYSQUID1 logger: 1501249953.155      0 192.168.4.27 TCP_MISS/503 2408 POST http://xxxxx.ddns.net:xxx/xxxxx - DIRECT/xxx.xx.x.xx text/html

 

I want to get only the top level domain:

ddns

I tried this regex:

([\da-z\.-]+)\.([a-z\.])

But I got:

xxxxx.ddns
double-beep
Zakaria Mamai

1 Answer


There is a mix-up in terminology here. A TLD (top-level domain) is the last segment of a domain name, the part that follows the final dot (e.g. .com, .net).

What you're searching for is the second level domain (or SLD).

I've adapted Daveo's answer to your question so that the match is returned in the first capture group:

(?:[-a-zA-Z0-9@:%_\+~.#=]{2,256}\.)?([-a-zA-Z0-9@:%_\+~#=]*)\.[a-z]{2,6}\b(?:[-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)

Here is a demo: https://regex101.com/r/x2luiO/1

Explanation:

  • (?:[-a-zA-Z0-9@:%_\+~.#=]{2,256}\.)? - This first part will get everything before your SLD (subdomains).
  • ([-a-zA-Z0-9@:%_\+~#=]*) - This is your capturing group (Where the domain should be returned)
  • \.[a-z]{2,6} - This matches the TLD (wrap it in parentheses if you also want to capture it).
  • \b(?:[-a-zA-Z0-9@:%_\+.~#?&\/\/=]*) - And this is the rest of the regex, that should match the port and/or the rest of the URL (/example/page/).
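To try the pattern outside regex101, here is a minimal Python sketch. It assumes the whole log line is searched as-is and that the first match is the URL; the names SLD_PATTERN and second_level_domain are only illustrative and are not part of the original answer.

```python
import re

# The pattern from the answer, unchanged, split across lines for readability.
# Group 1 is the second-level domain.
SLD_PATTERN = re.compile(
    r"(?:[-a-zA-Z0-9@:%_\+~.#=]{2,256}\.)?"   # optional subdomains
    r"([-a-zA-Z0-9@:%_\+~#=]*)"               # capture group: the SLD
    r"\.[a-z]{2,6}\b"                         # the TLD
    r"(?:[-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)"      # port and/or rest of the URL
)


def second_level_domain(text):
    """Return group 1 of the first match in text, or None if nothing matches."""
    match = SLD_PATTERN.search(text)
    return match.group(1) if match else None


# The sample log line from the question.
log_line = (
    "<182>Jul 28 13:52:34 PROXYSQUID1 logger: 1501249953.155      0 "
    "192.168.4.27 TCP_MISS/503 2408 POST http://xxxxx.ddns.net:xxx/xxxxx "
    "- DIRECT/xxx.xx.x.xx text/html"
)

print(second_level_domain(log_line))  # prints: ddns
```

The timestamp and the IP address earlier in the line are skipped because the TLD part only accepts lowercase letters, so the first successful match is the URL.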

It's also worth pointing out that this regex will not give the expected result for a domain that combines an SLD with a ccTLD (country code TLD), for example .co.uk and .co.it. Both are simply domain endings for commercial and general websites, but in both cases the regex returns co as the SLD.
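The limitation can be reproduced with a short Python sketch. The URL below is a made-up example, and the variable names are illustrative only.

```python
import re

# Same pattern as in the answer; group 1 is meant to hold the SLD.
SLD_PATTERN = re.compile(
    r"(?:[-a-zA-Z0-9@:%_\+~.#=]{2,256}\.)?"
    r"([-a-zA-Z0-9@:%_\+~#=]*)"
    r"\.[a-z]{2,6}\b"
    r"(?:[-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)"
)

# Hypothetical URL with an SLD + ccTLD ending.
url = "http://www.example.co.uk/page"

match = SLD_PATTERN.search(url)
print(match.group(1))  # prints: co (not "example")
```

The regex has no way to know that .co.uk acts as a single ending, so it treats co as the SLD and uk as the TLD.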

Mateus