I'm trying to reduce a list of URLs to their base domain, i.e. without the www or any other subdomain prefix. I'm having trouble writing a regular expression to capture it, and with some TLDs (such as co.uk in the examples below) it becomes a rather more complicated issue.
answers.yahoo.com => yahoo.com
www.google.com => google.com
uk.answers.yahoo.co.uk => yahoo.co.uk
www.g.se => g.se
Any suggestions?
I was using this expression, but it breaks when the domain name is no more than 2 characters long, or when the TLD is shorter than 2 characters.
(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$
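
To make the failure concrete, here is a minimal Python sketch that runs the expression above, unchanged, over the four sample hosts. The helper name `base_domain` and the choice of `re.search` are assumptions made for illustration only.

```python
import re

# The expression from above, unchanged.
PATTERN = re.compile(r"(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$")

def base_domain(host):
    """Return the captured domain, or None if the expression does not match."""
    match = PATTERN.search(host.lower())
    return match.group("domain") if match else None

# Print what the expression captures for each sample host.
for host in ["answers.yahoo.com", "www.google.com",
             "uk.answers.yahoo.co.uk", "www.g.se"]:
    print(host, "=>", base_domain(host))
```

The first three hosts come out as expected, but www.g.se is returned whole instead of g.se: the class `[a-z\.]{2,6}` also accepts dots, so `g.se` is consumed as if it were the TLD and `www` is taken as the domain name.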