I wrote a Javascript routine that, given a hostname or a URL, it finds the root domain.
function getRootDomain(s){
var sResult = ''
try {
sResult = s.match(/^(?:.*\:\/?\/)?(?<domain>[\w\-\.]*)/).groups.domain
.match(/(?<root>[\w\-]*(\.\w{3,}|\.\w{2}|\.\w{2}\.\w{2}))$/).groups.root;
} catch(ignore) {}
return sResult;
}
What is the technique to combine the two regex rules into one rule?
I used this tutorial to try to advance my existing RegExp experience over the years, although I've never really understood lookbehinds and lookaheads (which might be useful here?), and then used the great tool at RegEx101.com for trial and error. What I tried was to stick what's after <root>
to replace what comes after <domain>
, and variations on that, and all failed.
A test set to use with a tool like RegEx101 could be:
https://test.com:8080/?id=4&re=3
https://test-test.com:8080/?id=4&re=3
https://data.test.com:8080/?id=4&re=3
https://data.test.com/?id=4&re=3
https://data.test.com/
https://data.test.com#testing
https://data.test.com/#testing
https://data.test.com:8080/#testing
https://data.test.com:8080#testing
https://data.tester.com/
https://data-test.test.com/
https://test.com
https://test.com#testing
https://test.com/
https://test.am/?id=4
https://test.com?id=3&re=3
https://test.com/?id=3&re=3
https://megatest.com/?id=3&re=3
test.com
data.test.co.uk
test.co
data.test.com
data.tester-test.com
data-test.tester-test.com
tester-test.com
about:blank