I'm using Hive as a database and I need a SQL method.
I want to replace an URL with the domain name. So for instance when I have http://info.myurl.com/some/path/into/the/server
, I want to have myurl.com
. But when I have https://myurl.com/
I also want to get myurl.com
.
I've found the regexp_extract
function in Hive, so I am this far:
select regexp_extract('http://info.myurl.com/some/path/into/the/server', 'http[s]?://.*?\.([^/]*)', 1) from dual;
Now the difficult part: how to know if there are 1 or 2 dots in the domain, so I know if I have to strip the first subdomain?