I have a list of a million or so URLs in a MySQL table.
I need to cleanse the data (extract the domains) so I can be confident about DISTINCT-type queries.
The data comes in several different forms:
www.domain.tld
domain.tld
http://domain.tld
https://vhost.domain.tld
domain.tld/
There are also invalid domains and empty values.
Ideally I'd like to do something along the lines of:
UPDATE table1 SET domain = website REGEXP '^(https?://)?[a-zA-Z0-9\\.\\-]+(/|$|\\?)'
where domain is a new, empty column and website holds the original URL.
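
One thing worth noting about the statement above: in MySQL, REGEXP is a comparison operator that returns 1 or 0, so it would fill domain with a match flag rather than the matched text. As a rough sketch of the kind of result I'm after, nested SUBSTRING_INDEX calls seem to get close without any regex. This assumes the table1, website and domain names from above, and that stripping a leading "www." is acceptable. It does not validate the host, handle ports, or cope with a second "://" later in the URL (for example inside a query string).

```sql
-- Sketch only: strip scheme, path and query string from website.
UPDATE table1
SET domain = LOWER(
    SUBSTRING_INDEX(
        SUBSTRING_INDEX(
            SUBSTRING_INDEX(TRIM(website), '://', -1), -- keep what follows the last '://'
            '/', 1),                                   -- keep what precedes the first '/'
        '?', 1))                                       -- keep what precedes the first '?'
WHERE website IS NOT NULL
  AND TRIM(website) <> '';

-- Optional second pass (an assumption on my part): treat www.domain.tld as domain.tld.
UPDATE table1
SET domain = SUBSTRING(domain, 5)  -- 'www.' is 4 characters, so keep from position 5
WHERE domain LIKE 'www.%';
```

With the sample values listed earlier, http://domain.tld, domain.tld/ and www.domain.tld should all end up as domain.tld, while https://vhost.domain.tld stays as vhost.domain.tld. Invalid entries would still need a separate check afterwards.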