I am trying to create a regex filter to sanitize domains that are processed by a Python script.
The input could be a plain domain name
- something.com, some.something.com
or it could have a URL structure
or a URL structure with www
I currently have a crude regex that pulls domains out of these structures, but I have not figured out a way to strip the leading www. from the match.
(?:[a-zA-Z0-9](?:[a-zA-Z0-9\-@]{,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,11}
This regex does a decent job of grabbing domains out of URLs, but when I try any kind of negative lookahead to remove the www., I can't get the desired result. I've tried (?!www.), which only removed one w rather than all three and the dot. Any help figuring this out would be much appreciated.
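A minimal sketch of the behaviour described above, together with one possible workaround. The sample URL, the names WITH_LOOKAHEAD and WITHOUT_WWW, and the helper extract_domain are hypothetical and only for illustration. The lookahead only stops a match from starting at the first w, so the search moves one character forward and matches from the second w. The alternative shown here consumes an optional www. outside a capture group and reads the domain from that group; it is one option, not necessarily the best fit for every input.

```python
import re

# The pattern from the question, unchanged.
DOMAIN = r"(?:[a-zA-Z0-9](?:[a-zA-Z0-9\-@]{,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,11}"

# The attempt from the question: a negative lookahead in front of the pattern.
# It only forbids a match that STARTS at "www.", so search() slides forward
# one character and matches from the second "w".
WITH_LOOKAHEAD = re.compile(r"(?!www.)" + DOMAIN)

# Possible alternative (an assumption, not the only way): consume an optional
# "www." outside the capture group and keep only the captured domain.
WITHOUT_WWW = re.compile(r"(?:www\.)?(" + DOMAIN + r")")


def extract_domain(text):
    """Return the first domain found in text without a leading www., or None."""
    match = WITHOUT_WWW.search(text)
    return match.group(1) if match else None


url = "http://www.something.com/path"  # hypothetical sample input

print(WITH_LOOKAHEAD.search(url).group(0))   # ww.something.com
print(extract_domain(url))                   # something.com
print(extract_domain("some.something.com"))  # some.something.com
```

If the inputs are always well-formed URLs, parsing them with urllib.parse.urlsplit and removing the prefix from the hostname may be simpler than a regex, though that is outside what the question asks.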