I have a list of domain names like this:
usatoday.com
detroitnews.com
virust.com
ajkdfabbbbbbb.net
ha.box.sk
www.test.net
rp.fff.com
I am trying to write a regex that matches all of these domains.
Here is my regex for the domains, but it does not return the whole domain names:
import re
s = 'dd.ddd.com rp.ff.com usatoday.net'
d = re.compile(r'(?<!\S)(([a-zA-Z]{1})|([a-zA-Z]{1}[a-zA-Z]{1})|([a-zA-Z]{1}[0-9]{1})|([0-9]{1}[a-zA-Z]{1})|([a-zA-Z0-9][a-zA-Z0-9-_]{1,61}[a-zA-Z0-9]))\.([a-zA-Z]{2,6}|[a-zA-Z0-9-]{2,30}\.[a-zA-Z]{2,3})(?!\S)')
result = d.findall(s)
print(result)
Output:
[('dd', '', 'dd', '', '', '', 'ddd.com'), ('rp', '', 'rp', '', '', '', 'ff.com'), ('usatoday', '', '', '', '', 'usatoday', 'net')]
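The tuples come from the capturing groups. re.findall returns a list of tuples (one item per group) when the pattern has more than one capturing group, a list of that group's text when it has exactly one, and the whole match only when it has none. The small sketch below uses a simplified, hypothetical pattern (not the one above) to show each case, plus re.finditer with group(0), which always yields the whole match:

```python
import re

text = "dd.ddd.com rp.ff.com"

# Two capturing groups: findall returns one tuple per match.
two_groups = re.findall(r"(\w+)\.(\w+\.com)", text)
# One capturing group: findall returns only that group's text.
one_group = re.findall(r"(\w+)\.\w+\.com", text)
# Only non-capturing groups (?:...): findall returns the whole match.
no_groups = re.findall(r"(?:\w+)\.(?:\w+\.com)", text)
# finditer + group(0) gives the whole match regardless of groups.
whole = [m.group(0) for m in re.finditer(r"(\w+)\.(\w+\.com)", text)]

print(two_groups)  # [('dd', 'ddd.com'), ('rp', 'ff.com')]
print(one_group)   # ['dd', 'rp']
print(no_groups)   # ['dd.ddd.com', 'rp.ff.com']
print(whole)       # ['dd.ddd.com', 'rp.ff.com']
```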
I need the output to be:
['dd.ddd.com', 'rp.ff.com', 'usatoday.net']
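A minimal fix, assuming the matching logic itself is what you want: keep the same alternatives but turn every group into a non-capturing group (?:...), so findall returns the full match. The redundant {1} quantifiers are dropped and [a-zA-Z0-9-_] is written as [a-zA-Z0-9_-] (same characters, with the hyphen moved to the end so it is unambiguously literal); neither change alters what the pattern matches:

```python
import re

s = 'dd.ddd.com rp.ff.com usatoday.net'

# Same alternatives as the original pattern, but every group is
# non-capturing (?:...), so findall returns the whole match.
d = re.compile(
    r'(?<!\S)'                                           # not preceded by a non-space
    r'(?:[a-zA-Z]|[a-zA-Z]{2}|[a-zA-Z][0-9]|[0-9][a-zA-Z]'
    r'|[a-zA-Z0-9][a-zA-Z0-9_-]{1,61}[a-zA-Z0-9])'       # first label
    r'\.'
    r'(?:[a-zA-Z]{2,6}|[a-zA-Z0-9-]{2,30}\.[a-zA-Z]{2,3})'  # TLD, or label + TLD
    r'(?!\S)'                                            # not followed by a non-space
)
result = d.findall(s)
print(result)  # ['dd.ddd.com', 'rp.ff.com', 'usatoday.net']
```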
I am new to regex, so any suggested changes to the regex above would help.
This is an updated version of my script.
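If a shorter pattern is acceptable, one option is to treat a domain as one or more labels, each followed by a dot, and then a top-level domain. This is only a sketch under stated assumptions, not a full hostname validator: a label here starts and ends with a letter or digit, may contain hyphens in between, and is at most 63 characters; the 2 to 6 letter TLD limit is carried over from the original pattern; names such as LABEL and DOMAIN_RE are hypothetical. It is run against the sample list from the question:

```python
import re

# The sample domains from the question, one per line.
domains_text = """usatoday.com
detroitnews.com
virust.com
ajkdfabbbbbbb.net
ha.box.sk
www.test.net
rp.fff.com"""

# Assumed label rule: starts and ends with a letter or digit, may contain
# hyphens in between, at most 63 characters long.
LABEL = r'[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?'

# One or more "label." prefixes followed by a 2-6 letter TLD (assumed),
# bounded by whitespace or the start/end of the string, as in the question.
DOMAIN_RE = re.compile(r'(?<!\S)(?:' + LABEL + r'\.)+[a-zA-Z]{2,6}(?!\S)')

found = DOMAIN_RE.findall(domains_text)
print(found)

found_inline = DOMAIN_RE.findall('dd.ddd.com rp.ff.com usatoday.net')
print(found_inline)  # ['dd.ddd.com', 'rp.ff.com', 'usatoday.net']
```

Because the only groups are non-capturing, findall returns whole domains, and the repeated (?:LABEL\.)+ handles any number of subdomains (ha.box.sk, www.test.net) without listing separate alternatives.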