Try this one (thanks @SunDeep for the update):
\s(?:www\.)?(\w+\.com)
Explanation
\s
matches any whitespace character
(?:www\.)?
non-capturing group, matches a literal www. zero or one time (the trailing ? makes it optional)
(\w+\.com)
capturing group, matches one or more word characters followed by a literal .com
Note that the dots are escaped (\.), since an unescaped . would match any character.
And in action:
import re
s = 'this is www.website1.com and this is website2.com'
matches = re.findall(r'\s(?:www\.)?(\w+\.com)', s)
print(matches)
Output:
['website1.com', 'website2.com']
A couple of notes about this. First, matching every valid domain name is very difficult, so while I used \w+ for the capture in this example, I could have chosen something stricter, such as:
[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}
This answer has a lot of helpful info about matching domains:
What is a regular expression which will match a valid domain name without a subdomain?
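As a rough sketch of how the stricter pattern could be dropped into the same findall call (the sample string and the DOMAIN and pattern names are made up for illustration, and this is not a complete domain validator):

```python
import re

# Stricter domain pattern from the note above: the label starts and ends with
# an alphanumeric character and may contain hyphens in between.
DOMAIN = r'[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}'

# Same structure as before: whitespace, optional "www.", then the captured domain.
pattern = re.compile(r'\s(?:www\.)?(' + DOMAIN + r')')

# Hypothetical sample text, including a hyphenated name and a non-.com domain.
s = 'this is www.my-site.com and this is website2.org'
print(pattern.findall(s))  # ['my-site.com', 'website2.org']
```

Unlike \w+, this version accepts hyphens inside the name and any alphabetic top-level domain of two or more letters, but as written it requires the name itself to be at least three characters long.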
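Next, I only look for .com domains. To match other types of domains, you could adjust the regular expression to something like:
\s(?:www\.)?(\w+\.(?:com|org|net))
The inner group is non-capturing so that re.findall still returns each domain as a single string rather than a tuple of groups.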
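Here is a small sketch of the multi-domain variant in use. The sample string is invented for illustration, and the alternation is written as a non-capturing group, since re.findall returns tuples when a pattern contains more than one capturing group:

```python
import re

# Hypothetical sample text mixing .com, .org and .net domains.
s = 'visit www.website1.com, website2.org and www.website3.net today'

# (?:com|org|net) is non-capturing, so the only captured group is the full domain.
matches = re.findall(r'\s(?:www\.)?(\w+\.(?:com|org|net))', s)
print(matches)  # ['website1.com', 'website2.org', 'website3.net']
```

To support more endings, add them to the alternation, for example (?:com|org|net|io).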