Python Regex to Extract Domain from Text

Question

I have the following regex:

r'(?:[a-zA-Z0-9](?:[a-zA-Z0-9\-]{,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}'

When I apply this to a text string with, let's say, "this is www.website1.com and this is website2.com", I get:

['www.website1.com', 'website2.com']

How can I modify the regex to exclude the 'www', so that I get 'website1.com' and 'website2.com'? I'm missing something pretty basic...

Possible duplicate of [Extract all domains from text](https://stackoverflow.com/questions/21211572/extract-all-domains-from-text) — tripleee, Nov 06 '18 at 07:32

user3483203 · Answer 1 · 2018-03-08T06:30:13.553

Try this one (thanks @SunDeep for the update):

\s(?:www\.)?(\w+\.com)

Explanation

\s matches any whitespace character

(?:www\.)? non-capturing group, matches www. zero or one time

(\w+\.com) capturing group, matches one or more word characters followed by a literal .com

And in action:

import re

s = 'this is www.website1.com and this is website2.com'

# The dots are escaped so they match a literal "." rather than any character.
matches = re.findall(r'\s(?:www\.)?(\w+\.com)', s)
print(matches)

Output:

['website1.com', 'website2.com']
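To illustrate why the dots are worth escaping, here is a small sketch comparing an unescaped and an escaped version of the pattern on a sentence that contains no domain at all. The sample sentence and the variable names are made up for the illustration.

```python
import re

text = 'you are welcome here'

# Unescaped: "." matches any character, so "we" + "l" + "com" is accepted.
loose_hits = re.findall(r'\s(?:www.)?(\w+.com)', text)

# Escaped: a literal "." is required before "com", so nothing matches.
strict_hits = re.findall(r'\s(?:www\.)?(\w+\.com)', text)

print(loose_hits)   # ['welcom']
print(strict_hits)  # []
```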

A couple of notes about this. First, matching all valid domain names is very difficult, so while I chose \w+ for this example, I could have used a stricter pattern such as: [a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}.
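As a rough sketch, the stricter pattern could be combined with the optional www. prefix like this. The name DOMAIN is just for illustration, and using \b in place of \s is my own assumption so that a domain at the very start of the text is also found. Note that {1,61} means the label must be at least three characters long, and that other subdomains are not handled.

```python
import re

# Hypothetical combined pattern: optional "www." (matched, not captured),
# then one label and a TLD. \b is an assumption replacing the leading \s.
DOMAIN = re.compile(
    r'\b(?:www\.)?([a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,})'
)

s = 'this is www.website1.com and this is website2.com'
print(DOMAIN.findall(s))  # ['website1.com', 'website2.com']

# A domain at the start of the string is found too.
print(DOMAIN.findall('www.example.org is up'))  # ['example.org']
```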

This answer has a lot of helpful info about matching domains: What is a regular expression which will match a valid domain name without a subdomain?

Next, I only look for .com domains. To match whichever types of domains you are looking for, you could adjust my regular expression to something like:

\s(?:www\.)?(\w+\.(?:com|org|net))
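One thing to watch when adding the alternation: re.findall returns tuples as soon as a pattern contains more than one capturing group. A quick sketch of the difference, using a made-up sample string and variable names:

```python
import re

s = 'see www.website1.com, website2.org and website3.net today'

# Two capturing groups: findall returns (domain, tld) tuples.
with_tld_group = re.findall(r'\s(?:www\.)?(\w+\.(com|org|net))', s)

# TLD alternation made non-capturing: findall returns plain strings.
domains_only = re.findall(r'\s(?:www\.)?(\w+\.(?:com|org|net))', s)

print(with_tld_group)
# [('website1.com', 'com'), ('website2.org', 'org'), ('website3.net', 'net')]
print(domains_only)
# ['website1.com', 'website2.org', 'website3.net']
```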

Vikas Periyadath · Answer 2 · 2018-03-08T10:03:21.163


Here's an attempt:

import re
s = "www.website1.com"
# Group 1 is the optional "www." prefix; group 2 is everything after it.
k = re.findall(r'(www\.)?(.*?)$', s, re.DOTALL)[0][1]
print(k)

Output:

website1.com

If s = "website1.com" instead, the output is the same:

website1.com
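When the input is a single hostname rather than free text, as in this answer, another option is to strip the prefix with re.sub. This is only a sketch of an alternative, not the approach above, and strip_www is a hypothetical helper name.

```python
import re

def strip_www(host):
    # Hypothetical helper: remove one leading "www." if present.
    # "^" anchors at the start and the escaped dot matches only a literal ".".
    return re.sub(r'^www\.', '', host)

print(strip_www('www.website1.com'))  # website1.com
print(strip_www('website1.com'))      # website1.com
print(strip_www('wwwxwebsite1.com'))  # wwwxwebsite1.com (left unchanged)
```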

edited Mar 08 '18 at 10:03

answered Mar 08 '18 at 06:19
