0

I have a list of links, and some of them look like this:

https://www.domainname
or https://domainname

I need a regex pattern that extracts only the domain name from each link. The "www" prefix breaks my pattern:

print(re.findall("//([a-zA-Z]+)", i))
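
To make the problem reproducible, here is a minimal sketch. The list name `links` and its contents are assumptions based on the two example URLs above, since the real list is not shown.

```python
import re

# Hypothetical sample data; the actual list of links is not shown in the question.
links = ["https://www.domainname", "https://domainname"]

# [a-zA-Z]+ stops at the first dot, so "www" is captured instead of the domain.
results = [re.findall(r"//([a-zA-Z]+)", link) for link in links]

for link, found in zip(links, results):
    print(link, found)
```

The first link yields `['www']` and the second yields `['domainname']`, which is the mismatch described above.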
jaco0646

3 Answers

0

You could anchor the match at the end of the string.

import re

url = "https://www.domainname"
url2 = "https://domainname"

for u in [url, url2]:
    print(u)
    # \w+$ matches the last run of word characters in the string
    print(re.findall(r"\w+$", u))

Output:

https://www.domainname
['domainname']
https://domainname
['domainname']
LetzerWille
0
import re

with open('testfile.txt', 'r') as file:
    readfile = file.read()

# Expects a subdomain and a TLD, e.g. https://www.domainname.com
search = re.finditer(r'(?:\w+://)?(?:\w+\.)(\w+)(\.\w+)', readfile)

for check in search:
    print(check.group(1))  # group(1) is the domain name only

Result:

domainname
example
0

My solution:

import re

l1 = ["https://www.domainname1", "https://domainname2"]
for i in l1:
    print(re.findall(r"//(?:www\.)?(\w+)", i))

Output:

['domainname1']
['domainname2']
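
If a single value per link is more convenient than a list, the same idea can be wrapped in a small function. This is only a sketch: the names `DOMAIN_RE` and `extract_domain` are hypothetical, and returning `None` for non-matching input is an assumption about what the caller wants.

```python
import re

# Hypothetical helper names; the pattern is the optional-"www." approach shown above.
DOMAIN_RE = re.compile(r"//(?:www\.)?(\w+)")


def extract_domain(url):
    """Return the first label after an optional 'www.', or None if nothing matches."""
    match = DOMAIN_RE.search(url)
    return match.group(1) if match else None


for url in ["https://www.domainname1", "https://domainname2", "not a url"]:
    print(url, "->", extract_domain(url))
```

Note that `\w` does not match hyphens, so a host such as `my-site` would be cut off at the hyphen; the character class would need to be widened for such names.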
Shahab Rahnama