I have a list of links, and some of them look like
https://www.domainname
or https://domainname
I need a regex pattern that extracts only the domain name from each link. The optional "www" breaks my pattern :(
print(re.findall("//([a-zA-Z]+)", i))
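You could anchor the match to the end of the string, as long as the domain name is the last thing in the URL.
import re

url = "https://www.domainname"
url2 = "https://domainname"
for u in [url, url2]:
    print(u)
    # \w+$ matches the final run of word characters in the current URL
    print(re.findall(r"\w+$", u))
Output: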
https://www.domainname
['domainname']
https://domainname
['domainname']
import re

with open('testfile.txt', 'r') as file:
    readfile = file.read()

# Skip an optional scheme and one prefix label (e.g. "www."), then capture the next label
search = re.finditer(r'(?:\w+://)?(?:\w+\.)(\w+)(\.\w+)', readfile)
for check in search:
    print(check.group(1))  # group 1: only the domain name
Result:
domainname
example
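To try the pattern above without a file, here is a minimal sketch that runs it on an in-memory string. The sample text and the variable names are made up for illustration. Note that the pattern expects a prefix such as "www." before the domain name, so a host without one (for example https://example.org) will not match.

```python
import re

# Hypothetical sample text standing in for the contents of testfile.txt
sample_text = "https://www.domainname.com\nhttp://www.example.org\n"

# Equivalent to the pattern above: optional scheme, one prefix label, then the captured name
pattern = re.compile(r'(?:\w+://)?(?:\w+\.)(\w+)(\.\w+)')

# Collect capture group 1 (the domain name) from every match
names = [match.group(1) for match in pattern.finditer(sample_text)]
print(names)
```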
My solution:
import re

l1 = ["https://www.domainname1", "https://domainname2"]
for i in l1:
    # (?:www\.)? optionally skips a leading "www." before the captured name
    print(re.findall(r"/(?:www\.)?(\w+)", i))
Output:
['domainname1']
['domainname2']
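If you want a single string per link instead of a list, the same optional "www." idea can be wrapped in a small helper. This is only a sketch: the function name domain_name and the character class (letters, digits, hyphen) are my assumptions, not something from the question.

```python
import re

# Anchor on "//", optionally skip "www.", then capture the first host label.
# The character class is an assumption about what a domain label may contain.
DOMAIN_RE = re.compile(r"//(?:www\.)?([A-Za-z0-9-]+)")

def domain_name(url):
    """Return the label after an optional 'www.', or None if nothing matches."""
    match = DOMAIN_RE.search(url)
    return match.group(1) if match else None

links = ["https://www.domainname1", "https://domainname2"]
for link in links:
    print(domain_name(link))
```

Using re.search with a compiled pattern returns the first match only, and the None fallback avoids an exception for strings that contain no "//" at all.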