I have the following list of URLs:
urls = ["http://arxiv.org/pdf/1611.08097", "https://doi.org/10.1109/tkde.2016.2598561", "https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85116544648&origin=inward"]
From each element of the list, I am trying to extract just the domain name, like: arxiv, doi, scopus.
For that, I have this code:
import re

for url in urls:
    # Group 1 captures the whole host name, e.g. "www.scopus.com"
    print(re.search(r'https?://([A-Za-z_0-9.-]+).*', url).group(1))
This prints:
arxiv.org
doi.org
www.scopus.com
How can I modify the above regex to extract just the domain, without the other parts like www., .com, .org, etc.?
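To make the desired result concrete, here is a minimal sketch of one possible direction that avoids the regex. It rests on an assumption: that the name I want is always the second-to-last dot-separated label of the host. That holds for the three URLs above but would not hold for hosts such as example.co.uk. The helper name domain_label is made up for illustration.

```python
from urllib.parse import urlsplit

urls = [
    "http://arxiv.org/pdf/1611.08097",
    "https://doi.org/10.1109/tkde.2016.2598561",
    "https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85116544648&origin=inward",
]

def domain_label(url):
    """Return the second-to-last label of the URL's host (hypothetical helper).

    Assumes a single-label suffix such as .org or .com; multi-label
    suffixes like .co.uk are not handled.
    """
    host = urlsplit(url).hostname or ""
    parts = host.split(".")
    # Fall back to the whole host when it has no dots (e.g. "localhost")
    return parts[-2] if len(parts) >= 2 else host

names = [domain_label(u) for u in urls]
print(names)
```

For the list above this should give arxiv, doi and scopus, which is the output I am after.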
Thanks in advance.