I am writing a program that needs to identify different link formats in the text values of a dictionary. The links can take any of these forms: https://www.examplelink.com, www.examplelink.com, examplelink.com
Is there a way to match all of these forms with a single pattern and extract the entire URL from the text? My code so far only finds the first form (the one with the https:// scheme) and none of the others. This is my code:
import re

dictionary_itemnumber = 0
# Raw string so that \s and \. reach the regex engine unchanged
pattern1 = r"(?P<url>https?://[^\s]+\.(com|net|ru|org|ir|in|uk|au|ua|de|ch))"
for i in range(total):
    # Search once and reuse the match object instead of searching twice
    url = re.search(pattern1, parsed_text_dictionary["parsed text" + str(dictionary_itemnumber)])
    if url:
        print("link found")
        print(url)
    else:
        print("no link found")
    dictionary_itemnumber = dictionary_itemnumber + 1
# The output of this code is:
link found
<re.Match object; span=(132, 168), match='https://www.laufenburg-tourismus.com'>
no link found
no link found
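
For reference, here is a minimal sketch of the kind of pattern I am after. It is one possible approach, not a complete URL matcher. It assumes the same TLD list as above, makes the `https://` scheme and the `www.` prefix optional, and stops at the TLD just like my original pattern (so paths and query strings are not captured). The names `URL_PATTERN` and `find_urls` are made up for this illustration.

```python
import re

# Assumption: same TLD list as the original pattern; scheme and "www." are optional.
# The trailing \b keeps ".com" from matching inside a longer word such as ".community".
URL_PATTERN = re.compile(
    r"(?P<url>(?:https?://)?(?:www\.)?"
    r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*"
    r"\.(?:com|net|ru|org|ir|in|uk|au|ua|de|ch))\b"
)


def find_urls(text):
    """Return every URL-like substring found in text (hypothetical helper)."""
    return [m.group("url") for m in URL_PATTERN.finditer(text)]


sample = "Visit https://www.examplelink.com, www.examplelink.com or examplelink.com today."
print(find_urls(sample))
```

A known limitation of this sketch is that it would also pick up the domain part of an email address such as user@examplelink.com, so it may need an extra check depending on the text being parsed.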