I have the following sample text:
Good Morning,
The link to your exam is https://uni.edu?hash=89234rw89yfw8fw89ef .Please complete it within the stipulated time.
If you have any issue, please contact us
https://www.uni.edu
https://facebook.com/uniedu
I want to extract the URL of the exam link (https://uni.edu?hash=89234rw89yfw8fw89ef). I'm planning to use re.findall(), but I'm having trouble writing a regex that extracts this specific URL. Here is my code so far:
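A minimal sketch of such a regex, assuming the exam link is the only URL in the text that carries a "hash=" query parameter (the name EXAM_URL_RE is illustrative, not from the original):

```python
import re

# Sample text from the question, embedded as a string so the demo is self-contained.
text = """Good Morning,
The link to your exam is https://uni.edu?hash=89234rw89yfw8fw89ef .Please complete it within the stipulated time.
If you have any issue, please contact us
https://www.uni.edu
https://facebook.com/uniedu
"""

# Assumption: the exam link is the only URL with a "hash=" query parameter.
#   https?://   the scheme (http or https)
#   [^\s?]+     host and path, up to the "?" that starts the query string
#   \?hash=\S+  the hash parameter and everything up to the next whitespace
EXAM_URL_RE = re.compile(r"https?://[^\s?]+\?hash=\S+")

print(EXAM_URL_RE.findall(text))
```

Because \S+ stops at the first whitespace, the space before ".Please" keeps the period out of the match; if the period followed the URL directly, it would be captured as part of the URL.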
import re

def find_exam_url(text_file):
    filename = open(text_file, "r")
    new_file = filename.readlines()
    word_lst = []
    for line in new_file:
        exam_url = re.findall('https?://', line)  # use regex to extract exam url
    return exam_url

if __name__ == "__main__":
    print(find_exam_url("mytextfile.txt"))
The output I get is:
['http://']
Instead of:
https://uni.edu?hash=89234rw89yfw8fw89ef
I would appreciate some help with this.
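A likely explanation for the output above: when a pattern has no groups, re.findall() returns only the text the pattern itself matched, and 'https?://' matches nothing beyond the scheme. In addition, exam_url is reassigned on every loop iteration, so only the last line's matches are returned. Below is a sketch of a corrected function, assuming (as in the sample) that the exam link is the first URL containing "hash="; the URL_RE name and the temporary-file demo are illustrative additions:

```python
import os
import re
import tempfile

# Scheme followed by every non-whitespace character, so the whole URL is
# captured instead of just "https://".
URL_RE = re.compile(r"https?://\S+")


def find_exam_url(text_file):
    """Return the first URL in text_file containing "hash=", or None if there is none."""
    # "with" closes the file automatically, unlike a bare open().
    with open(text_file, "r") as f:
        for line in f:
            for url in URL_RE.findall(line):
                # Assumption: only the exam link carries a "hash=" query parameter.
                if "hash=" in url:
                    return url
    return None


if __name__ == "__main__":
    # Demo: write the question's sample text to a temporary file, then search it.
    sample = (
        "Good Morning,\n"
        "The link to your exam is https://uni.edu?hash=89234rw89yfw8fw89ef .Please "
        "complete it within the stipulated time.\n"
        "If you have any issue, please contact us\n"
        "https://www.uni.edu\n"
        "https://facebook.com/uniedu\n"
    )
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(sample)
    try:
        print(find_exam_url(tmp.name))
    finally:
        os.remove(tmp.name)
```

With the sample file, find_exam_url("mytextfile.txt") should return https://uni.edu?hash=89234rw89yfw8fw89ef. Returning on the first hit avoids the overwrite problem; if every URL were needed instead, collecting matches into a list across all lines would be the alternative.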