I have a number of HTML anchor tags stored in a Python list, and I need to extract only the PDF URLs from them.
links = [
    '<a class="tablebluelink" href="https://www.samplewebsite.com/xml-data/abcdef/higjkl/Thisisthe-required-document-b4df-16t9g8p93808.pdf" target="_blank"><img alt="Download PDF" border="0" src="../Include/images/pdf.png"/></a>',
    '<a class="tablebluelink" href="https://www.samplewebsite.com/xml-data/abcdef/higjkl/Thisisthe-required-document-link-4ea4-8f1c-dd36a1f55d6f.pdf" target="_blank"><img alt="Download PDF" border="0" src="../Include/images/pdf.png"/></a>',
]
So I need to extract only the part of each entry that starts with 'https' and ends with '.pdf', as shown below:
https://www.samplewebsite.com/xml-data/abcdef/higjkl/Thisisthe-required-document-b4df-16t9g8p93808.pdf
Each extracted link should then be stored in a list. The variable 'links' contains many PDF links, and all of them need to end up in a variable named 'pdf_links'.
Can anyone suggest a regular expression to extract these PDF links? I tried the regular expression below, but it's not working.
pdf_regex = r""" (^<a\sclass="tablebluelink"\shref="(.)+.pdf"$)"""
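For reference, here is a minimal sketch of one way this could be done, assuming every entry in 'links' holds a single href attribute in double quotes. The pattern above most likely fails for a few reasons: the leading space inside the triple-quoted string is matched literally (there is no re.VERBOSE flag), so the '^' that follows it can never succeed; '$' demands that the string end right after '.pdf"', although ' target=...' follows; and '(.)+' keeps only the last character it matched. The sketch below drops the anchors, captures everything between 'https' and '.pdf' inside the href, and escapes the dot. The loop variable names are illustrative only.

```python
import re

links = [
    '<a class="tablebluelink" href="https://www.samplewebsite.com/xml-data/abcdef/higjkl/Thisisthe-required-document-b4df-16t9g8p93808.pdf" target="_blank"><img alt="Download PDF" border="0" src="../Include/images/pdf.png"/></a>',
    '<a class="tablebluelink" href="https://www.samplewebsite.com/xml-data/abcdef/higjkl/Thisisthe-required-document-link-4ea4-8f1c-dd36a1f55d6f.pdf" target="_blank"><img alt="Download PDF" border="0" src="../Include/images/pdf.png"/></a>',
]

# Capture the URL inside href="...": it must start with "https" and end
# with a literal ".pdf". [^"]+ stops the match at the closing quote.
pdf_regex = re.compile(r'href="(https[^"]+\.pdf)"')

pdf_links = []
for link in links:
    match = pdf_regex.search(link)  # search() scans the whole string, unlike match()
    if match:
        pdf_links.append(match.group(1))  # group(1) is the URL without the quotes

for url in pdf_links:
    print(url)
```

Entries without a matching href are simply skipped. If the real markup is less regular than the sample (single quotes, uppercase '.PDF', several anchors per entry), an HTML parser would probably be more robust than a regular expression.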