I am currently trying to find out how to efficiently extract substrings from a file in Python. I have a file containing extracted HTML code:
<td><a href="/archiv/zivotopisy/2022/6/Zivotopis-OJVLA-20220624132548.pdf" target="_blank">Jitka Horáková</a></td>
<td><a href="/archiv/zivotopisy/2022/6/Zivotopis-XUBIC.pdf" target="_blank">Bohumil Tobolka</a></td>
<td><a href="/archiv/zivotopisy/2022/5/Zivotopis-UNBLA.pdf" target="_blank">Stanislava Rousová, Ing.</a></td>
<td><a href="/archiv/zivotopisy/2022/4/Zivotopis-NYBCF-20220407134152.pdf" target="_blank">Ladislav Macháč</a></td>
<td><a href="/archiv/zivotopisy/2022/4/Zivotopis-PVDPA.pdf" target="_blank">Dana Macháčová</a></td>
but I keep failing at the extraction step. My goal is to write another .txt file that contains only the clean links, for example "/archiv/zivotopisy/2022/4/Zivotopis-PVDPA.pdf", without any HTML syntax. In other words, each extracted link starts with /archiv and ends with .pdf.
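A minimal sketch using Python's built-in re module, assuming every link sits inside a double-quoted href attribute and contains no double quote itself. The pattern starts matching at /archiv and stops at the first .pdf after it. The names PDF_LINK, extract_links and extract_file, as well as any file paths you pass in, are hypothetical.

```python
import re

# Match text that starts with /archiv and ends with the first following .pdf.
# [^"]*? never crosses a double quote, so the match stays inside href="...".
PDF_LINK = re.compile(r'/archiv[^"]*?\.pdf')


def extract_links(html_text):
    """Return every /archiv...pdf path found in html_text, in order."""
    return PDF_LINK.findall(html_text)


def extract_file(src_path, dst_path):
    """Read src_path, write one extracted link per line to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        links = extract_links(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        for link in links:
            dst.write(link + "\n")
    return links
```

Usage would look like extract_file("input.html", "links.txt"), where both file names are placeholders for your own files. Reading the whole file at once is fine for a file of this size; for very large files you could loop over the lines and call extract_links on each one.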
I tried exploring for-each loops and regex, but without much luck, since I am a beginner. I would appreciate any advice.
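As an alternative to regex, here is a sketch using the standard-library html.parser module. It reads the href attribute of each <a> tag and keeps the value only when it starts with /archiv and ends with .pdf, so it does not depend on quoting style or attribute order. The class PdfLinkParser and the function extract_links_html are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class PdfLinkParser(HTMLParser):
    """Collect href values of <a> tags that start with /archiv and end with .pdf."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of
        # (name, value) pairs, and value can be None for bare attributes.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith("/archiv") and value.endswith(".pdf"):
                self.links.append(value)


def extract_links_html(html_text):
    """Return the matching href values in document order."""
    parser = PdfLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

For the five sample lines above, extract_links_html returns the five /archiv/...pdf paths in file order; writing them to a .txt file is then a simple loop that writes each link followed by "\n".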