Using Python 3 and Selenium, I want to capture the links to PDF files on a page. I could not find these links in Inspect Element, so it seems that they are generated dynamically.
So I looked for their exact location on the site: the "Documentos" links box. It contains a list of links (Certidão), and clicking one opens a new tab with the PDF - example
I then wrote the script below. It finds the elements in the PDF links box by XPath and then calls a function that should look up the exact attributes of the links.
But it's not working. Does anyone know how I could fix this, or is there another method?
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
site = "https://divulgacandcontas.tse.jus.br/divulga/#/candidato/2022/2040602022/AP/30001653385"
# Function to get the links with attribute
def find(elem):
    element = elem.get_attribute("dvg-link-doc dvg-certidao")
    if element:
        return element
    else:
        return False
driver = webdriver.Chrome(r'D:\Code\chromedriver.exe')
driver.get(site)
documents = []
# Look for the elements in the box where the PDFs are
elems = driver.find_elements("xpath", '/html/body/div[2]/div[1]/div/div[1]/section[3]/div/div[3]/div[2]/div/div/ul')
# Iterate over the elements found
for elem in elems:
    # Test if there is a link available
    try:
        links = WebDriverWait(elem, 2).until(find)
        print(links)
        if links.endswith(".pdf"):
            print(links)
            dicionario = {"link": links}
            documents.append(dicionario)
    except:
        continue
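
For reference, here is a minimal sketch of the extraction step I am aiming for, separated from the browser so it can be tried offline. It rests on two assumptions that I have not verified against the page: that each document link is an element with an `href` attribute, and that the `href` ends in `.pdf`. The helper `collect_pdf_links` and the `FakeAnchor` stand-in are hypothetical names used only for illustration; neither is part of Selenium.

```python
def collect_pdf_links(anchors):
    """Return [{"link": href}, ...] for every anchor whose href ends in .pdf.

    `anchors` can be any objects exposing get_attribute(name), such as
    Selenium WebElements. Assumption: the PDF URL is stored in "href".
    """
    documents = []
    for anchor in anchors:
        # get_attribute expects an attribute name such as "href"
        href = anchor.get_attribute("href")
        if href and href.lower().endswith(".pdf"):
            documents.append({"link": href})
    return documents


class FakeAnchor:
    """Hypothetical stand-in for a WebElement, used only to try the helper offline."""

    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        # Mimic a missing attribute by returning None
        return self._href if name == "href" else None


sample = [
    FakeAnchor("https://example.org/certidao.pdf"),
    FakeAnchor(None),
    FakeAnchor("https://example.org/page.html"),
]
print(collect_pdf_links(sample))
```

With a real browser, the list of anchors might come from something like `WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "a.dvg-link-doc")))`. The selector `a.dvg-link-doc` is only a guess that `dvg-link-doc` is a CSS class on the links. If the PDF URL is only built when the link is clicked, the `href` will be empty and this approach will not find anything.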