I am very new to web scraping and have been trying to use Selenium's functions to simulate a browser accessing the Texas public contracting webpage and then download embedded PDFs. The website is this: http://www.txsmartbuy.com/sp.
So far, I've successfully used Selenium to select an option in one of the dropdown menus ("Agency Name") and to click the search button. I've listed my Python code below.
import os
os.chdir("/Users/fsouza/Desktop") #Setting up directory
from bs4 import BeautifulSoup #Importing pertinent Python packages
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
chromedriver = "/Users/fsouza/Desktop/chromedriver" #Setting up Chrome driver
driver = webdriver.Chrome(executable_path=chromedriver)
driver.get("http://www.txsmartbuy.com/sp")
delay = 3 #Seconds
WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.XPATH, "//select[@id='agency-name-filter']/option[69]")))
health = driver.find_element(By.XPATH, "//select[@id='agency-name-filter']/option[68]")
health.click()
search = driver.find_element(By.ID, "spBtnSearch")
search.click()
Once I get to the results page, I get stuck.
First, I can't access any of the resulting links using the HTML page source. But if I manually inspect individual links in Chrome, I do find the pertinent tags (<a href...) relating to individual results. I'm guessing this is because of JavaScript-rendered content.
Second, even if Selenium were able to see these individual tags, they have no class or id. The best way to call them, I think, would be to call the <a> tags by the order shown (see code below), but this didn't work either. Instead, the call clicks some other 'visible' tag (something in the footer, which I don't need).
Third, assuming these things did work, how can I figure out the number of <a> tags showing on the page (in order to loop this code over and over for every single result)?
driver.execute_script("document.getElementsByTagName('a')[27].click()")
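To make the third point concrete, here is a minimal sketch of the counting I have in mind. It is not Selenium code: it uses only Python's standard-library `html.parser` and assumes the rendered HTML can somehow be captured as a string once the results have loaded (which is the part I have not gotten working). The names `AnchorCollector`, `collect_result_links`, and `sample_html` are made up for illustration, and the sample markup is a hypothetical stand-in, not the real structure of the results page.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect href values of <a> tags, skipping any inside a <footer>."""

    def __init__(self):
        super().__init__()
        self.footer_depth = 0  # greater than 0 while inside a <footer>
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "footer":
            self.footer_depth += 1
        elif tag == "a" and self.footer_depth == 0:
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "footer" and self.footer_depth > 0:
            self.footer_depth -= 1


def collect_result_links(html):
    """Return the hrefs of all non-footer <a> tags in an HTML string."""
    parser = AnchorCollector()
    parser.feed(html)
    return parser.hrefs


# Hypothetical stand-in for the rendered results page (assumption:
# the real page's markup will differ).
sample_html = """
<div id="results">
  <a href="/sp/001">Solicitation 001</a>
  <a href="/sp/002">Solicitation 002</a>
</div>
<footer><a href="/contact">Contact</a></footer>
"""

links = collect_result_links(sample_html)
print(len(links))   # number of result links to loop over
for href in links:
    print(href)
```

With something like this, `len(links)` would give the number of results to loop over, and the footer links would be excluded rather than clicked by position.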
I would appreciate your attention to this, and please excuse any stupidity on my part, considering that I'm just starting out.