I am trying to scrape external data to pre-fill form data on a website. The aim is to find a keyword and return the class name of the element that contains that keyword. The constraints are that I don't know whether the website contains the keyword at all, or what type of tag the keyword is inside.
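from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

chromeDriverPath = "./chromedriver"
chrome_options = Options()
# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service(chromeDriverPath), options=chrome_options)
driver.get("https://www.scrapethissite.com/pages/")

# keywords to scrape for
listOfKeywords = ['ajax', 'click']
for keyword in listOfKeywords:
    try:
        foundKeyword = driver.find_element(By.XPATH, "//*[contains(text(), " + keyword + ")]")
        print(foundKeyword.get_attribute("class"))
    except NoSuchElementException:
        # no element matched this keyword
        pass
driver.quit()  # ends the session and closes the browser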
This example returns the top-level ancestor, not the immediate parent. Specifically, it prints "" because it returns the class attribute of the <html> tag, which has no class attribute. Similarly, if I change the code to search for the keyword only in <div> tags:
foundKeyword = driver.find_element(By.XPATH, "//div[contains(text(), " + keyword + ")]")
it prints "container" for both 'ajax' and 'click', because <div class='container'> wraps everything on the website.
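A likely explanation for this behaviour: the keyword is concatenated into the XPath without quotes, so the expression sent to the browser is //*[contains(text(), ajax)]. In XPath 1.0, a bare word such as ajax is read as a relative path selecting child elements named ajax; there are none, so it converts to the empty string, and contains() with an empty second argument is always true. Every element therefore matches, and find_element returns the first one in document order: <html> for //*, or the outermost <div> for //div. The minimal snippet below only rebuilds the two strings (no browser needed) to show the difference that quoting makes.

```python
# Rebuilds the XPath string from the loop above, without a browser.
keyword = "ajax"

# Original construction: the keyword is spliced in without quotes, so XPath
# parses it as a child-element path rather than a string.
unquoted = "//*[contains(text(), " + keyword + ")]"

# Same expression with the keyword wrapped in single quotes, so XPath
# treats it as a string literal.
quoted = "//*[contains(text(), '" + keyword + "')]"

print(unquoted)  # //*[contains(text(), ajax)]
print(quoted)    # //*[contains(text(), 'ajax')]
```

Even with quotes, contains(text(), ...) in XPath 1.0 only tests the first text node of each element, and the comparison is case-sensitive.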
So, for the page in my example, the keyword 'ajax' should print 'page-title' (the class of the tag that immediately wraps it), and 'click' should print 'lead session-desc'.
The image below may help to visualize this: for each keyword, the tag that wraps it is shown in red.
– Nick Nov 11 '21 at 06:09
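One possible way to get the class of the tag wrapping a keyword, sketched under assumptions (Selenium 4, the browser's XPath 1.0 engine, ASCII-only case folding). It selects elements that have a direct text node containing the keyword, then moves up with ancestor-or-self::*[@class][1] to the nearest element that has a class attribute, since the innermost tag (for example a link inside a heading) may have no class of its own. Because find_elements returns an empty list when nothing matches, a keyword that is missing from the page needs no try/except. The helper names xpath_literal, keyword_class_xpath and find_wrapping_classes are hypothetical, and the results on this particular page have not been verified.

```python
from typing import List

# ASCII letters for XPath 1.0 translate(), which has no lower-case function.
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"


def xpath_literal(text: str) -> str:
    """Quote `text` as an XPath 1.0 string literal (hypothetical helper)."""
    if "'" not in text:
        return "'" + text + "'"
    if '"' not in text:
        return '"' + text + '"'
    # Both quote kinds present: split on ' and rebuild the string with concat().
    parts = ["'" + part + "'" for part in text.split("'")]
    return "concat(" + ", \"'\", ".join(parts) + ")"


def keyword_class_xpath(keyword: str) -> str:
    """XPath for the nearest element with a class attribute that wraps a
    direct text node containing `keyword` (ASCII case-insensitive)."""
    # Lower-case each text node before comparing, since contains() is case-sensitive.
    lowered_text = "translate(., '" + UPPER + "', '" + LOWER + "')"
    needle = xpath_literal(keyword.lower())
    return (
        "//*[text()[contains(" + lowered_text + ", " + needle + ")]]"
        + "/ancestor-or-self::*[@class][1]"  # [1] on a reverse axis = nearest
    )


def find_wrapping_classes(driver, keyword: str) -> List[str]:
    """Class attribute of each match; an empty list if the keyword is absent.

    `driver` is assumed to be a Selenium 4 WebDriver. "xpath" is the value of
    selenium.webdriver.common.by.By.XPATH, used here so this file can be
    imported without Selenium installed.
    """
    elements = driver.find_elements("xpath", keyword_class_xpath(keyword))
    return [element.get_attribute("class") for element in elements]
```

With the driver from the question, calling find_wrapping_classes(driver, keyword) inside the existing loop would print one list of class names per keyword. Walking up to the nearest classed ancestor is a design choice: if the immediate parent's class is wanted even when it is empty, drop the [@class] filter and use parent::* instead.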