Selenium find_elements By.XPATH trying to extract href urls error

Question

Using the Firefox webdriver, I want to extract the URL from the href attribute of every link (<a> element) whose href contains a given word. I'm using the latest Selenium binary. I tried this:

driver = webdriver.Firefox()
driver.get(url)
nodes = driver.find_elements(By.XPATH, "//a[contains(@href,'products')]/@href")
print("nodes: ", nodes)
links = []
for elem in nodes:
    links.append(elem)

but I get a type error:

selenium.common.exceptions.WebDriverException: Message: TypeError: Expected an element or WindowProxy, got: [object Attr href="https://www.example.com/catalogue/products/a.html"]

I also tried driver.find_elements(By.XPATH, "//a[contains(@href,'products')]") and then calling getAttribute("href") on each element, but that did not work either.

I don't understand where the error is or how to solve it.

Extract of the html:

<html>
  <body>
    <ul class="level2-megamenu">
      <li>
        <div class="level1-title">
          <a href="https://www.example.com/catalogue/products/a.html">
          <strong style="color:#828282;font-size:">Text</strong>
          </a>
        </div>
      </li>
    </ul>
  </body>
</html>

Update the question with relevant HTML. – undetected Selenium Sep 17 '20 at 13:27

undetected Selenium · Answer 1 · 2020-09-17T13:43:54.547

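The error comes from the trailing /@href in the XPath: that expression selects attribute nodes, while find_elements() can only return elements. Select the <a> elements themselves and read href from each one with get_attribute(). To extract the href attributes using Selenium and Python, wait for the elements with WebDriverWait and visibility_of_all_elements_located(), using either of the following locator strategies: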

Using CSS_SELECTOR:

print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[href*='products']")))])

Using XPATH:

print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[contains(@href,'products')]")))])

Note: you have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
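
Put together, the approach above could look roughly like the sketch below. It is an illustration under assumptions, not a definitive implementation: collect_hrefs and product_links are hypothetical helper names, the 20-second timeout is taken from the one-liners above, and the Selenium imports sit inside the function only so that the small helper can be tried without a browser.

```python
def collect_hrefs(elements):
    """Return the non-empty href value of each element, in document order."""
    links = []
    for elem in elements:
        # The Python binding spells this get_attribute(), not getAttribute().
        href = elem.get_attribute("href")
        if href:
            links.append(href)
    return links


def product_links(url, xpath="//a[contains(@href,'products')]", timeout=20):
    """Open url in Firefox and return the href of every element matching xpath.

    Hypothetical helper: note that the XPath selects <a> elements,
    not @href attribute nodes.
    """
    # Imported lazily so collect_hrefs() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Waits until at least one element matches and every match is visible.
        elements = WebDriverWait(driver, timeout).until(
            EC.visibility_of_all_elements_located((By.XPATH, xpath))
        )
        return collect_hrefs(elements)
    finally:
        # Close the browser even if the wait times out.
        driver.quit()
```

Calling product_links(url) with the page's address should return a plain list of strings, which is what the loop in the question was trying to build.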

edited Sep 17 '20 at 13:43

answered Sep 17 '20 at 13:30

That line (XPATH), unfortunately, raised an error with no message: `raise TimeoutException(message, screen, stacktrace) selenium.common.exceptions.TimeoutException: Message:`. The page seems to have fully loaded, though. – Luis Sep 17 '20 at 14:27
This is already solved with these two lines: `wait = WebDriverWait(driver, 10)` and `nodes = wait.until(lambda driver: driver.find_elements_by_xpath("//a[contains(@href,'products')]"))`. The method name is different, though. Could you explain why, and what the difference is? Thank you anyway for your help. – Luis Sep 17 '20 at 15:14
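
On the difference asked about in the last comment: WebDriverWait.until() keeps calling the condition until it returns something truthy, and raises TimeoutException otherwise. A lambda that returns the result of find_elements succeeds as soon as the list is non-empty, whether or not the links are displayed. visibility_of_all_elements_located() stays falsy while any matching element is hidden, so links inside a collapsed menu would be one plausible cause of the timeout (an assumption; the page itself was not inspected). The sketch below models this behaviour without a browser. poll_until, all_visible and FakeLink are hypothetical stand-ins written for illustration, not Selenium code, and the b.html URL is made up.

```python
import time


def poll_until(condition, timeout=2.0, interval=0.1):
    """Simplified stand-in for WebDriverWait.until: retry until truthy."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:  # a non-empty list counts as success
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition never became truthy")
        time.sleep(interval)


def all_visible(elements):
    """Model of the visibility condition: falsy if any match is hidden."""
    if any(not elem.is_displayed() for elem in elements):
        return False
    return elements


class FakeLink:
    """Hypothetical stand-in for a WebElement, for illustration only."""

    def __init__(self, href, displayed):
        self.href = href
        self.displayed = displayed

    def is_displayed(self):
        return self.displayed

    def get_attribute(self, name):
        return self.href if name == "href" else None


links = [
    FakeLink("https://www.example.com/catalogue/products/a.html", displayed=False),
    FakeLink("https://www.example.com/catalogue/products/b.html", displayed=True),
]

# Presence-style condition (like the lambda): succeeds on the first poll.
found = poll_until(lambda: links)
print([link.get_attribute("href") for link in found])

# Visibility-style condition: one hidden link keeps it falsy until timeout.
try:
    poll_until(lambda: all_visible(links), timeout=0.3)
except TimeoutError as exc:
    print("timed out:", exc)
```

In real Selenium code, the presence-style wait corresponds to EC.presence_of_all_elements_located, or to the lambda from the comment written with driver.find_elements(By.XPATH, ...).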
