
I am trying to get a list of all the URLs attached to the names on a webpage. The element inspector (Ctrl+Shift+I) shows me this:

<sr-cell-name name="Otto Kraf" url="/ark:/61903/1:1:Q247-6VCC" relationship="Principal" collection-name="New York, New York City, Police Census, 1890"></sr-cell-name>

There is a list of 20 names such as Otto Kraf on the page, and I want to make a list of the URLs that relate to each name.

I tried the driver.find_element_by_* methods (XPath, name, and CSS selector), but none of them seems to find the URL.

Snapshots of the HTML: [images: updated html, nested shadow roots, error]

Phoenix
  • Hi, and welcome to Stack Overflow! I don't have knowledge about the area you're asking about, but I would recommend, if you can, copy/pasting your code into a code block instead of linking a screenshot. This may increase the chance of people responding. – Phoenix Jun 09 '20 at 05:01

1 Answer


To get a list of all the URLs on the webpage using Selenium, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    print([my_elem.get_attribute("url") for my_elem in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.table.table-element-table span.td[name='name'] sr-cell-name[name][url]")))])
    
  • Using XPATH:

    print([my_elem.get_attribute("url") for my_elem in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='table table-element-table']//span[@class='td' and @name='name']//sr-cell-name[@name and @url]")))])
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

Update

The elements seem to be within a #shadow-root (open), so the locators above cannot reach them from the top-level document. There are a couple of relevant discussions on how to access elements within a #shadow-root (open).

undetected Selenium
  • Thank you for your help. However, when I run the code, I receive the error that I added to my post. This may be because of the nested shadow roots that I had to go through, so I updated the HTML and my code to show how I dealt with the shadow roots. Thank you again. – Jared Willamson Jun 09 '20 at 04:34
  • @JaredWillamson Checkout the answer update and let me know the status. – undetected Selenium Jun 09 '20 at 04:48
  • Thank you for your help. I am able to access the shadow root where the URLs are located, and I can even find the class table.table-element-table that they sit under. I use shadow_root4.find_element_by.... and am able to find elements within the shadow root where the URLs are located. Is there any way to modify the code in your original answer to use my shadow_root4 and make a list of the URLs? – Jared Willamson Jun 09 '20 at 22:10
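
One possible way to do what the last comment asks is to run the lookup on the shadow root itself rather than on driver. The following is a minimal sketch, not a tested solution for this page: collect_name_urls is a hypothetical helper name, and it assumes that shadow_root4 (from the comment above) exposes find_elements(by, value), as Selenium's WebElement and ShadowRoot objects do. It uses a CSS selector because, as far as I know, XPath lookups are not supported inside a shadow root. The explicit wait is left out on the assumption that the table has already rendered by the time the shadow root was reached.

```python
# The string value of selenium.webdriver.common.by.By.CSS_SELECTOR.
# In a real script you would normally import By and use By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"


def collect_name_urls(root):
    """Return (name, url) pairs for every <sr-cell-name> under `root`.

    `root` is assumed to be the innermost shadow root (shadow_root4 in the
    question), or any object offering find_elements(by, value) whose results
    offer get_attribute(name).
    """
    # Same tag and attribute filter as the original answer, minus the
    # ancestor chain, since the search already starts inside the shadow root.
    cells = root.find_elements(CSS_SELECTOR, "sr-cell-name[name][url]")
    return [(cell.get_attribute("name"), cell.get_attribute("url"))
            for cell in cells]
```

With a live session this would be called as pairs = collect_name_urls(shadow_root4), and the bare list of URLs is then [url for _, url in pairs]. The url values are site-relative paths such as /ark:/61903/1:1:Q247-6VCC, so they still need the site's base address in front before they can be opened.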