Python Selenium get all urls extensions on a webpage as a list

Question

I am trying to get a list of the URL text for each entry on a webpage. The developer tools (Ctrl+Shift+I) show this:

<sr-cell-name name="Otto Kraf" url="/ark:/61903/1:1:Q247-6VCC" relationship="Principal" collection-name="New York, New York City, Police Census, 1890"></sr-cell-name>

There is a list of 20 names such as Otto Kraf on the page, and I want to make a list of the URLs that relate to each name.

I tried driver.find_element_by_* with xpath, name, and css_selector, but none of them seem to find the URL.

Screenshots in the original post: snapshot of the HTML, updated HTML, nested shadow roots, error.

Hi, and welcome to Stack Overflow! I don't have knowledge about the area you're asking about, but I would recommend, if you can, copy/pasting your code into a code block instead of linking a screenshot. This may increase the chance of people responding. — Phoenix, Jun 09 '20 at 05:01

undetected Selenium · Answer 1 · 2020-06-09T04:48:23.040

To get a list of all the URLs within the webpage using Selenium, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following Locator Strategies:

Using CSS_SELECTOR:

print([my_elem.get_attribute("url") for my_elem in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.table.table-element-table span.td[name='name'] sr-cell-name[name][url]")))])

Using XPATH:

print([my_elem.get_attribute("url") for my_elem in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='table table-element-table']//span[@class='td' and @name='name']//sr-cell-name[@name and @url]")))])

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Update

The elements seem to be within a #shadow-root (open), so locators run against the document itself will not reach them. Existing discussions on accessing elements within #shadow-root (open) are relevant here.
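A minimal sketch of walking through nested open shadow roots, assuming each host element can be found with a CSS selector. The function name descend_shadow_roots and the selectors host-a and host-b are hypothetical, since the real host tags are not shown in the post. What execute_script returns for a shadowRoot has varied across Selenium and browser versions, so treat this as an outline rather than a guaranteed recipe:

```python
CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def descend_shadow_roots(driver, host_selectors):
    """Follow a chain of shadow hosts and return the innermost shadow root."""
    context = driver  # the first host is looked up in the document itself
    for selector in host_selectors:
        host = context.find_element(CSS, selector)
        # Ask the browser for the open shadow root attached to this host.
        context = driver.execute_script("return arguments[0].shadowRoot", host)
        if context is None:
            raise LookupError(f"{selector!r} has no open shadow root")
    return context
```

With a real driver this might be called as descend_shadow_roots(driver, ["host-a", "host-b"]), listing the hosts from outermost to innermost.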

Thank you for your help, however, when I run the code, I receive the error that I updated in my post. This may be because of the nested shadow roots that I had to go through, and I updated the html code and my code to show how I dealt with the shadow roots. Thank you again. — Jared Willamson, Jun 09 '20 at 04:34
@JaredWillamson Check out the answer update and let me know the status. — undetected Selenium, Jun 09 '20 at 04:48
Thank you for your help. I am able to access the shadow root where the URLs are located, and I can even find the class table.table-element-table which they are under. I use shadow_root4.find_element_by.... and am able to find elements within the shadow root where the URLs are located. Is there any way to modify the code in your original answer to use my shadow_root4 and make a list of the URLs? — Jared Willamson, Jun 09 '20 at 22:10
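One possible way to adapt the answer's list comprehension to an already-located shadow root, as asked in the last comment, is to run the search from that root instead of from the driver. This is a sketch only: collect_urls is a hypothetical helper, and it assumes shadow_root4 exposes find_elements and that the sr-cell-name elements sit directly inside that root. A CSS selector is used because XPath lookups may not be supported inside a shadow root.

```python
CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def collect_urls(search_context, selector="sr-cell-name[name][url]"):
    """Return the url attribute of every matching cell under search_context."""
    cells = search_context.find_elements(CSS, selector)
    return [cell.get_attribute("url") for cell in cells]
```

In the asker's script the call would look like print(collect_urls(shadow_root4)). If the rows load late, the call may need to be retried or wrapped in a wait before the list is complete.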
