
I'm trying to make a web scraper that downloads an image that's inside of an iframe with a child.

I can't get Selenium for Chrome to find the correct iframe to switch into. The main issue is the iframe in question doesn't have a name or id so I searched by index. I managed to get inside of the parent, but I can't get inside of the sub-child. If I set the index to 1 I get the next iframe in the outermost scope.

From looking into my webdriver object, I think the search is limited to the red rectangle, as that's what's inside the page_source attribute of my variable "driver".

The object I want to reach is the img with the id pbk-page in the green rectangle.

My code so far just gets the URL, then waits for the page to load using sleep (once I can navigate to the correct element I'll implement WebDriverWait). This is the navigation bit of code:

driver.switch_to.frame(0)
Image_link = driver.find_element(By.ID,'pbk-page')

Oh! I'm using Python.

undetected Selenium
Watson221

3 Answers


I was stuck doing the exact same thing you were (maybe even scraping the same website?), and this is what worked for me:

My solution:

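from selenium.webdriver.common.by import By

# Switch into the outer (parent) iframe, taken as the first iframe on the page.
iframe1 = driver.find_elements(By.XPATH, value="//iframe")[0]
driver.switch_to.frame(iframe1)
# The child iframe lives in the shadow root of <mosaic-book>, so fetch it with JavaScript.
iframe2 = driver.execute_script("return document.querySelector(\"body > mosaic-book\").shadowRoot.querySelector(\"iframe\")")
driver.switch_to.frame(iframe2)

# find_elements returns a list, which is empty when nothing matches.
img = driver.find_elements(By.ID, value="pbk-page")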

I am very much an amateur at using Selenium, but this is my best understanding of how this works: First, we're able to find the parent iframe, iframe1, but our driver can't see anything inside of the shadow DOM. However, we can reach inside the shadow DOM using JavaScript. So, once we have switched into iframe1, we can find the shadow host element mosaic-book, enter its shadow DOM, and return the child iframe, iframe2. Then we can switch our driver into iframe2 and access the image.

There very well might be a more elegant way to do this, but this is what worked for me.
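For what it's worth, the same idea can be wrapped in a small helper. This is only a sketch: switch_into_shadow_iframe is a made-up name, not part of Selenium, and it assumes the driver has already been switched into the parent iframe and that the shadow root is open. The selectors are passed to execute_script as script arguments instead of being pasted into the JavaScript string, which avoids the escaped quotes.

```python
# Hypothetical helper (not a Selenium API). `driver` is assumed to be a
# Selenium WebDriver that has already been switched into the parent iframe.

FIND_SHADOW_IFRAME_JS = """
const host = document.querySelector(arguments[0]);
if (!host || !host.shadowRoot) { return null; }
return host.shadowRoot.querySelector(arguments[1]);
"""


def switch_into_shadow_iframe(driver, host_selector, iframe_selector="iframe"):
    """Switch the driver into an iframe nested in an open shadow root."""
    # Extra positional arguments to execute_script show up in the script
    # as arguments[0], arguments[1], ...
    iframe = driver.execute_script(
        FIND_SHADOW_IFRAME_JS, host_selector, iframe_selector
    )
    if iframe is None:
        # Host missing, shadow root closed or absent, or no matching iframe.
        raise LookupError(
            f"no {iframe_selector!r} found in the shadow root of {host_selector!r}"
        )
    driver.switch_to.frame(iframe)
    return iframe
```

With the page from the question, the call would presumably look like switch_into_shadow_iframe(driver, "body > mosaic-book"), followed by the usual find_elements(By.ID, "pbk-page").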

Reading I did to come up with this solution:

  1. What even is a shadow root/shadow DOM or whatever??? https://www.geeksforgeeks.org/what-is-shadow-root-and-how-to-use-it/
  2. How to access elements inside of a shadow DOM using javascript: https://www.youtube.com/watch?v=PQcRaIoc2AM
whe21405

Like any other element, an iframe can be located by XPath or CSS selector, using any attribute whose value makes the locator unique. I believe you could uniquely locate both iframes here by their src values, but since you marked them out I can't see those values.
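As a rough sketch of locating an iframe by its src: both function names below are hypothetical, and the src fragment is a placeholder because the real values aren't visible in the question. Note that a driver-level locator like this only finds iframes in the regular DOM. It will not see an iframe that sits inside a shadow root.

```python
def iframe_src_selector(src_fragment):
    """Build a CSS selector matching an iframe whose src contains src_fragment."""
    # Escape backslashes and double quotes so the fragment is a valid CSS string.
    escaped = src_fragment.replace("\\", "\\\\").replace('"', '\\"')
    return f'iframe[src*="{escaped}"]'


def switch_to_iframe_by_src(driver, src_fragment):
    """Find an iframe by part of its src value and switch into it."""
    # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
    iframe = driver.find_element("css selector", iframe_src_selector(src_fragment))
    driver.switch_to.frame(iframe)
    return iframe
```

For example, switch_to_iframe_by_src(driver, "some-unique-part-of-the-src") would switch into the first iframe whose src contains that (placeholder) text.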

Prophet
  • Hi, I can't search for the child using a CSS selector or XPath. The scope of the parent's page source is limited to the red square and doesn't include the child as far as I can tell. Any attempt results in a NoSuchElementException – Watson221 Aug 24 '22 at 21:35

As per the given HTML:

iframe_shadowroot_iframe

The desired element:

<img id="pbk-page"....>

is within the child <iframe> which is within a #shadow-root (open) marked with a blue rectangle.


Solution

To access the desired element you need to:

  • Switch within the parent iframe
  • Switch within the shadow-root
  • Switch within the child iframe
  • Then locate the element

Effectively, your code block will be:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Wait for the parent iframe and switch into it.
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"parent_iframe_css_selector")))
shadow_host = driver.find_element(By.CSS_SELECTOR, 'mosaic-book')
shadow_root = shadow_host.shadow_root
# Switch by element: a driver-level locator cannot see inside the shadow root.
child_iframe = shadow_root.find_element(By.CSS_SELECTOR, 'child_iframe_css_selector')
driver.switch_to.frame(child_iframe)
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//img[@id='pbk-page']")))
undetected Selenium
  • Hi, I tried that snippet out and I'm still getting a TimeoutException on the WebDriverWait and a NoSuchElementException when I try to find something. I right-clicked to get the CSS selector in Chrome and used that to replace your parent/child iframe CSS selector and mosaic-book elements. – Watson221 Aug 24 '22 at 21:29