My ultimate goal is to dump the full HTML of any page. "Full" means both the original source HTML and any dynamically generated HTML. Chrome DevTools already does this, but I need to do it programmatically, with Selenium in Python.

I can locate all iframes with the XPath //iframe. I'd like a way to locate all shadow roots too. I have read some good Stack Overflow posts, such as how-to-identify-shadow-dom, but they all assume that the location of the shadow root is already known, which is not my case.

oldpride

1 Answer

Your question lacks a minimal reproducible example. Nonetheless (in the hope that your next question will contain such an example and be up to Stack Overflow standards), here is one way of finding all shadow roots in a page:

from selenium import webdriver
from selenium.common.exceptions import NoSuchShadowRootException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-notifications")
chrome_options.add_argument("--window-size=1280,720")

webdriver_service = Service("chromedriver/chromedriver")  # path to where you saved the chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(browser, 20)
url = 'https://iltacon2022.expofp.com/'
browser.get(url)

# Collect every element in the document, then probe each one for a shadow root.
all_elements = wait.until(EC.presence_of_all_elements_located((By.XPATH, '//*')))
for el in all_elements:
    try:
        # shadow_root raises NoSuchShadowRootException when the element hosts no shadow root
        if el.shadow_root:
            print('found shadow root in', el.get_attribute('outerHTML'))
    except NoSuchShadowRootException:
        print('no shadow root')

This is just one hastily assembled way to locate any shadow roots in a page. The Selenium setup is Linux/chromedriver. Note that other browsers/drivers, such as gecko/Firefox, will need a different method to locate the shadow root. Lastly, the Selenium docs can be found at https://www.selenium.dev/documentation/
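One limitation of the loop above is that the XPath //* only matches elements in the main document tree, so a shadow root nested inside another shadow root would not be reached. A possible alternative, sketched below, is to let the browser walk the tree in JavaScript and hand back the host elements. The names SHADOW_HOSTS_JS, find_shadow_hosts and shadow_root_html are made up for this illustration; the sketch assumes a driver whose execute_script can return elements (such as the chromedriver setup above), and it relies on the standard DOM property Element.shadowRoot, which is null for closed shadow roots, so those would not be reported.

```python
# Hypothetical helper names; only driver.execute_script is a Selenium call.
SHADOW_HOSTS_JS = """
const hosts = [];
const visit = (root) => {
  // querySelectorAll does not cross shadow boundaries, so recurse manually
  for (const el of root.querySelectorAll('*')) {
    if (el.shadowRoot) {          // null for closed shadow roots
      hosts.push(el);
      visit(el.shadowRoot);       // descend into nested shadow trees
    }
  }
};
visit(document);
return hosts;
"""


def find_shadow_hosts(driver):
    """Return every element that hosts an open shadow root, nested ones included."""
    return driver.execute_script(SHADOW_HOSTS_JS)


def shadow_root_html(driver, host):
    """Return the serialized HTML inside the open shadow root of `host`."""
    return driver.execute_script("return arguments[0].shadowRoot.innerHTML;", host)


# Possible usage with the `browser` object created earlier:
#     for host in find_shadow_hosts(browser):
#         print(shadow_root_html(browser, host))
```

Doing the traversal in a single script call also avoids one WebDriver round trip per element, which may matter on large pages. Content inside iframes is still not covered; each frame would need to be switched into and scanned separately.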

Barry the Platipus