
Yesterday I could navigate archive.org with Selenium without problems, but today no Selenium function works on the site at all. Even my code to click a simple search button fails. Is there any solution for this?

I tried undetected_chromedriver, but it didn't help. I also tried Playwright as an alternative to Selenium, but that doesn't work either.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import undetected_chromedriver as uc

keyword = "photo"
url_photo = f"https://archive.org/search?query={keyword}&and%5B%5D=mediatype%3A%22image%22"

options = webdriver.ChromeOptions()
# options.add_argument('--headless')
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")

driver = uc.Chrome(options=options)
driver.get(url_photo)

# Wait up to 100 s for the first result tile's image to be clickable, then click it.
# Note: this <img> sits inside nested shadow roots, which an XPath lookup from the
# document cannot reach (see the comment and answer below).
WebDriverWait(driver, 100).until(EC.element_to_be_clickable((
    By.XPATH,
    "/html/body/app-root//main/div/router-slot/search-page//div/div[2]/collection-browser//div/div[3]/infinite-scroller//section/article[1]/tile-dispatcher//div/a/item-tile//div/div/div/image-block//div/item-image//div/img",
))).click()
print("request successful")
undetected Selenium
xtrabyte
  • There are a lot of shadow roots in that page. See [answer to this question](https://stackoverflow.com/questions/73377520/webscraping-shadow-root) as a starting point, and forget about long, fragile XPATH locators. – Barry the Platipus Jul 21 '23 at 11:44

1 Answer


The Search field within the website https://archive.org/search?query=photo&and%5B%5D=mediatype%3A%22image%22 is located deep within multiple #shadow-root (open) elements.



Solution

To send a character sequence to the Search field, you have to descend through each shadow root with shadowRoot.querySelector(). You can use the following locator strategy:

  • Code Block:

    driver.get("https://archive.org/search?query=photo&and%5B%5D=mediatype%3A%22image%22")
    # Poll until every shadow root on the path has rendered: ?. makes the chain evaluate to undefined (None in Python) instead of throwing while a level is still missing
    search_input = WebDriverWait(driver, 20).until(lambda d: d.execute_script("return document.querySelector('app-root')?.shadowRoot?.querySelector('search-page')?.shadowRoot?.querySelector('collection-search-input')?.shadowRoot?.querySelector('ia-clearable-text-input')?.shadowRoot?.querySelector('input#text-input')"))
    search_input.send_keys("xtrabyte")
    
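If you are on Selenium 4, one possible alternative to the JavaScript chain is the `WebElement.shadow_root` property, which returns a `ShadowRoot` object that supports `find_element`. CSS selectors are the safe choice inside a shadow root; XPath is generally not supported there. The helper below, `find_in_shadow`, is a hypothetical name for illustration: you pass the shadow-host selectors outermost first, plus the selector of the target. The selectors themselves are the ones from the query above and may change whenever archive.org updates its markup.

```python
from typing import Any, Sequence

# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR; the literal
# keeps this sketch importable even where Selenium is not installed.
CSS_SELECTOR = "css selector"


def find_in_shadow(driver: Any, hosts: Sequence[str], target: str) -> Any:
    """Walk nested open shadow roots using Selenium 4's WebElement.shadow_root.

    hosts:  CSS selectors of each shadow host, outermost first; each one is
            looked up inside the previous host's shadow root.
    target: CSS selector of the element inside the innermost shadow root.
    Whatever find_element / shadow_root raise for a missing level propagates.
    """
    context = driver  # first lookup runs against the main document
    for host in hosts:
        # Find the host in the current context, then step into its shadow root.
        context = context.find_element(CSS_SELECTOR, host).shadow_root
    return context.find_element(CSS_SELECTOR, target)
```

Because the components render asynchronously, you would probably call it from a wait, for example `WebDriverWait(driver, 20, ignored_exceptions=(NoSuchElementException, NoSuchShadowRootException)).until(lambda d: find_in_shadow(d, ["app-root", "search-page", "collection-search-input", "ia-clearable-text-input"], "input#text-input")).send_keys("photo")`, with both exceptions imported from `selenium.common.exceptions`. Listing only the host elements keeps the locator far shorter than a full XPath, and it does not break when unrelated `div` wrappers change.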



undetected Selenium
  • Thank you very much, sir, I appreciate it. – xtrabyte Jul 22 '23 at 13:20
  • @xtrabyte Glad to be able to help you. [What should I do when someone answers my question?](https://stackoverflow.com/help/someone-answers) – undetected Selenium Jul 22 '23 at 13:21
  • Sir, when I want to write a Selenium script that clicks any image, I have to go through about 9 shadow roots, and I could not write this path. Is there an easier way to get this path, or a tactic for writing it? – xtrabyte Jul 24 '23 at 15:27
  • @xtrabyte Yes, there are. Let's discuss the issue in [Selenium](https://chat.stackoverflow.com/rooms/223360/selenium) room. – undetected Selenium Jul 24 '23 at 17:47
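For the nine-level case in the last comment, one possible tactic is to skip writing the path entirely: run a small recursive JavaScript search that matches a CSS selector in the document and in every open shadow root beneath it. This is a sketch under assumptions: `DEEP_QUERY_ALL_JS` and `deep_find_all` are hypothetical names, the search only reaches open (not closed) shadow roots, and visiting every element can be slow on very large pages.

```python
from typing import Any, List

# JavaScript executed in the page: collects matches for a CSS selector in the
# given root and, recursively, in every *open* shadow root below it.
DEEP_QUERY_ALL_JS = """
function deepQueryAll(root, selector) {
  const found = Array.from(root.querySelectorAll(selector));
  for (const el of root.querySelectorAll('*')) {
    if (el.shadowRoot) {
      found.push(...deepQueryAll(el.shadowRoot, selector));
    }
  }
  return found;
}
return deepQueryAll(document, arguments[0]);
"""


def deep_find_all(driver: Any, selector: str) -> List[Any]:
    """Return every element matching `selector`, searching open shadow roots too.

    The selector is matched inside each root separately, so it must not span a
    shadow boundary (e.g. use 'img', not 'item-image img').
    """
    # execute_script passes `selector` to the script as arguments[0].
    return driver.execute_script(DEEP_QUERY_ALL_JS, selector) or []
```

Since an empty list is falsy, `WebDriverWait(driver, 20).until(lambda d: deep_find_all(d, "img"))` keeps polling until at least one match has rendered, and you could then click an element from the returned list. A bare `img` may also match images outside the result tiles, so check what it returns before relying on an index.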