I'm opening this URL (https://cissearch.kcc.gov.tw/System/Bulletin/View.aspx?BulletinSN=239928&pages=9957#pdfStart) with Selenium in Python, and I'm trying to click the download button in the embedded PDF viewer.

I've tried adding options as suggested here: Selenium Webdriver: How to Download a PDF File with Python?

But I end up on a page with an "Open" button that I still have to click manually to download the PDF file.

I've also tried the approach described at https://www.lambdatest.com/blog/shadow-dom-in-selenium, but I was unable to locate any element.

So I'm trying to click on the button with this:

driver.execute_script("document.querySelector('pdf-viewer').shadowRoot.querySelector('viewer-toolbar').shadowRoot.querySelector('viewer-download-controls').shadowRoot.querySelector('cr-action-menu').querySelector('button')")

This JavaScript works in the DevTools console, but it raises this error when I run it from Python:

JavascriptException: Message: javascript error: Cannot read properties of null (reading 'shadowRoot')
  (Session info: chrome=109.0.5414.87)
Hackore
  • Have you googled something like "selenium shadowroot"? Top result for me is [How To Automate Shadow DOM In Selenium WebDriver](https://www.lambdatest.com/blog/shadow-dom-in-selenium/) which talks about [`getShadowRoot()`](https://www.selenium.dev/selenium/docs/api/java/org/openqa/selenium/WebElement.html#getShadowRoot()). – Ouroborus Jan 19 '23 at 16:45
  • Yes I have tried that route. I am unable to locate any element in the url I posted in the question. – Hackore Jan 19 '23 at 17:20
  • Part of asking is showing what you tried. Edit your question to also include your attempts using `getShadowRoot`. – Ouroborus Jan 19 '23 at 17:31
  • Also, you won't be able to access any shadow DOM using `execute_script` since this runs the javascript in the page's context and the page's javascript can't access shadow DOMs. – Ouroborus Jan 19 '23 at 17:33
  • If you are able to open the viewer page, you can click the download button with `driver.execute_script("document.getElementById('download').click()")` – simpleApp Jan 19 '23 at 17:48

1 Answer

The issue is that the URL does not actually point to a PDF file. It points to a webpage that embeds a PDF viewer in an iframe.

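By running this, I was able to get the actual URL of the PDF file (`find_element_by_tag_name` was removed in Selenium 4.3, so this uses `find_element` with `By.TAG_NAME`):

from selenium.webdriver.common.by import By
pdf_url = driver.find_element(By.TAG_NAME, 'iframe').get_attribute("src")

Then I can download it directly with this: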

import urllib.request
urllib.request.urlretrieve(pdf_url, "test.pdf")
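
A slightly more defensive variant of the download step is sketched below. It is only an illustration, not something the site requires: `download_pdf` and `PDF_MAGIC` are hypothetical names, and the check assumes the file starts with the usual `%PDF-` signature. It can help catch the case where the server returns an HTML error or login page instead of the PDF.

```python
import shutil
import urllib.request
from pathlib import Path

# Most PDF files begin with this signature (assumption used for a sanity check).
PDF_MAGIC = b"%PDF-"


def download_pdf(pdf_url, dest, headers=None, timeout=30):
    """Download pdf_url to dest and verify the result looks like a PDF."""
    dest = Path(dest)
    # Optional headers (for example a User-Agent or Cookie) are passed through.
    request = urllib.request.Request(pdf_url, headers=headers or {})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        with dest.open("wb") as out:
            shutil.copyfileobj(response, out)

    # Read back the first bytes and reject anything that is not a PDF.
    with dest.open("rb") as f:
        if f.read(len(PDF_MAGIC)) != PDF_MAGIC:
            raise ValueError(f"{pdf_url} did not return a PDF file")
    return dest
```

With the `pdf_url` obtained from the iframe above, the call would be `download_pdf(pdf_url, "test.pdf")`. If the server only serves the file to the browser session, the `headers` argument could carry a `Cookie` header built from `driver.get_cookies()`, though I have not needed that here.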
Hackore