I am using Selenium WebDriver to automate downloading several PDF files. Chrome shows the PDF preview window, and now I would like to download the file. How can I accomplish this using Google Chrome as the browser?
6 Answers
42
Try this code, it worked for me.
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_experimental_option('prefs', {
    "download.default_directory": "C:/Users/XXXX/Desktop",  # Default directory for downloads
    "download.prompt_for_download": False,  # Download automatically, without a prompt
    "download.directory_upgrade": True,
    "plugins.always_open_pdf_externally": True  # Download PDFs instead of previewing them in Chrome
})
driver = webdriver.Chrome(options=options)

Celius Stingher

Kumar
-
-
This didn't work for me until I changed the default directory to use backslashes, so instead of "C:/Users/XXXX/Desktop" I use "C:\\Users\\XXXX\\Desktop". – Abang F. Dec 15 '21 at 02:45
-
3
-
Ordinal0 [0x00A75230+1856048] BaseThreadInitThunk [0x76FDFA29+25] RtlGetAppContainerNamedObjectPath [0x77A37B5E+286] RtlGetAppContainerNamedObjectPath [0x77A37B2E+238] – rsc05 Oct 06 '22 at 01:17
-
Confirming this works using [Splinter](https://splinter.readthedocs.io/en/latest/elements-in-the-page.html) (based on Selenium) which doesn't do file downloads. – Liquidgenius Dec 11 '22 at 23:30
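Following up on the backslash comment above: a minimal sketch, assuming a hypothetical helper named `build_pdf_prefs` (not part of Selenium), that builds the same `prefs` dictionary with an absolute path in the operating system's native separators. Whether Chrome needs this on a given machine is not something the thread confirms.

```python
import os


def build_pdf_prefs(download_dir):
    """Return Chrome prefs that save PDFs to download_dir instead of previewing them.

    build_pdf_prefs is a hypothetical helper name, not a Selenium API.
    """
    # abspath also normalizes separators, so forward slashes become
    # backslashes on Windows and the path is left as-is on Linux/macOS.
    native_dir = os.path.abspath(download_dir)
    return {
        "download.default_directory": native_dir,
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "plugins.always_open_pdf_externally": True,
    }
```

It would be passed in the same way as the dictionary in the answer: `options.add_experimental_option("prefs", build_pdf_prefs("downloads"))`.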
5
I found this piece of code somewhere on Stack Overflow itself, and it does the job for me without having to use Selenium at all.
import urllib.request

# URL must hold the direct link to the PDF.
with urllib.request.urlopen(URL) as response, open("FILENAME.pdf", "wb") as file:
    file.write(response.read())

Saravana
-
This method will only work for non-authenticated sessions. It is not robust to websites which require a login. @Kumar's answer will work for both non-authenticated and authenticated sessions. – Liquidgenius Dec 11 '22 at 23:27
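For sites that require a login, one possible approach is to log in through Selenium and send the browser's cookies along with the `urllib` request. This is a rough sketch, assuming the cookies come from `driver.get_cookies()` (a list of dicts with `name` and `value` keys). The names `cookie_header` and `download_with_cookies` are hypothetical helpers, and sites that also check other headers or tokens may still reject the request.

```python
import urllib.request


def cookie_header(cookies):
    """Join Selenium-style cookie dicts into a single Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def download_with_cookies(url, cookies, dest_path):
    """Fetch url with the given cookies and write the response body to dest_path."""
    request = urllib.request.Request(url, headers={"Cookie": cookie_header(cookies)})
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        out.write(response.read())
```

A call might look like `download_with_cookies(URL, driver.get_cookies(), "FILENAME.pdf")`, made after the Selenium session has logged in.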
4
I did it and it worked, don't ask me how :)
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_experimental_option('prefs', {
    # "download.default_directory": "C:/Users/517/Download",  # Default directory for downloads
    # "download.prompt_for_download": False,  # Download automatically, without a prompt
    # "download.directory_upgrade": True,
    "plugins.always_open_pdf_externally": True  # Download PDFs instead of previewing them in Chrome
})
driver = webdriver.Chrome(options=options)

Nick

user16072805
-
Ordinal0 [0x00A75230+1856048] BaseThreadInitThunk [0x76FDFA29+25] RtlGetAppContainerNamedObjectPath [0x77A37B5E+286] RtlGetAppContainerNamedObjectPath [0x77A37B2E+238] – rsc05 Oct 06 '22 at 01:18
3
You can download a PDF (both embedded and normal PDFs) from the web using Selenium.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

download_dir = "C:\\Users\\omprakashpk\\Documents"  # for linux/*nix, download_dir="/usr/Public"
options = webdriver.ChromeOptions()
profile = {
    "plugins.plugins_list": [{"enabled": False, "name": "Chrome PDF Viewer"}],  # Disable Chrome's PDF Viewer
    "download.default_directory": download_dir,
    "download.extensions_to_open": "applications/pdf",
}
options.add_experimental_option("prefs", profile)
# Selenium 4 style: the driver path goes through Service and the options through options=.
driver = webdriver.Chrome(service=Service('C:\\chromedriver\\chromedriver_2_32.exe'), options=options)
driver.get(pdf_url)  # pdf_url is the URL of the PDF to download
This downloads the PDF and saves it in the specified directory. Change the download_dir location and the ChromeDriver location to suit your setup.
You can download chrome driver from here.
Hope it helps!
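When several PDFs are downloaded in a row, the script may need to wait until Chrome has finished writing each file. Chrome gives in-progress downloads a `.crdownload` suffix, so one simple option is to poll the download directory. The sketch below assumes a hypothetical helper name, `wait_for_downloads`, and arbitrary default timings.

```python
import os
import time


def wait_for_downloads(download_dir, timeout=60.0, poll=0.5):
    """Return the PDF paths in download_dir once no .crdownload files remain.

    Raises TimeoutError if no finished PDF shows up within timeout seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(download_dir)
        in_progress = [n for n in names if n.endswith(".crdownload")]
        pdfs = sorted(n for n in names if n.lower().endswith(".pdf"))
        if pdfs and not in_progress:
            return [os.path.join(download_dir, n) for n in pdfs]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished PDF in {download_dir}")
        time.sleep(poll)
```

It would typically be called right after `driver.get(...)`, with the same directory that was set in `download.default_directory`.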

Om Prakash
-
this works with gui, if I add `options.add_argument('headless') ` it doesn't work. Any idea why? – jaggi Feb 14 '18 at 11:31
-
Try `add_argument("--headless")`. It works with python3. I am sure, it will work for python 2 also. – Om Prakash Feb 14 '18 at 11:37
-
I'm also using python3. It might be working for other PDF links, but for AWS S3 links it's not working, e.g. `http://spark-public.s3.amazonaws.com/nlp/slides/AdvancedMaxent.pdf `. Even wget doesn't work for AWS links. I'm not sure how AWS checks whether you are in GUI mode or not. – jaggi Feb 14 '18 at 16:00
-
it seems that 'not allowing' file downloads in headless mode is a security feature https://bugs.chromium.org/p/chromium/issues/detail?id=696481#c39 – jaggi Mar 03 '18 at 06:26
-
@Om Prakash, have you tested your code in headless Chrome? I tested the code from your GitHub page in headless Chrome and it didn't work. – exteral Jun 23 '18 at 09:11
-
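On the headless problem raised in the comments above: one workaround that has been used with headless Chrome is the DevTools command `Page.setDownloadBehavior`, which asks the browser to allow downloads into a given directory. The sketch below assumes a Selenium Chrome driver that exposes `execute_cdp_cmd`; the helper name `allow_headless_downloads` is hypothetical, and whether the call is needed at all depends on the Chrome version in use.

```python
def allow_headless_downloads(driver, download_dir):
    """Ask Chrome, over the DevTools protocol, to save downloads into download_dir."""
    # execute_cdp_cmd is provided by Selenium's Chromium-based drivers.
    driver.execute_cdp_cmd(
        "Page.setDownloadBehavior",
        {"behavior": "allow", "downloadPath": download_dir},
    )
```

It would be called once after creating the driver and before `driver.get(...)`, with an absolute directory path.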
-1
You can download the PDF file using Python's requests library:
import requests

pdf_url = driver.current_url  # Get the current URL from the Selenium driver
response = requests.get(pdf_url)
file_name = 'filename.pdf'
with open(file_name, 'wb') as f:
    f.write(response.content)

Ravi Teja
-2
In my case it worked without any code changes; I just needed to disable the Chrome PDF viewer.
Here are the steps to disable it:
- Go into Chrome Settings
- Scroll to the bottom and click Advanced
- Under Privacy and Security, click "Site Settings"
- Scroll to PDF Documents
- Enable "Download PDF files instead of automatically opening them in Chrome"

Umer