Selenium Webdriver: How to Download a PDF File with Python?

Question

I am using selenium webdriver to automate downloading several PDF files. I get the PDF preview window (see below), and now I would like to download the file. How can I accomplish this using Google Chrome as the browser?

Take a look at [this answer](https://stackoverflow.com/a/43471196/3846228)... maybe it'll help you. — dot.Py, Jul 27 '17 at 11:35

score 42 · Answer 1 · edited Aug 08 '22 at 22:11


Try this code, it worked for me.

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_experimental_option('prefs', {
    "download.default_directory": "C:/Users/XXXX/Desktop",  # Change the default download directory
    "download.prompt_for_download": False,  # Download automatically, without a "Save as" prompt
    "download.directory_upgrade": True,
    "plugins.always_open_pdf_externally": True  # Download PDFs instead of previewing them in Chrome
})
driver = webdriver.Chrome(options=options)

edited Aug 08 '22 at 22:11 by Celius Stingher

answered Jan 29 '19 at 18:12 by Kumar

Thanks! great answer! – Dimitris Tsarouhas Mar 13 '21 at 18:42
This didn't work for me until I changed the default directory to use backslash, so instead of "C:/Users/XXXX/Desktop" I use "C:\\Users\\XXXX\\Desktop". – Abang F. Dec 15 '21 at 02:45
What is `download.directory_upgrade` for? – Nam G VU Jul 21 '22 at 04:13
Ordinal0 [0x00A75230+1856048] BaseThreadInitThunk [0x76FDFA29+25] RtlGetAppContainerNamedObjectPath [0x77A37B5E+286] RtlGetAppContainerNamedObjectPath [0x77A37B2E+238] – rsc05 Oct 06 '22 at 01:17
Confirming this works using [Splinter](https://splinter.readthedocs.io/en/latest/elements-in-the-page.html) (based on Selenium) which doesn't do file downloads. – Liquidgenius Dec 11 '22 at 23:30
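
Following up on the comment about backslashes: a minimal sketch that builds the prefs dictionary with the download directory normalized to an absolute path in the platform's native form. The helper names `build_pdf_download_prefs` and `make_driver` are hypothetical; the preference keys are the ones from the answer above. That Chrome on some Windows setups rejects forward slashes is taken from the comment and not verified here.

```python
import os


def build_pdf_download_prefs(download_dir):
    """Return Chrome prefs that save PDFs to download_dir instead of previewing them.

    Hypothetical helper; the keys are the ones used in the answer above.
    """
    # abspath/normpath produce the OS-native separator (backslashes on Windows).
    native_dir = os.path.abspath(os.path.normpath(download_dir))
    return {
        "download.default_directory": native_dir,
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "plugins.always_open_pdf_externally": True,
    }


def make_driver(download_dir):
    """Start Chrome with the prefs above (needs selenium and a Chrome install)."""
    # Imported lazily so build_pdf_download_prefs can be used without selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", build_pdf_download_prefs(download_dir))
    return webdriver.Chrome(options=options)
```

With this, `make_driver("C:/Users/XXXX/Desktop")` should hand Chrome the same directory written with native separators on Windows.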

score 5 · Answer 2 · answered Jun 18 '21 at 02:39


I found this piece of code somewhere on Stack Overflow itself, and it serves the purpose for me without having to use Selenium at all.

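import urllib.request

URL = "https://example.com/FILENAME.pdf"  # placeholder: the direct link to the PDF

# Fetch the PDF and write the raw bytes to disk; both handles are closed automatically.
with urllib.request.urlopen(URL) as response, open("FILENAME.pdf", "wb") as file:
    file.write(response.read())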

answered Jun 18 '21 at 02:39 by Saravana

This method will only work for non-authenticated sessions. It is not robust to websites which require a login. @Kumar's answer will work for both non-authenticated and authenticated sessions. – Liquidgenius Dec 11 '22 at 23:27
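
One possible way around the login limitation, sketched with the standard library only: copy the cookies from the logged-in Selenium session into the request. The function names here are hypothetical, and the sketch assumes the site identifies the session purely by cookies; sites that also check other headers or tokens may still refuse the request. `driver.get_cookies()` is Selenium's call that returns a list of dicts with `name` and `value` keys.

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Build a Cookie header value from the list returned by driver.get_cookies()."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def build_pdf_request(url, selenium_cookies):
    """Create a urllib request that carries the browser session's cookies."""
    request = urllib.request.Request(url)
    if selenium_cookies:
        request.add_header("Cookie", cookie_header(selenium_cookies))
    return request


def download_pdf(url, selenium_cookies, filename):
    """Fetch url with the session cookies and write the response body to filename."""
    with urllib.request.urlopen(build_pdf_request(url, selenium_cookies)) as response:
        with open(filename, "wb") as f:
            f.write(response.read())
```

Usage would look like `download_pdf(driver.current_url, driver.get_cookies(), "FILENAME.pdf")` once the browser has logged in.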

score 4 · Answer 3 · edited May 30 '21 at 01:43


I did it and it worked, don't ask me how :)

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_experimental_option('prefs', {
    #"download.default_directory": "C:/Users/517/Download",  # Change the default download directory
    #"download.prompt_for_download": False,  # Download automatically, without a prompt
    #"download.directory_upgrade": True,
    "plugins.always_open_pdf_externally": True  # Download PDFs instead of previewing them in Chrome
})
driver = webdriver.Chrome(options=options)

edited May 30 '21 at 01:43 by Nick

answered May 30 '21 at 01:39 by user16072805

Ordinal0 [0x00A75230+1856048] BaseThreadInitThunk [0x76FDFA29+25] RtlGetAppContainerNamedObjectPath [0x77A37B5E+286] RtlGetAppContainerNamedObjectPath [0x77A37B2E+238] – rsc05 Oct 06 '22 at 01:18

score 3 · Answer 4 · edited Feb 09 '18 at 12:13

You can download a PDF (embedded or normal) from the web using Selenium.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

download_dir = "C:\\Users\\omprakashpk\\Documents"  # for linux/*nix, download_dir="/usr/Public"
options = webdriver.ChromeOptions()

profile = {
    "plugins.plugins_list": [{"enabled": False, "name": "Chrome PDF Viewer"}],  # Disable Chrome's PDF Viewer
    "download.default_directory": download_dir,
    "download.extensions_to_open": "applications/pdf",
}
options.add_experimental_option("prefs", profile)
# Selenium 4 style; older releases took the driver path positionally with chrome_options=options.
# The service argument is optional; if omitted, the driver is searched for on PATH.
driver = webdriver.Chrome(service=Service('C:\\chromedriver\\chromedriver_2_32.exe'), options=options)

pdf_url = "https://example.com/file.pdf"  # placeholder: URL of the PDF to download
driver.get(pdf_url)

It will download the PDF and save it in the specified directory. Change the download_dir location and the ChromeDriver location to suit your setup.

You can download chrome driver from here.

Hope it helps!
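
Since `driver.get` returns before the file is necessarily on disk, a script usually has to wait for the download to finish. A minimal sketch of one way to do that, assuming the download directory starts out empty and relying on Chrome's habit of writing partial downloads with a `.crdownload` suffix. The function name `wait_for_pdf` and its parameters are hypothetical.

```python
import os
import time


def wait_for_pdf(download_dir, timeout=30.0, poll=0.5):
    """Return the path of a finished PDF in download_dir, or None on timeout.

    Assumes Chrome writes partial downloads as "*.crdownload" and renames
    them once the download is complete.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(download_dir)
        in_progress = any(n.endswith(".crdownload") for n in names)
        pdfs = sorted(n for n in names if n.lower().endswith(".pdf"))
        # Done once a PDF exists and nothing is still being written.
        if pdfs and not in_progress:
            return os.path.join(download_dir, pdfs[0])
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

Calling `wait_for_pdf(download_dir)` right after `driver.get(pdf_url)` gives the script a file path to work with, or `None` if nothing arrived in time.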

answered Feb 09 '18 at 11:50 by Om Prakash · edited Feb 09 '18 at 12:13

This works with a GUI, but if I add `options.add_argument('headless')` it doesn't work. Any idea why? – jaggi Feb 14 '18 at 11:31
Try `add_argument("--headless")`. It works with Python 3, and I am sure it will work with Python 2 as well. – Om Prakash Feb 14 '18 at 11:37
I'm also using Python 3. It might work for other PDF links, but it does not work for AWS S3 links, e.g. `http://spark-public.s3.amazonaws.com/nlp/slides/AdvancedMaxent.pdf`. Even wget doesn't work for AWS links. I'm not sure how AWS checks whether you are in GUI mode or not. – jaggi Feb 14 '18 at 16:00
It seems that not allowing file downloads in headless mode is a security feature: https://bugs.chromium.org/p/chromium/issues/detail?id=696481#c39 – jaggi Mar 03 '18 at 06:26
@Om Prakash, have you tested your code in headless Chrome? I tested the code from your GitHub page in headless Chrome and it didn't work. – exteral Jun 23 '18 at 09:11
@Om Prakash, what if I would like to download an XML file? – Mar 29 '19 at 12:36
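
For the headless problem raised in these comments, one workaround that is often reported is to enable downloads explicitly through the Chrome DevTools Protocol. The sketch below assumes a Selenium Chrome driver that exposes `execute_cdp_cmd` and uses the DevTools command `Page.setDownloadBehavior`; the helper name `allow_headless_downloads` is hypothetical. Whether this is needed at all depends on the Chrome version, and it has not been tested against the S3 link mentioned above.

```python
import os


def allow_headless_downloads(driver, download_dir):
    """Ask Chrome, via the DevTools protocol, to save downloads to download_dir.

    driver is expected to be a Selenium Chrome driver exposing execute_cdp_cmd.
    Returns the parameters that were sent, for logging or inspection.
    """
    params = {"behavior": "allow", "downloadPath": os.path.abspath(download_dir)}
    driver.execute_cdp_cmd("Page.setDownloadBehavior", params)
    return params
```

It would be called once after creating the driver and before `driver.get(...)`.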

score -1 · Answer 5 · answered Feb 17 '23 at 07:05

You can download the PDF file using Python's requests library

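import requests

pdf_url = driver.current_url  # URL of the PDF currently open in the Selenium-driven browser
response = requests.get(pdf_url)
response.raise_for_status()  # Stop on an HTTP error instead of saving an error page
file_name = 'filename.pdf'
with open(file_name, 'wb') as f:
    f.write(response.content)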
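
Because a plain request does not share the browser's login, the response can be an HTML page rather than the PDF. A small sketch of a sanity check before saving, assuming the file begins with the `%PDF-` header at byte zero (which is where PDF files normally carry it). The helper names are hypothetical.

```python
def looks_like_pdf(data):
    """Return True if the bytes start with the PDF header."""
    return data.startswith(b"%PDF-")


def save_if_pdf(data, file_name):
    """Write data to file_name only when it looks like a PDF; return whether it was saved."""
    if not looks_like_pdf(data):
        return False
    with open(file_name, "wb") as f:
        f.write(data)
    return True
```

With the snippet above, this would be used as `save_if_pdf(response.content, file_name)` in place of the unconditional write.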

score -2 · Answer 6 · answered May 20 '20 at 18:18

In my case it worked without any code modification; I just needed to disable the Chrome PDF viewer.

Here are the steps to disable it:

Go into Chrome Settings
Scroll to the bottom and click on Advanced
Under Privacy And Security, click on "Site Settings"
Scroll to PDF Documents
Enable "Download PDF files instead of automatically opening them in Chrome"
