How to save an opened page as PDF in Selenium (Python)

Question

I have tried every solution I could find on the Internet for printing a page that is open in Selenium (Python) to a PDF. The print pop-up shows up, but after a second or two it goes away and no PDF is saved.

Here is the code I am trying. It is based on the code here: https://stackoverflow.com/a/43752129/3973491

Coding on a Mac with Mojave 10.14.5.

from selenium import webdriver
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import WebDriverException
import time
import json

options = Options()
appState = {
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local"
        }
    ],
    "selectedDestinationId": "Save as PDF",
    "version": 2
}

profile = {'printing.print_preview_sticky_settings.appState': json.dumps(appState)}
# profile = {'printing.print_preview_sticky_settings.appState':json.dumps(appState),'savefile.default_directory':downloadPath}
options.add_experimental_option('prefs', profile)
options.add_argument('--kiosk-printing')
CHROMEDRIVER_PATH = '/usr/local/bin/chromedriver'

driver = webdriver.Chrome(options=options, executable_path=CHROMEDRIVER_PATH)
driver.implicitly_wait(5)
driver.get(url)  # url holds the address of the page to print
driver.execute_script('window.print();')

$chromedriver --v
ChromeDriver 75.0.3770.90 (a6dcaf7e3ec6f70a194cc25e8149475c6590e025-refs/branch-heads/3770@{#1003})

Any hints or solutions as to what can be done to print the open HTML page to a PDF? I have spent hours trying to make this work. Thank you!

Update on 2019-07-11:

My question has been identified as a duplicate, but a) the other question seems to be using javascript code, and b) the answer does not solve the problem being raised in this question - it may be to do with more recent software versions. Chrome version being used is Version 75.0.3770.100 (Official Build) (64-bit), and chromedriver is ChromeDriver 75.0.3770.90. On Mac OS Mojave. Script is running on Python 3.7.3.

Update on 2019-07-11:

Changed the code to

from selenium import webdriver
import json

chrome_options = webdriver.ChromeOptions()
settings = {
    "appState": {
        "recentDestinations": [{
            "id": "Save as PDF",
            "origin": "local",
            "account": "",
        }],
        "selectedDestinationId": "Save as PDF",
        "version": 2
    }
}
prefs = {'printing.print_preview_sticky_settings': json.dumps(settings)}
chrome_options.add_experimental_option('prefs', prefs)
chrome_options.add_argument('--kiosk-printing')
CHROMEDRIVER_PATH = '/usr/local/bin/chromedriver'
driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=CHROMEDRIVER_PATH)
driver.get("https://google.com")
driver.execute_script('window.print();')
driver.quit()

And now nothing happens. Chrome launches, loads the URL, and the print dialog appears, but then nothing seems to happen: there is nothing in the default printer queue and no PDF either. I even searched for the PDF by looking through "Recent Files" on the Mac.

*no PDF saved*, where did you check? It should be saved in your user Downloads folder. — Kamal, Jul 10 '19 at 07:01
@Kamal - I tried this again and noticed that Chrome was sending an actual printout to my default printer, but I was not at the same location, so I did not notice what actually happened. I have now deleted the print queue left over from the numerous times I tried printing to PDF, when it appeared that nothing happened. So I suspect that the "Save as PDF" option is not getting selected, and I do not know how to select it. — jim70, Jul 10 '19 at 11:57
Please refer to this [answer](https://stackoverflow.com/a/48798425/5319738). In your code, you are calling `webdriver.Chrome(options=options..`, but correct syntax is `webdriver.Chrome(chrome_options=options..`. And somehow, with `webdriver.ChromeOptions` print is working faster than with `webdriver.chrome.options.Options`, so I would suggest you to try that. — Kamal, Jul 11 '19 at 01:22
Possible duplicate of [Set Selenium ChromeDriver UserPreferences to Save as PDF](https://stackoverflow.com/questions/47007720/set-selenium-chromedriver-userpreferences-to-save-as-pdf) — Kamal, Jul 11 '19 at 01:23
@Kamal - Thank you for your comments. I just tried that as well and changed the code to `chrome_options = webdriver.ChromeOptions()`. `webdriver.ChromeOptions` does indeed seem to work faster, but even this option sends a printout to the default printer and not to PDF :( Still looking for advice on how this can be done; if not with Selenium, then I wonder whether it is possible with some other library. However, the page that I need to reach comes after a login procedure. — jim70, Jul 11 '19 at 07:08
The code on other question works for me, so can you please update your question with latest code you tried? — Kamal, Jul 11 '19 at 08:26
Updated the question with the latest code that I used. This time nothing seems to go anywhere even though the print dialog does appear to launch. The print dialog is too quick and cannot read what printer or whether the PDF option is selected. Intrigued. Thanks @Kamal for staying engaged and helping me solve this. — jim70, Jul 11 '19 at 08:52
How do you mean solved? The only thing your script does for me is open "save as". It doesn't actually save it itself. — Greg W.F.R, Jul 18 '21 at 19:00
Oh sorry never mind. Instead of calling the correct chromedriver I used this. `driver = webdriver.Chrome(ChromeDriverManager().install())` but it ruined everything. now I explicitly used `driver = webdriver.Chrome(chrome_options=chrome_options , executable_path="/Applications/chrome/chromedriver")` and it works! — Greg W.F.R, Jul 18 '21 at 19:14
@GregW.F.R glad it worked. I have not used this in a long time. But yes that is the way to instantiate a chrome driver instance. — jim70, Jul 19 '21 at 20:33

score 25 · Answer 1 · answered Jul 18 '19 at 08:22

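The answer here worked when I did not have any other printer set up in my OS. But when I had another default printer, it did not work.

I don't understand why, but this small change seems to work. Compared with the updated code in the question, the settings are serialized directly under the `printing.print_preview_sticky_settings.appState` key instead of being nested under an `appState` key.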

from selenium import webdriver
import json

chrome_options = webdriver.ChromeOptions()
settings = {
    "recentDestinations": [{
        "id": "Save as PDF",
        "origin": "local",
        "account": "",
    }],
    "selectedDestinationId": "Save as PDF",
    "version": 2
}
prefs = {'printing.print_preview_sticky_settings.appState': json.dumps(settings)}
chrome_options.add_experimental_option('prefs', prefs)
chrome_options.add_argument('--kiosk-printing')
CHROMEDRIVER_PATH = '/usr/local/bin/chromedriver'
driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=CHROMEDRIVER_PATH)
driver.get("https://google.com")
driver.execute_script('window.print();')
driver.quit()

answered Jul 18 '19 at 08:22

Kamal

Thank you @Kamal. This approach indeed works but it printed to the last used printer. Just did some search and I wonder if cups-pdf installed as a printer and if cups-pdf is the last used printer can result in the desired outcome - print-to-pdf using python. – jim70 Jul 19 '19 at 06:36
Sorry I couldn't test my solution on Linux, it worked on Windows 10 for me. – Kamal Jul 19 '19 at 07:09
got it. Will work on this some more and see if I can come up with something. – jim70 Jul 19 '19 at 07:30
Worked on Linux for me. Would be nice if we could control the download location, however. – Rob Hall Jul 21 '20 at 13:09
@RobHall The solution https://stackoverflow.com/a/60548793/1485853 – iMath Apr 28 '21 at 07:19
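
On controlling the download location: the `savefile.default_directory` preference, which also appears in another answer on this page, can be set alongside the sticky print settings. The following is only a minimal sketch, assuming Chrome honors that preference for kiosk-printed PDFs on your platform. `build_pdf_prefs` is a hypothetical helper name, not part of Selenium.

```python
import json
import os


def build_pdf_prefs(download_dir):
    """Return Chrome prefs that preselect "Save as PDF" and set the save folder.

    Hypothetical helper for illustration; the preference keys are the ones
    used in the answers on this page.
    """
    app_state = {
        "recentDestinations": [
            {"id": "Save as PDF", "origin": "local", "account": ""}
        ],
        "selectedDestinationId": "Save as PDF",
        "version": 2,
    }
    return {
        # The answers here pass this value as a JSON string, not a nested dict.
        "printing.print_preview_sticky_settings.appState": json.dumps(app_state),
        # Assumption: an absolute path is the safest form to hand to Chrome.
        "savefile.default_directory": os.path.abspath(download_dir),
    }
```

The returned dictionary would be passed to `chrome_options.add_experimental_option('prefs', ...)` together with the `--kiosk-printing` argument, as in the answer above.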

score 9 · Answer 2 · answered Nov 25 '20 at 13:03

You can use the following code to print PDFs in A5 size with background css enabled:

import os
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import json
import time

chrome_options = webdriver.ChromeOptions()

settings = {
    "recentDestinations": [{
        "id": "Save as PDF",
        "origin": "local",
        "account": ""
    }],
    "selectedDestinationId": "Save as PDF",
    "version": 2,
    "isHeaderFooterEnabled": False,
    "mediaSize": {
        "height_microns": 210000,
        "name": "ISO_A5",
        "width_microns": 148000,
        "custom_display_name": "A5"
    },
    "customMargins": {},
    "marginsType": 2,
    "scaling": 175,
    "scalingType": 3,
    "scalingTypePdf": 3,
    "isCssBackgroundEnabled": True
}

mobile_emulation = { "deviceName": "Nexus 5" }
chrome_options.add_experimental_option("mobileEmulation", mobile_emulation)
chrome_options.add_argument('--enable-print-browser')
#chrome_options.add_argument('--headless')

prefs = {
    'printing.print_preview_sticky_settings.appState': json.dumps(settings),
    'savefile.default_directory': '<path>'
}
chrome_options.add_argument('--kiosk-printing')
chrome_options.add_experimental_option('prefs', prefs)

for dirpath, dirnames, filenames in os.walk('<source path>'):
    for fileName in filenames:
        print(fileName)
        driver = webdriver.Chrome("./chromedriver", options=chrome_options)
        driver.get(f'file://{os.path.join(dirpath, fileName)}')
        time.sleep(7)
        driver.execute_script('window.print();')
        driver.close()

This solution worked great for me. `savefile.default_directory` takes both forward and backslash paths (on Windows 10). However, this fails more often than it succeeds for me because the browser closes before the file is fully written. This can be solved by adding `sleep(5)` before `driver.close()` or some more intelligent structure. — Mark Tielemans, Mar 23 '21 at 23:15
It seems like headless is commented out, and with headless on it doesn't work. Any idea how to make it work in a headless browser? — user3691763, Oct 21 '22 at 08:01
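
On the headless question: `window.print()` relies on the print preview UI, which is likely why it does nothing in headless mode. Selenium 4 offers a different route, `driver.print_page()`, which returns the PDF as a base64 string without opening any dialog. The sketch below assumes Selenium 4 and a matching Chrome and chromedriver are installed. `save_base64_pdf` and `print_url_to_pdf` are hypothetical helper names, and this is a different technique from the kiosk-printing answer above.

```python
import base64


def save_base64_pdf(pdf_base64, path):
    """Decode a base64-encoded PDF payload, write it to `path`, return its size."""
    data = base64.b64decode(pdf_base64)
    with open(path, "wb") as fh:
        fh.write(data)
    return len(data)


def print_url_to_pdf(url, path):
    """Load `url` in headless Chrome and save it as a PDF at `path`.

    Assumes Selenium 4 with a compatible Chrome/chromedriver available.
    """
    # Imported here so the decoding helper above stays usable without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.print_page_options import PrintOptions

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        print_options = PrintOptions()
        print_options.background = True  # include CSS backgrounds
        # print_page() returns the PDF as a base64-encoded string.
        return save_base64_pdf(driver.print_page(print_options), path)
    finally:
        driver.quit()
```

A call such as `print_url_to_pdf("https://google.com", "page.pdf")` would then write the file to a path you choose, so no download folder preference is involved.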

score 5 · Answer 3 · answered Jul 18 '19 at 14:15

This solution is not ideal, but you can take a screenshot and convert it to PDF with Pillow.

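from io import BytesIO

from PIL import Image
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(executable_path='path to your driver')
driver.get('your url here')
# find_element(By.TAG_NAME, ...) replaces find_element_by_tag_name,
# which newer Selenium releases no longer provide
png = driver.find_element(By.TAG_NAME, 'body').screenshot_as_png
# Convert to RGB first: some Pillow versions reject an alpha channel for PDF
img = Image.open(BytesIO(png)).convert('RGB')
img.save('filename.pdf', "PDF", quality=100)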

edited Jul 18 '19 at 14:24

answered Jul 18 '19 at 14:15

Alex

Thank you for your answer. The issue with this approach is that it does not work for multi-page webpages. Only a portion of information is captured. But it is a good solution for short pages and does not entail popups. – jim70 Jul 19 '19 at 06:11
what do you mean by **multi-page webpages**? – Alex Jul 19 '19 at 11:50
meaning web pages that need scrolling to see the complete webpage and when printed as PDF fit on 3-4 sheets of papers. – jim70 Jul 19 '19 at 15:38
you can use this code: https://stackoverflow.com/a/57608276/10661593 , and at the end save as pdf. P.s. I didn't understand a bit, sorry. Do you want to fit the entire page on 1 sheet when printing? or how – Alex Aug 22 '19 at 11:27
so what I ideally want to be able to do - is print a page as pdf. on a Mac, when you do that, the PDF generated can run into many pages - assuming PDF is created for letter or A4 sized printing. if I shrink the page a lot and take a screenshot that does not serve the purpose. although, now I understand that Selenium does not control the dialog boxes of the browser, and hence cannot print page as PDF. apparently, puppeteer or pyppeteer in python can do that but I do not know how to use that software yet. the link you shared, seems to talk about screenshot and not pdf... – jim70 Aug 22 '19 at 15:22
you can save screenshot as pdf, why not? – Alex Sep 04 '19 at 05:45
I can. But the page that I wanted to save runs into many screens of vertical scrolling. And so it would become multiple and page downs and then converting each screenshot to PDF and then combining the PDFs. Just thought of this based on your comment. Still seems rather kludgy, and I was hoping that there will be a better solution. Pyppeteer might allow me to do it in Python it seems, but I do not know how to use that. :( – jim70 Sep 04 '19 at 11:53
I think this would solve my problem. https://miyakogi.github.io/pyppeteer/_modules/pyppeteer/page.html#Page.pdf. However, I do not know async and await, and need to learn those before trying to use Pyppeteer. Just hard to believe that Selenium could not do it as I had sort of learnt it... – jim70 Sep 10 '19 at 14:27
I won't say I am upset. :) Selenium is free and thanks to the team for that! It is just that I was certain that it could be done and believed that I did not know the right syntax or options as to how to enable PDF printing in Selenium. Using kludges does not seem like the right thing. It will eventually break. – jim70 Sep 12 '19 at 12:29
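
The "scroll, screenshot, combine" idea discussed in these comments can be sketched with Pillow's multi-page PDF support. This assumes the screenshots have already been captured as PNG bytes, for example with `driver.get_screenshot_as_png()` after each scroll step. `screenshots_to_pdf` is a hypothetical helper name, and the result is a PDF of images, not selectable text.

```python
from io import BytesIO

from PIL import Image


def screenshots_to_pdf(png_screenshots, pdf_path):
    """Write a list of PNG byte strings to one PDF, one screenshot per page."""
    if not png_screenshots:
        raise ValueError("need at least one screenshot")
    # Convert to RGB because screenshots may carry an alpha channel.
    pages = [Image.open(BytesIO(png)).convert("RGB") for png in png_screenshots]
    first, rest = pages[0], pages[1:]
    # save_all with append_images writes the remaining images as extra pages.
    first.save(pdf_path, "PDF", save_all=True, append_images=rest)
    return len(pages)
```

This still has the drawbacks raised above: page breaks fall wherever the scroll steps land, and overlapping or fixed-position elements may be captured more than once.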

score 5 · Answer 4 · answered Apr 03 '20 at 13:28

Here is the solution I use on Windows:

First download ChromeDriver here: http://chromedriver.chromium.org/downloads and install Selenium.

Then run this code (based on the accepted answer, slightly modified to work on Windows):

import json
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
settings = {"recentDestinations": [{"id": "Save as PDF", "origin": "local", "account": ""}], "selectedDestinationId": "Save as PDF", "version": 2}
prefs = {'printing.print_preview_sticky_settings.appState': json.dumps(settings)}
chrome_options.add_experimental_option('prefs', prefs)
chrome_options.add_argument('--kiosk-printing')
browser = webdriver.Chrome(r"chromedriver.exe", options=chrome_options)
browser.get("https://google.com/")
browser.execute_script('window.print();')
browser.close()

answered Apr 03 '20 at 13:28

Basj

This is such a minimal revision ("Per the selenium documentation, specify the windows driver locations (e.g., `chromedriver.exe`) rather than the linux driver locations when running on windows") that it should simply be a comment on the accepted answer. Furthermore, It appears that you simply [minified the accepted answer](https://codebeautify.org/python-formatter-beautifier/cbf1248f) to make the code look different. – Rob Hall Jul 21 '20 at 13:08
@RobHall Comments are sometimes cleared after years; also sometimes it's hard to extract information from multiple comments, thus this answer. I properly cited the source ("based on the accepted answer"); the devil is really in the details, I spent a lot of time trying and failing before it finally worked, so my goal was really to put a ready-to-use code for Windows as an answer. – Basj Jul 21 '20 at 13:53
I tried searching for the saved file but can't find it anywhere. Any idea where the file goes after being saved as pdf. – Raspberry Lemon Oct 08 '21 at 05:30
the saved file would be in downloads, does anyone know if I can add a delay for the web to load properly or if can change the default download location? – UserBlanko Nov 08 '21 at 03:30
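
On adding a delay: a fixed `sleep` can be replaced by polling the save folder until the PDF has appeared, which also guards against closing the browser before the file is written. This is a minimal sketch with two assumptions: the folder contains no older PDFs, and Chrome marks unfinished files with a `.crdownload` suffix (which may not apply to kiosk printing). `wait_for_pdf` is a hypothetical helper name.

```python
import os
import time


def wait_for_pdf(directory, timeout=30.0, poll=0.5):
    """Return the path of the newest finished PDF in `directory`.

    Polls until a .pdf file exists and no temporary .crdownload file is
    left, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(directory)
        pdfs = [n for n in names if n.lower().endswith(".pdf")]
        in_progress = [n for n in names if n.endswith(".crdownload")]
        if pdfs and not in_progress:
            paths = [os.path.join(directory, n) for n in pdfs]
            return max(paths, key=os.path.getmtime)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished PDF in {directory!r}")
        time.sleep(poll)
```

It would be called after `execute_script('window.print();')` and before closing the browser, with `directory` set to the Downloads folder or to whatever `savefile.default_directory` points at.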

score -5 · Answer 5 · answered Apr 29 '21 at 05:10

I would suggest downloading the page source HTML, which can be done like this in VB.NET: `Dim Html As String = webdriver.PageSource`. I'm not sure how it is done in Python, but I'm sure it's very similar.

Once you have done that, you can select the parts of the page you want to save using an HTML parser, or by parsing it manually with string-handling code.

Once the HTML for the part you want to save is stored in a string, use an HTML-to-PDF converter library or program. There are lots of these for languages like C# and VB.NET. I don't know of any for Python, but I'm sure some exist; just do some research. Some are free and some are expensive.

I've been using the converter approach and it is not great. The most common converter, `wkhtmltopdf`, lives in the 13th century, so either you put on your medieval armour, forget all about `flex` and `grid` and go back to `<table>` layouting, or you'll get zilch. Alternatives are even worse. Speaking of the 13th century, vb.net?!? In general, I don't hold a candle for two types of SO responses: 1) "Here's something I threw together and never actually tried. Good luck!", and 2) "Why would you want to do that?". Yours is type 1. Not as bad as type 2, but still a time waster. — Ricardo, May 13 '22 at 07:23
