Selenium 4 + geckodriver: printing an HTML5 webpage to PDF with Page.printToPDF

Question

With Selenium 4 and chromedriver, I succeeded in printing websites to PDF with custom page sizes (see the Python code below). I would like to know how to do the equivalent with geckodriver/Firefox.

import base64
import json

from selenium import webdriver

def send_devtools(driver, cmd, params=None):
    # chromedriver-specific endpoint for sending Chrome DevTools Protocol commands
    resource = "/session/%s/chromium/send_command_and_get_result" % driver.session_id
    url = driver.command_executor._url + resource
    body = json.dumps({'cmd': cmd, 'params': params or {}})
    response = driver.command_executor._request('POST', url, body)
    # None if the command produced no result
    return response.get('value')

def save_as_pdf(driver, path, options=None):
    result = send_devtools(driver, "Page.printToPDF", options or {})
    if result is not None:
        # Page.printToPDF returns the PDF as base64 in the 'data' field
        with open(path, 'wb') as file:
            file.write(base64.b64decode(result['data']))
        return True
    return False

options = webdriver.ChromeOptions()
# headless mode is mandatory, otherwise saving to PDF won't work
options.add_argument("--headless")

driver = webdriver.Chrome(executable_path='/usr/local/bin/chromedriver', options=options)
driver.get(r'https://example.my')

send_devtools(driver, "Emulation.setEmulatedMedia", {'media': 'screen'})
pdf_options = { 'paperHeight': 92, 'paperWidth': 8, 'printBackground': True }
save_as_pdf(driver, 'myfilename.pdf', pdf_options)
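For comparison, a minimal sketch of the same Chrome-side printing using the public execute_cdp_cmd() method that Selenium 4's Chrome/Chromium drivers expose, instead of the private command_executor._request() call. It assumes a Chrome driver (the method does not exist on the Firefox driver) and keeps --headless as in the question; save_as_pdf_cdp and demo are hypothetical helper names.

```python
import base64


def save_as_pdf_cdp(driver, path, print_options=None):
    """Print the current page with CDP's Page.printToPDF and write it to path.

    Returns the number of bytes written.
    """
    # execute_cdp_cmd() returns the CDP result dict; Page.printToPDF puts the
    # base64-encoded PDF in its "data" field.
    result = driver.execute_cdp_cmd("Page.printToPDF", print_options or {})
    with open(path, "wb") as file:
        return file.write(base64.b64decode(result["data"]))


def demo():
    """Example run; assumes Chrome is installed and chromedriver can be located."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # kept from the question
    driver = webdriver.Chrome(options=options)
    try:
        driver.get("https://example.my")
        driver.execute_cdp_cmd("Emulation.setEmulatedMedia", {"media": "screen"})
        save_as_pdf_cdp(driver, "myfilename.pdf",
                        {"paperWidth": 8, "paperHeight": 92, "printBackground": True})
    finally:
        driver.quit()
```

This only tidies up the Chrome path; it does not help with geckodriver, which has no CDP command of this kind.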

See https://github.com/mozilla/geckodriver/issues/1800, followed by https://github.com/SeleniumHQ/selenium/issues/8802. Not sure whether you can try the POST method option, though! — pvy4917, Sep 19 '21 at 09:11

score 0 · Answer 1 · answered Sep 18 '21 at 13:23

Did you try wkhtmltopdf?

wkhtmltopdf and wkhtmltoimage are open source (LGPLv3) command line tools to render HTML into PDF and various image formats using the Qt WebKit rendering engine. These run entirely "headless" and do not require a display or display service.

Example usage:

wkhtmltopdf http://google.com google.pdf

If you want to do it from Python, after installing wkhtmltopdf you can invoke it like this:

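import os
import shlex
from itertools import count

suffix = count(1)

def html_to_pdf(link, name="test"):
    # avoid overwriting an existing PDF by appending a number to the file name
    base = name
    while os.path.isfile(f"{name}.pdf"):
        name = f"{base}{next(suffix)}"
    # quote the arguments so URLs containing '&' or spaces survive the shell
    os.system(f"wkhtmltopdf {shlex.quote(link)} {shlex.quote(name + '.pdf')}")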

Additionally, you can use subprocess.run if you want to pass more parameters to wkhtmltopdf, which makes your html_to_pdf function more flexible. You can check out the documentation with:

wkhtmltopdf -H
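A sketch of the subprocess.run variant, assuming wkhtmltopdf is on PATH. The options used here (--page-width, --page-height, --print-media-type) are among those listed by wkhtmltopdf -H; build_wkhtmltopdf_args and html_to_pdf_with_options are hypothetical names. Passing an argument list instead of a shell string avoids quoting problems with URLs.

```python
import subprocess


def build_wkhtmltopdf_args(url, output, page_width=None, page_height=None, print_media=False):
    """Build the wkhtmltopdf command line as a list (no shell quoting needed)."""
    args = ["wkhtmltopdf"]
    if page_width is not None:
        args += ["--page-width", page_width]    # e.g. "203.2mm"
    if page_height is not None:
        args += ["--page-height", page_height]  # e.g. "2336.8mm"
    if print_media:
        args.append("--print-media-type")       # use print CSS instead of screen CSS
    args += [url, output]
    return args


def html_to_pdf_with_options(url, output, **options):
    # check=True raises CalledProcessError if wkhtmltopdf exits with a non-zero status
    subprocess.run(build_wkhtmltopdf_args(url, output, **options), check=True)
```

For example, html_to_pdf_with_options("https://example.com", "example.pdf", page_width="203.2mm", page_height="2336.8mm") asks for roughly the 8 x 92 inch page used in the question.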

I tried wkhtmltopdf, but it doesn't support modern HTML5/CSS features like flexbox. The results weren't convincing in this regard. — Madamadam, Sep 21 '21 at 14:21
Your question should mention the HTML5 requirement, because without it wkhtmltopdf seems like a good answer. — Baris Senyerli, Sep 21 '21 at 14:33

score 0 · Answer 2 · answered Sep 21 '21 at 06:23

To print a page as PDF, there is a dedicated WebDriver command (Print Page) that works for cross-browser automation. That means there is no need to write custom code that uses the Chrome DevTools Protocol, as done above for Chrome.

For both Chrome and Firefox this command is already available in Selenium 3.141, and should also work without modifications for Selenium 4.

The command returns the base64-encoded PDF data in the response payload, so you have to decode it and save it to a file yourself.
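A minimal sketch of that command through Selenium 4's Python bindings, assuming a Selenium 4 release where WebDriver.print_page() and PrintOptions are available and geckodriver can be located (on PATH or via Selenium Manager). The WebDriver command measures the page in centimetres while Page.printToPDF uses inches, so the question's 8 x 92 inch page is converted; inches_to_cm, save_as_pdf and demo are hypothetical helper names.

```python
import base64

CM_PER_INCH = 2.54


def inches_to_cm(inches):
    """Page.printToPDF sizes are in inches; the WebDriver print command uses centimetres."""
    return round(inches * CM_PER_INCH, 2)


def save_as_pdf(driver, path, width_in=8, height_in=92, background=True):
    """Print the current page via WebDriver's Print Page command and save it to path."""
    # Imported here so inches_to_cm() can be used without Selenium installed.
    from selenium.webdriver.common.print_page_options import PrintOptions

    print_options = PrintOptions()
    print_options.page_width = inches_to_cm(width_in)
    print_options.page_height = inches_to_cm(height_in)
    print_options.background = background
    # print_page() returns the PDF as a base64-encoded string.
    pdf_base64 = driver.print_page(print_options)
    with open(path, "wb") as file:
        file.write(base64.b64decode(pdf_base64))


def demo():
    """Example run; assumes Firefox is installed and geckodriver can be located."""
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("--headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get("https://example.my")
        save_as_pdf(driver, "myfilename.pdf")
    finally:
        driver.quit()
```

As far as I know, this command has no counterpart to Emulation.setEmulatedMedia, so the page is rendered with its print styles rather than forced to screen media.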

answered Sep 21 '21 at 06:23 by Henrik

I'm using Selenium 3.141 and don't have this command available. – Michael Herrmann Sep 27 '21 at 07:27
It turns out that some Selenium bindings didn't add it in the 3.x releases. So you would have to install a recent Selenium 4 beta, or wait for its final release. For more details see https://github.com/SeleniumHQ/selenium/issues/8802. – Henrik Sep 27 '21 at 10:46
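For bindings that don't expose the command yet, one possible workaround, sketched under assumptions: register the W3C endpoint POST /session/{session id}/print as a custom command in Selenium's private command_executor._commands table and call it through driver.execute(). That table is an implementation detail that may change between Selenium versions, and build_print_params and save_as_pdf_raw are hypothetical names.

```python
import base64


def build_print_params(width_cm=21.59, height_cm=27.94, background=True):
    """Request body for the W3C WebDriver 'Print Page' command (sizes in centimetres)."""
    return {"page": {"width": width_cm, "height": height_cm}, "background": background}


def save_as_pdf_raw(driver, path, **print_params):
    """Hypothetical helper: call POST /session/{id}/print directly and save the PDF."""
    # _commands is a private attribute of Selenium's RemoteConnection; registering a
    # custom command name here may break in future Selenium releases.
    driver.command_executor._commands["printPageRaw"] = ("POST", "/session/$sessionId/print")
    # driver.execute() fills in $sessionId and returns the decoded JSON response.
    response = driver.execute("printPageRaw", build_print_params(**print_params))
    # The PDF comes back base64-encoded in the "value" field.
    with open(path, "wb") as file:
        file.write(base64.b64decode(response["value"]))
```

For the question's page size, save_as_pdf_raw(driver, "myfilename.pdf", width_cm=20.32, height_cm=233.68) would request 8 x 92 inches.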

score -1 · Answer 3 · answered Sep 16 '21 at 17:17

Issue:

Doing the same task with Firefox/geckodriver apparently fails with the code above (the /chromium/send_command_and_get_result endpoint it posts to is specific to chromedriver), so the target document is never saved.

Solution:

So I tweaked the code: it now opens the website with geckodriver in Firefox, takes a screenshot of the body element, flattens it onto an RGB image of the same dimensions, and saves that as a PDF document using Pillow.

Code:

from io import BytesIO

from PIL import Image
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

driverOptions = webdriver.FirefoxOptions()
# Uncomment the line below and adjust the path if you get an error like "Expected browser binary location ..."
# driverOptions.binary_location = '/Applications/Firefox.app/Contents/MacOS/firefox'
driverOptions.add_argument("--headless")
webDriver = webdriver.Firefox(service=Service('/usr/local/bin/geckodriver'), options=driverOptions)
try:
    webDriver.get('https://stackoverflow.com')
    # screenshot of the <body> element, converted to RGBA so an alpha band is always present
    png = webDriver.find_element(By.TAG_NAME, 'body').screenshot_as_png
    websiteScreenshot = Image.open(BytesIO(png)).convert('RGBA')
    # PDF has no alpha channel, so flatten the screenshot onto a white background
    rgbImage = Image.new('RGB', websiteScreenshot.size, (255, 255, 255))
    rgbImage.paste(websiteScreenshot, mask=websiteScreenshot.split()[3])
    rgbImage.save('fileName.pdf', "PDF", resolution=100)
finally:
    webDriver.quit()

Additional:

You can download geckodriver for your platform from here. Happy coding!

Sorry, but the whole question is about using the native browser functions, that print PDF files, because I need real vector PDFs; a screenshot converted to PDF isn't what I am looking for. — Madamadam, Sep 16 '21 at 17:23