Selenium print PDF in A4 format

Question

I have the following code for printing to PDF (and it works), and I am using only Google Chrome for printing.

import json


def send_devtools(driver, command, params=None):
    # pylint: disable=protected-access
    if params is None:
        params = {}
    resource = "/session/%s/chromium/send_command_and_get_result" % driver.session_id
    url = driver.command_executor._url + resource
    body = json.dumps({"cmd": command, "params": params})
    resp = driver.command_executor._request("POST", url, body)
    return resp.get("value")


def export_pdf(driver):
    command = "Page.printToPDF"
    params = {"format": "A4"}
    result = send_devtools(driver, command, params)
    data = result.get("data")
    return data

As you can see, I am using Page.printToPDF to get the PDF as base64, and I am passing "A4" as the format in the params argument.

Unfortunately, this parameter seems to be ignored. I saw some Puppeteer code that uses it (format A4) and thought it could help me.

Even with a hardcoded width and height (see below), I have no luck.

"paperWidth": 8.27,  # inches
"paperHeight": 11.69,  # inches

Using the code above, is it possible to set the page to A4 format?

After doing a lot more research, I found a way to achieve your objective. I posted an updated answer with a working example of how to accomplish your use case. Please let me know how it works for you. — Life is complex, Jul 18 '21 at 00:04

Life is complex · Accepted Answer · 2021-07-17T15:41:25.653

UPDATED POST 07-17-2021

I decided to verify the output of my original code using the Python package pdfminer.six:

from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage

# keep the file open while pages are parsed, since PDFDocument reads lazily
with open('test_1.pdf', 'rb') as fp:
    parser = PDFParser(fp)
    doc = PDFDocument(parser)
    for page in PDFPage.create_pages(doc):
        # mediabox is [x0, y0, x1, y1] in PostScript points
        print(page.mediabox)
        # output: [0, 0, 612, 792]

I was shocked when I converted these point sizes to inches (at 72 points per inch): the page was 8.5 x 11 inches (US Letter), not the A4 paper size of 8.27 x 11.69 inches. So I explored the issue further by looking through the chromium and selenium source code.
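PDF page boxes are measured in PostScript points, 72 to the inch, so the conversion is a simple division. A minimal sketch; the helper name `mediabox_to_inches` is hypothetical and not part of pdfminer:

```python
def mediabox_to_inches(mediabox, points_per_inch=72.0):
    """Convert a PDF mediabox [x0, y0, x1, y1] given in points to (width, height) in inches."""
    x0, y0, x1, y1 = mediabox
    return (x1 - x0) / points_per_inch, (y1 - y0) / points_per_inch


# The mediabox reported for the original PDF: US Letter, not A4
print(mediabox_to_inches([0, 0, 612, 792]))  # (8.5, 11.0)
```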

In the chromium source code, the command Page.printToPDF is handled in the file page_handler.cc:

void PageHandler::PrintToPDF(Maybe<bool> landscape,
                             Maybe<bool> display_header_footer,
                             Maybe<bool> print_background,
                             Maybe<double> scale,
                             Maybe<double> paper_width,
                             Maybe<double> paper_height,
                             Maybe<double> margin_top,
                             Maybe<double> margin_bottom,
                             Maybe<double> margin_left,
                             Maybe<double> margin_right,
                             Maybe<String> page_ranges,
                             Maybe<bool> ignore_invalid_page_ranges,
                             Maybe<String> header_template,
                             Maybe<String> footer_template,
                             Maybe<bool> prefer_css_page_size,
                             Maybe<String> transfer_mode,
                             std::unique_ptr<PrintToPDFCallback> callback)

This function allows the parameters paper_width and paper_height to be set. Both are optional (wrapped in Maybe<>) values of type double, a double-precision floating-point type, so they accept fractional values such as 8.27 as well as whole numbers.

These parameters have default values, which are defined in the Chrome DevTools Protocol:

paperWidth: Paper width in inches. Defaults to 8.5 inches.
paperHeight: Paper height in inches. Defaults to 11 inches.
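The signature above has no format parameter at all, so a named size such as A4 has to be converted to paperWidth/paperHeight in inches by the caller; as far as I can tell, Puppeteer's format option does this conversion on the client side before sending the command. A sketch of that conversion, where the `PAPER_SIZES_MM` table and the `paper_params` helper are hypothetical names and the millimetre values are the standard A4 (210 x 297 mm) and US Letter (215.9 x 279.4 mm) sizes:

```python
MM_PER_INCH = 25.4

# Hypothetical lookup table: (width, height) in millimetres
PAPER_SIZES_MM = {
    "A4": (210.0, 297.0),
    "Letter": (215.9, 279.4),
}


def paper_params(name):
    """Return Page.printToPDF paperWidth/paperHeight values (inches) for a named paper size."""
    width_mm, height_mm = PAPER_SIZES_MM[name]
    return {
        "paperWidth": width_mm / MM_PER_INCH,
        "paperHeight": height_mm / MM_PER_INCH,
    }


print(paper_params("A4"))
```

Passing the resulting dict as the params of Page.printToPDF sends the unrounded A4 size instead of the rounded 8.27 x 11.69.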

Note the difference in how the parameters are named in the chromium source code and in the Chrome DevTools Protocol:

paper_width in the chromium source code
paperWidth in the Chrome DevTools Protocol
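The naming matters: the DevTools Protocol expects the camelCase names, and the original answer's snake_case keys turned out not to take effect. If you prefer writing snake_case in Python, a small hypothetical helper can rename the keys before sending them. Naive capitalisation breaks on acronyms (the protocol spells one parameter preferCSSPageSize), so this sketch keeps an explicit override table. Values should still be numbers rather than strings like '8.27', since the protocol declares them as doubles.

```python
# Hypothetical overrides for names whose acronyms are upper-cased in the protocol
_OVERRIDES = {"prefer_css_page_size": "preferCSSPageSize"}


def to_camel_case(name):
    """Convert a snake_case parameter name such as 'paper_width' to 'paperWidth'."""
    if name in _OVERRIDES:
        return _OVERRIDES[name]
    first, *rest = name.split("_")
    return first + "".join(part.capitalize() for part in rest)


def cdp_params(snake_params):
    """Rename every key of a snake_case dict to its camelCase DevTools name."""
    return {to_camel_case(key): value for key, value in snake_params.items()}


print(cdp_params({"paper_width": 8.27, "paper_height": 11.69}))
# {'paperWidth': 8.27, 'paperHeight': 11.69}
```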

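According to the chromium source code (ChromeDriver's WebViewImpl), the command Page.printToPDF is sent with SendCommandAndGetResultWithTimeout. This method also returns an error unless the browser is running in headless mode: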

Status WebViewImpl::PrintToPDF(const base::DictionaryValue& params,
                               std::string* pdf) {
  // https://bugs.chromium.org/p/chromedriver/issues/detail?id=3517
  if (!browser_info_->is_headless) {
    return Status(kUnknownError,
                  "PrintToPDF is only supported in headless mode");
  }
  std::unique_ptr<base::DictionaryValue> result;
  Timeout timeout(base::TimeDelta::FromSeconds(10));
  Status status = client_->SendCommandAndGetResultWithTimeout(
      "Page.printToPDF", params, &timeout, &result);
  if (status.IsError()) {
    if (status.code() == kUnknownError) {
      return Status(kInvalidArgument, status);
    }
    return status;
  }
  if (!result->GetString("data", pdf))
    return Status(kUnknownError, "expected string 'data' in response");
  return Status(kOk);
}

In my original answer I used send_command_and_get_result, which is similar to SendCommandAndGetResultWithTimeout but takes no timeout argument:

// stub_devtools_client.h

Status SendCommandAndGetResult(
     const std::string& method,
     const base::DictionaryValue& params,
     std::unique_ptr<base::DictionaryValue>* result) override;

Status SendCommandAndGetResultWithTimeout(
     const std::string& method,
     const base::DictionaryValue& params,
     const Timeout* timeout,
     std::unique_ptr<base::DictionaryValue>* result) override;

After looking at the selenium source code, it's unclear how to correctly call send_command_and_get_result or send_command_and_get_result_with_timeout.

I did notice this function in the Selenium WebDriver source code:

def execute_cdp_cmd(self, cmd, cmd_args):
    """
    Execute Chrome Devtools Protocol command and get returned result

    The command and command args should follow chrome devtools protocol domains/commands, refer to link
    https://chromedevtools.github.io/devtools-protocol/

    :Args:
     - cmd: A str, command name
     - cmd_args: A dict, command args. empty dict {} if there is no command args

    :Usage:
        driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': requestId})

    :Returns:
        A dict, empty dict {} if there is no result to return.
        For example to getResponseBody:

        {'base64Encoded': False, 'body': 'response body string'}

    """
    return self.execute("executeCdpCommand", {'cmd': cmd, 'params': cmd_args})['value']

After some research and testing, I found that this function can be used to achieve your use case.
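Wrapped as a function, the approach reduces to sending the command and base64-decoding the data field of the returned dict. A minimal sketch, assuming a Chrome WebDriver (execute_cdp_cmd is Chromium-specific); the names `print_to_pdf` and `A4_PARAMS`, and the `FakeDriver` stand-in used to exercise it without a browser, are hypothetical:

```python
import base64

# Assumed A4 size in inches (210 x 297 mm), using the protocol's camelCase names
A4_PARAMS = {"landscape": False, "paperWidth": 8.27, "paperHeight": 11.69}


def print_to_pdf(driver, path, params=None):
    """Print the current page via Page.printToPDF, save it to `path`, and return the PDF bytes.

    Falls back to A4_PARAMS when no params are given.
    """
    result = driver.execute_cdp_cmd("Page.printToPDF", params or A4_PARAMS)
    pdf_bytes = base64.b64decode(result["data"])
    with open(path, "wb") as fh:
        fh.write(pdf_bytes)
    return pdf_bytes


class FakeDriver:
    """Hypothetical stand-in that mimics the shape of execute_cdp_cmd's return value."""

    def __init__(self, pdf_bytes):
        self._data = base64.b64encode(pdf_bytes).decode("ascii")
        self.calls = []

    def execute_cdp_cmd(self, cmd, cmd_args):
        self.calls.append((cmd, cmd_args))
        return {"data": self._data}
```

With a real headless Chrome driver, print_to_pdf(browser, 'file_name.pdf') would stand in for the execute_cdp_cmd call and the file write in the full script.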

import base64
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage

chrome_options = Options()
chrome_options.add_argument("--disable-infobars")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-popup-blocking")
chrome_options.add_argument('--headless')

browser = webdriver.Chrome('/usr/local/bin/chromedriver', options=chrome_options)
browser.get('http://www.google.com')

# you can define additional parameters if needed;
# A4 is 210 x 297 mm, i.e. about 8.27 x 11.69 inches
params = {'landscape': False,
          'paperWidth': 8.27,
          'paperHeight': 11.69}

# call the function "execute_cdp_cmd" with the command "Page.printToPDF" with
# parameters defined above
data = browser.execute_cdp_cmd("Page.printToPDF", params)

# save the output to a file.
with open('file_name.pdf', 'wb') as file:
    file.write(base64.b64decode(data['data']))

browser.quit()

# verify the page size of the PDF file created
with open('file_name.pdf', 'rb') as fp:
    parser = PDFParser(fp)
    doc = PDFDocument(parser)
    for page in PDFPage.create_pages(doc):
        print(page.mediabox)
        # output: [0, 0, 594.95996, 840.95996]

The output is in points (72 points per inch), so it needs to be converted to inches:

594.95996 points equals 8.263332777783 inches
840.95996 points equals 11.6799994445 inches

8.26 x 11.68 inches matches the A4 paper size (8.27 x 11.69 inches) to within a few hundredths of an inch.
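That comparison can be automated. A sketch, assuming a tolerance of 2 points (about 0.7 mm) is acceptable; the helper name and the tolerance are my own choices, and the exact A4 size in points follows from 210 x 297 mm at 72 points per 25.4 mm:

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72.0

# Exact A4 size in points: about 595.28 x 841.89
A4_POINTS = (210 / MM_PER_INCH * POINTS_PER_INCH, 297 / MM_PER_INCH * POINTS_PER_INCH)


def is_a4(mediabox, tolerance_pt=2.0):
    """Return True if a PDF mediabox [x0, y0, x1, y1] is A4 portrait within tolerance_pt points."""
    x0, y0, x1, y1 = mediabox
    width, height = x1 - x0, y1 - y0
    return (abs(width - A4_POINTS[0]) <= tolerance_pt
            and abs(height - A4_POINTS[1]) <= tolerance_pt)


print(is_a4([0, 0, 594.95996, 840.95996]))  # True  (the Chrome output above)
print(is_a4([0, 0, 612, 792]))              # False (US Letter)
```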

ORIGINAL POST 07-13-2021

There are multiple parameters that you can pass when calling the function Page.printToPDF. Two of those parameters are:

paper_width
paper_height

The following code passes these parameters to Page.printToPDF.

import json
import base64
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def send_devtools(driver, command, params=None):
    if params is None:
        params = {}
    resource = "/session/%s/chromium/send_command_and_get_result" % driver.session_id
    url = driver.command_executor._url + resource
    body = json.dumps({"cmd": command, "params": params})
    resp = driver.command_executor._request("POST", url, body)
    return resp.get("value")


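def create_pdf(driver, file_name):
    command = "Page.printToPDF"
    # NOTE: as the updated post shows, these snake_case keys (with string values)
    # are not recognized by Page.printToPDF, so Chrome falls back to its
    # 8.5 x 11 inch defaults; {'paperWidth': 8.27, 'paperHeight': 11.69} works instead.
    params = {'paper_width': '8.27', 'paper_height': '11.69'}
    result = send_devtools(driver, command, params)
    save_pdf(result, file_name)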


def save_pdf(data, file_name):
    with open(file_name, 'wb') as file:
        file.write(base64.b64decode(data['data']))
    print('PDF created')


chrome_options = Options()
chrome_options.add_argument("--disable-infobars")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-popup-blocking")
chrome_options.add_argument('--headless')

browser = webdriver.Chrome('/usr/local/bin/chromedriver', options=chrome_options)
browser.get('http://www.google.com')

create_pdf(browser, 'test_pdf_1.pdf')

----------------------------------------
My system information
----------------------------------------
Platform:       macOS
OS Version:     10.15.7
Python Version: 3.9
Selenium:       3.141.0
pdfminer.six:   20201018
----------------------------------------

Nice. I want to dynamically change the print size based on screen size. I am able to collect screen size programmatically, and I can then set the window size to match. Is it possible to convert the screen size from pixels into inches so that the resulting PDF is the same dimensions as the actual webpage? I am using geckodriver btw — MrChadMWood, May 18 '23 at 16:47
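For the pixel question, one common assumption is the CSS reference pixel, where 1 inch is defined as 96 CSS pixels. This ignores devicePixelRatio, margins, and any scaling applied while printing, so treat the result as a starting point. A hypothetical helper producing paperWidth/paperHeight values in inches; note that execute_cdp_cmd is Chromium-only, so with geckodriver these numbers would have to be fed to a different print mechanism:

```python
CSS_PX_PER_INCH = 96.0  # CSS defines 1in = 96px


def paper_params_from_pixels(width_px, height_px, px_per_inch=CSS_PX_PER_INCH):
    """Convert a viewport size in CSS pixels to paper sizes in inches."""
    return {
        "paperWidth": width_px / px_per_inch,
        "paperHeight": height_px / px_per_inch,
    }


print(paper_params_from_pixels(1920, 1080))
# {'paperWidth': 20.0, 'paperHeight': 11.25}
```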
