I am trying to automatically save, as a PDF, a page created with pdf2htmlEX (https://github.com/coolwanglu/pdf2htmlEX) using the Selenium Chrome webdriver.

It almost works, except that figure captions, and sometimes even parts of the figures, are missing.

Manually saved: (screenshot)

Automatically saved using Selenium and the Chrome webdriver: (screenshot)

Here is my code (you need ChromeDriver, http://chromedriver.chromium.org/downloads, in the same folder as this script):

import json
from selenium import webdriver

# print settings: save as pdf, 'letter' formatting
appState = """{
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local"
        }
    ],
    "mediaSize": {
        "height_microns": 279400,
        "name": "NA_LETTER",
        "width_microns": 215900,
        "custom_display_name": "Letter"
    },
    "selectedDestinationId": "Save as PDF",
    "version": 2
}"""

appState = json.loads(appState)
profile = {"printing.print_preview_sticky_settings.appState": json.dumps(appState)}
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('prefs', profile)
# Enable automatically pressing the print button in print preview
# https://peter.sh/experiments/chromium-command-line-switches/
chrome_options.add_argument('--kiosk-printing')

driver = webdriver.Chrome('./chromedriver', options=chrome_options)
driver.get('http://www.deeplearningbook.org/contents/intro.html')
driver.execute_script('window.print();')
driver.quit()

This sometimes happens when I print manually, too. But if I then change any of the printing options, the preview reloads, the image captions reappear, and they stay there no matter which options I enable or disable afterwards.

(screenshot: Chrome printing settings)

What I tried so far:

Max S.
  • If you are seeing this problem when you "manually print" as well, it means you have a bug. The real problem is probably that Selenium is very quick and is replicating what happens occasionally when you do it manually. – Ardesco Mar 01 '19 at 17:07
  • That is a good point! If I disable 'kiosk-printing' so that everything but the final click on 'Save PDF' is automated, the missing elements never appear, no matter how long I wait or which print settings I change... which is in contrast to when I print manually. So there seems to be a difference between Chrome.app and ChromeDriver. It's either a missing preference or a bug... – Max S. Mar 02 '19 at 14:55
  • Might be a silly question, but have you tried adding a manual wait time after the get? Selenium does not necessarily wait for all elements to render properly, and `print` should only print what is rendered at the moment of the call. All the PNG files appear to be loaded asynchronously, and Selenium may not be waiting for them to be downloaded and rendered. – Mike Mar 04 '19 at 08:59
  • Yes, forgot to mention, I also tried that already :) The thing is, on the website (pdf) all elements are there immediately; it is only once I go to the print preview that they are missing... – Max S. Mar 04 '19 at 09:53
  • Problem with your print CSS settings? – Ardesco Mar 05 '19 at 10:27
  • Maybe... Do you mean something other than `isCssBackgroundEnabled`? Enabling that does not make the elements appear, unfortunately. – Max S. Mar 05 '19 at 10:31
  • I think the CSS setting might be it. I found this: https://stackoverflow.com/questions/34984081/chrome-window-print-missing-page-elements. Any idea how I load my own CSS settings with Selenium? – Max S. Mar 05 '19 at 11:01
  • Adding a link for later: https://stackoverflow.com/questions/37318538/apply-css-to-printwindow-print-in-javascript – Max S. Mar 06 '19 at 12:48
  • I am not sure how to fix your issue but wanted to say that I appreciate your work, thank you. – Kristiyan D Kovachev Mar 07 '19 at 22:03
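
For reference, the two ideas raised in the comments (waiting until all images have loaded, and loading custom print CSS through Selenium) could be sketched roughly as below. This is only an illustration: the helper names `wait_for_images` and `inject_print_css` are hypothetical, the CSS rule is a placeholder, and the asker reports above that waiting alone did not fix the missing captions.

```python
import time

# JavaScript that reports whether every <img> on the page has finished loading.
IMAGES_LOADED_JS = (
    "return Array.from(document.images)"
    ".every(img => img.complete && img.naturalWidth > 0);"
)

# JavaScript that appends a <style media="print"> element to the page.
INJECT_PRINT_CSS_JS = """
const style = document.createElement('style');
style.media = 'print';
style.textContent = arguments[0];
document.head.appendChild(style);
"""


def wait_for_images(driver, timeout=10.0, poll_interval=0.25):
    """Poll the page until all images are loaded; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if driver.execute_script(IMAGES_LOADED_JS):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


def inject_print_css(driver, css):
    """Add a print-only stylesheet to the current page (hypothetical helper)."""
    driver.execute_script(INJECT_PRINT_CSS_JS, css)
```

With a driver set up as in the question, this would be called between `driver.get(...)` and `window.print()`, for example `wait_for_images(driver)` followed by `inject_print_css(driver, "* { -webkit-print-color-adjust: exact; }")`.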

1 Answer

So, through fiddling around, I came across the solution by accident. I don't really understand why, but enabling 'PrintBrowser mode' ("Enables PrintBrowser mode, in which everything renders as though printed.") solves the issue. This may or may not have to do with the CSS loading properly.

I just need to add chrome_options.add_argument('--enable-print-browser') and all elements are there!
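
Put together with the script from the question, the change might look like the sketch below. The helper names (`build_print_prefs`, `build_chrome_arguments`, `save_page_as_pdf`) are only for illustration and are not part of Selenium; the preferences and the driver call are the ones from the question, which uses the Selenium 3 style of passing the chromedriver path (newer Selenium releases expect a `Service` object instead).

```python
import json

PREF_KEY = "printing.print_preview_sticky_settings.appState"


def build_print_prefs():
    """Print-preview settings from the question: save as PDF, Letter paper."""
    app_state = {
        "recentDestinations": [{"id": "Save as PDF", "origin": "local"}],
        "mediaSize": {
            "height_microns": 279400,
            "name": "NA_LETTER",
            "width_microns": 215900,
            "custom_display_name": "Letter",
        },
        "selectedDestinationId": "Save as PDF",
        "version": 2,
    }
    return {PREF_KEY: json.dumps(app_state)}


def build_chrome_arguments():
    """Command-line switches: auto-confirm the print dialog, plus the fix."""
    return [
        "--kiosk-printing",        # press the print button automatically
        "--enable-print-browser",  # render everything as though printed
    ]


def save_page_as_pdf(url, chromedriver_path="./chromedriver"):
    """Open the page and trigger window.print() with the settings above."""
    # Imported here so the two helpers above can be used without Selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", build_print_prefs())
    for argument in build_chrome_arguments():
        options.add_argument(argument)

    # Selenium 3 style call, as in the question.
    driver = webdriver.Chrome(chromedriver_path, options=options)
    try:
        driver.get(url)
        driver.execute_script("window.print();")
    finally:
        driver.quit()
```

Calling `save_page_as_pdf('http://www.deeplearningbook.org/contents/intro.html')` would then correspond to the original script with the extra switch added.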

Max S.