I am trying to automatically save an HTML page generated by pdf2htmlEX
(https://github.com/coolwanglu/pdf2htmlEX) as a PDF file using the Selenium (Chrome) webdriver.
It almost works, except that figure captions, and sometimes even parts of the figures, are missing.
Manually saved:
Automatically saved using selenium & chrome webdriver:
Here is my code. It requires the ChromeDriver binary (http://chromedriver.chromium.org/downloads) in the same folder as the script:
import json
from selenium import webdriver
# print settings: save as pdf, 'letter' formatting
appState = """{
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local"
        }
    ],
    "mediaSize": {
        "height_microns": 279400,
        "name": "NA_LETTER",
        "width_microns": 215900,
        "custom_display_name": "Letter"
    },
    "selectedDestinationId": "Save as PDF",
    "version": 2
}"""
appState = json.loads(appState)
profile = {"printing.print_preview_sticky_settings.appState": json.dumps(appState)}
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('prefs', profile)
# Enable automatically pressing the print button in print preview
# https://peter.sh/experiments/chromium-command-line-switches/
chrome_options.add_argument('--kiosk-printing')
driver = webdriver.Chrome('./chromedriver', options=chrome_options)
driver.get('http://www.deeplearningbook.org/contents/intro.html')
driver.execute_script('window.print();')
driver.quit()
This sometimes happens when I print manually, too. However, if I then change any of the print options, the preview reloads and the figure captions reappear; they then stay visible no matter which other options I enable or disable.
What I tried so far:
- using different ChromeDriver versions (71, 72, 73) from http://chromedriver.chromium.org/downloads
- enabling background graphics by adding '"isCssBackgroundEnabled": true' to the appState
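
The second attempt can be sketched roughly as follows. This is only a minimal illustration: `build_print_prefs` is a hypothetical helper name (not part of Selenium or Chrome), and all keys other than `isCssBackgroundEnabled` are copied from the script above.

```python
import json


def build_print_prefs(background_graphics=True):
    """Build the Chrome 'prefs' dict for saving as PDF on Letter paper.

    Hypothetical helper for illustration; it only assembles the same
    sticky print-preview settings used in the script above, plus the
    background-graphics flag.
    """
    app_state = {
        "recentDestinations": [{"id": "Save as PDF", "origin": "local"}],
        "mediaSize": {
            "height_microns": 279400,
            "name": "NA_LETTER",
            "width_microns": 215900,
            "custom_display_name": "Letter",
        },
        "selectedDestinationId": "Save as PDF",
        "version": 2,
        # The extra key tried in the second attempt
        "isCssBackgroundEnabled": background_graphics,
    }
    # Chrome expects the appState value as a JSON string, not a nested dict
    return {"printing.print_preview_sticky_settings.appState": json.dumps(app_state)}


prefs = build_print_prefs()
print(prefs["printing.print_preview_sticky_settings.appState"])
```

The resulting dict would be passed to `chrome_options.add_experimental_option('prefs', prefs)` in place of `profile` in the script above.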