Log into secured website, automatically print page as pdf

Question

I have been exploring ways to use Python to log into a secure website (e.g. Salesforce), navigate to a certain page, and print (save) that page as a PDF at a prescribed location.

I have tried using:

pdfkit.from_url: Use requests to get a session cookie, parse it, then pass it as a cookie in wkhtmltopdf's options. This does not work because pdfkit does not recognise the cookie I pass.
pdfkit.from_file: Use requests.get to fetch the HTML of the page I want to print, then use pdfkit to convert the HTML file to PDF. This works, but the page formatting and images are all missing.
Selenium: Use a webdriver to log in, navigate to the wanted page, and call the window.print function. This does not work because I can't pass any arguments to the Save As dialog that opens.

Does anyone have an idea how to get around this?

One option could be to [save a screenshot using selenium](http://stackoverflow.com/questions/33692179/export-as-pdf-using-selenium-webdriver-screenshot) (and convert to PDF). — Arya, Nov 22 '16 at 01:14
The problem is that a full-page screenshot is not available, and a screenshot contains no rendered text. — Jonathan Mak, Nov 22 '16 at 04:07

score 0 · Answer 1 · answered Nov 22 '16 at 01:12

Log in using requests.
Use the requests Session mechanism to keep track of the cookie.
Use the session to retrieve the HTML page.
Parse the HTML (for example with BeautifulSoup).
Identify the img tags and the CSS link tags.
Download the images and CSS documents locally, using the same session.
Rewrite the img src attributes to point to the locally downloaded images.
Rewrite the CSS links to point to the locally downloaded CSS files.
Serialize the new HTML tree to a local .html file.
Use whatever "HTML to PDF" solution you prefer to render the local .html file.
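
The steps above can be sketched roughly as follows. This is only a minimal illustration, not a tested solution for Salesforce: it assumes the site accepts a plain form POST for login, and the names fetch_page_with_login and localize_assets, the credentials dictionary, and the asset directory are all hypothetical. The fetch argument is any callable that returns the bytes for a URL, so the logged-in session can be reused for the images and CSS.

```python
import hashlib
from pathlib import Path
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def fetch_page_with_login(login_url, page_url, credentials):
    """Log in with a form POST (an assumption) and fetch the target page."""
    session = requests.Session()  # keeps the login cookie for later requests
    session.post(login_url, data=credentials).raise_for_status()
    response = session.get(page_url)
    response.raise_for_status()
    return session, response.text


def localize_assets(html, base_url, fetch, asset_dir):
    """Download images and stylesheets, then point the HTML at the local copies."""
    asset_dir = Path(asset_dir)
    asset_dir.mkdir(parents=True, exist_ok=True)
    soup = BeautifulSoup(html, "html.parser")

    # Collect (tag, attribute) pairs for images and stylesheet links.
    targets = [(tag, "src") for tag in soup.find_all("img", src=True)]
    targets += [
        (tag, "href")
        for tag in soup.find_all("link", href=True)
        if "stylesheet" in (tag.get("rel") or [])
    ]

    for tag, attr in targets:
        if tag[attr].startswith("data:"):
            continue  # inline data URIs need no download
        url = urljoin(base_url, tag[attr])
        # A hash of the URL gives a unique local name; keep the file extension.
        suffix = Path(urlparse(url).path).suffix
        name = hashlib.sha1(url.encode("utf-8")).hexdigest()[:12] + suffix
        local_path = asset_dir / name
        local_path.write_bytes(fetch(url))
        tag[attr] = local_path.resolve().as_uri()

    return str(soup)
```

A call might look like `session, html = fetch_page_with_login(login_url, page_url, credentials)` followed by `localized = localize_assets(html, page_url, lambda u: session.get(u).content, "assets")`. The returned string can then be written to a local .html file and passed to the converter of your choice, such as pdfkit.from_file. Recent wkhtmltopdf builds may also need local file access enabled before they will load file:// resources. Note that this sketch does not follow url(...) references inside the downloaded CSS, so fonts and background images may still be missing.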
