I have been exploring ways to use Python to log into a secure website (e.g. Salesforce), navigate to a certain page, and print (save) that page as a PDF at a prescribed location.

I have tried using:

  1. pdfkit.from_url: Use requests to get a session cookie, parse it, then pass it as a cookie in the wkhtmltopdf options. This does not work because pdfkit does not recognise the cookie I pass.

  2. pdfkit.from_file: Use requests.get to fetch the HTML of the page I want to print, then use pdfkit to convert the HTML file to PDF. This works, but the page formatting and images are all missing.

  3. Selenium: Use a webdriver to log in, navigate to the wanted page, and call the window.print function. This does not work because I can't pass any arguments to the Save As dialog that opens.

Does anyone know a way around these problems?

Jonathan Mak
  • One option could be to [save a screenshot using selenium](http://stackoverflow.com/questions/33692179/export-as-pdf-using-selenium-webdriver-screenshot) (and convert to PDF). – Arya Nov 22 '16 at 01:14
  • The problem is that a full-page screenshot is not available, and the text would not be rendered as text. – Jonathan Mak Nov 22 '16 at 04:07

1 Answer

  • Log in using requests.
  • Use a requests Session to keep track of the cookies.
  • Use the session to retrieve the HTML page.
  • Parse the HTML (use BeautifulSoup).
  • Identify the img tags and the CSS links.
  • Download the images and CSS documents to a local directory.
  • Rewrite the img src attributes to point to the locally downloaded images.
  • Rewrite the CSS links to point to the locally downloaded CSS.
  • Serialize the new HTML tree to a local .html file.
  • Use whatever "HTML to PDF" solution you prefer to render the local .html file.
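
The steps above could be sketched roughly as follows. This is only a minimal sketch under several assumptions: `localize_page` and `save_as_pdf` are hypothetical helper names, the site is assumed to accept a plain form POST for login (Salesforce and many other sites need a more involved flow), pdfkit is used as the "HTML to PDF" step only because the question already mentions it, and the `enable-local-file-access` option is assumed to be needed by the installed wkhtmltopdf version. Images referenced from inside the CSS are not handled.

```python
import os
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup


def localize_page(session, page_url, out_dir):
    """Save an offline copy of page_url using an already logged-in session.

    Images and stylesheets are downloaded into out_dir and the HTML is
    rewritten to reference them. Returns the path of the saved .html file.
    """
    os.makedirs(out_dir, exist_ok=True)
    response = session.get(page_url)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Collect (tag, attribute) pairs that point at remote assets.
    targets = [(img, "src") for img in soup.find_all("img", src=True)]
    targets += [
        (link, "href")
        for link in soup.find_all("link", href=True)
        if "stylesheet" in (link.get("rel") or [])
    ]

    for index, (tag, attr) in enumerate(targets):
        if tag[attr].startswith("data:"):
            continue  # inline data URIs need no download
        asset_url = urljoin(page_url, tag[attr])
        asset = session.get(asset_url)
        asset.raise_for_status()
        # Prefix with the index so identical basenames do not collide.
        basename = os.path.basename(urlparse(asset_url).path) or "asset"
        local_name = f"{index}_{basename}"
        with open(os.path.join(out_dir, local_name), "wb") as fh:
            fh.write(asset.content)
        tag[attr] = local_name  # path relative to the saved HTML file

    html_path = os.path.join(out_dir, "page.html")
    with open(html_path, "w", encoding="utf-8") as fh:
        fh.write(str(soup))
    return html_path


def save_as_pdf(login_url, credentials, page_url, out_dir, pdf_path):
    # Imported here so localize_page can be used without these packages.
    import pdfkit
    import requests

    with requests.Session() as session:
        # Hypothetical form login; real field names and flow are site-specific.
        session.post(login_url, data=credentials).raise_for_status()
        html_path = localize_page(session, page_url, out_dir)
    # Assumption: recent wkhtmltopdf builds block local files unless told otherwise.
    pdfkit.from_file(html_path, pdf_path, options={"enable-local-file-access": ""})
```

Because the session object carries the login cookies, every image and stylesheet request is authenticated the same way as the page itself, which is what the plain `pdfkit.from_file` attempt in the question was missing.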
Stephane Martin