0

I am writing a python CGI script that parses a csv file line by line and creates a HTML table using jinja2. Ideally, each individual HTML instance (one for each line) will get saved as a PDF report.

I've done a lot of searching, and I haven't been able to find a good way to go from HTML to PDF using python. Is it possible to render CGI to PDF? What methods would you recommend to accomplish this task?

led53
  • 3
  • 2

1 Answers1

0

If you've done a lot of searching then I am sure you have seen all of the really complicated options for converting html to pdf: ReportLab, weasyprint, pdfkit, etc: as catalogued in this post: How to convert webpage into pdf using Python.

However, I literally just spent the last month working on the Jinja2 -> HTML -> PDF workflow, so hopefully something of my solution can help you. I found downloading wkhtmltopdf (a small command line program written in C - it's pretty great) and using it from subprocess be by far the easiest way to accomplish this task. My sample below:

import os
import jinja2
import subprocess

def render(template_path, context):
    # Context = jinja context dictionary 
    path, filename = os.path.split(template_path)
    return jinja2.Environment(loader=jinja2.FileSystemLoader(path or './')
                             ).get_template(filename
                                           ).render(context)

def create_pdf(custom_data, template_path, out_dir, context, keep_html=False):
    # Custom data = the part of my workflow that won't matter to you
    report_name = os.path.join(out_dir, custom_data + "_report")
    html_name = report_name + ".html"
    pdf_name = report_name + ".pdf"

    def write_html():
        with open(html_name, 'w') as f:
            html = render(template_path, context)
            f.write(html)
    write_html()

    args = '"C:\\Program Files\\wkhtmltopdf\\bin\\wkhtmltopdf.exe" --zoom 0.75 -B 0 -L 0 -R 0 -T 0 {} {}'.format(html_name, pdf_name)
    child = subprocess.Popen(args, shell=True, stdout=subprocess.PIPE)
    # Wait for the sub-process to terminate (communicate) then find out the
    # status code
    output, errors = child.communicate()
    if child.returncode or errors:
        # Change this to a log message
        print "ERROR: PDF conversion failed. Subprocess exit status {}".format(
        child.returncode)
    if not keep_html:
        # Try block only to avoid raising an error if you've already moved
        # or deleted the html
        try:
            os.remove(html_name)
        except OSError:
            pass

Most of this code is well-explained by the documentation for jinja2 and subprocess so I won't laboriously go over it here. The arg flags in wkhtmltopdf translate to Zoom = 75% and 0 width margins (the -B, -L, etc are short hand for bottom, left, etc). The wkhtmltopdf documention is also quite good.

Edited: Credit for the render function goes to Matthias Eisen. Usage:

Assuming a template at /some/path/my_tpl.html, containing:

Hello {{ firstname }} {{ lastname }}!

context = {
    'firstname': 'John',
    'lastname': 'Doe'
}
result = render('/some/path/my_tpl.html', context)

print(result)

Hello John Doe!

Community
  • 1
  • 1
HFBrowning
  • 2,196
  • 3
  • 23
  • 42
  • Thanks @HFBrowning. I have one question. How do you pass the context? I can't find this rendering technique in the jinja2 documentation. I'm used to rendering like:\n print("Content-Type: text/html\n\n") print(template.render(ID = ACCESSION_ID, DATE = date.today()) so I'm not sure how you rendered your "context" variable if you are rendering multiple things. If you could point me to the documentation for html = render(template_path, context), that would be great! – led53 May 03 '17 at 20:19
  • Edited to show the origin and use of that function - good question. There are lots of code snippets out there demonstrating wildly different strategies for using `jinja2` – HFBrowning May 03 '17 at 20:32
  • And if it's not clear (not sure how familiar you are with Python), context is a dictionary. My workflow includes saving many (~50) elements to that dictionary, e.g., `context["DATE"] = datetime.datetime.now()`, etc – HFBrowning May 03 '17 at 20:38