I have PDF forms that I want to autopopulate with data from my Django web application and then offer to the user to download. Which Python library would let me easily pre-populate PDF forms? These forms are intended to be printed out.

MikeN

3 Answers


Reportlab is great if you're generating very dynamic PDFs and need to programmatically control all of it: data and layout.

To just fill out forms in existing PDFs, reportlab is overkill: you'd basically have to rebuild the PDF from scratch in reportlab instead of just taking a PDF with a form that's already been made.

PDF forms work with FDF data. I ported a PHP FDF library to Python a while back when I had to do this and released it as fdfgen. I use that to generate an fdf file with the data for the form, then use pdftk to push the fdf into a PDF form and generate the output.

The whole process works like this:

  1. You (or a designer) design the PDF in Acrobat or a similar tool, mark the form fields, and take note of the field names (I'm not sure exactly how this is done; our designer does this step). Let's say your form has fields "name" and "telephone".
  2. Use fdfgen to create an FDF file:

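    from fdfgen import forge_fdf
    # (field name, value) pairs; the names must match the fields in the PDF form
    fields = [('name', 'John Smith'), ('telephone', '555-1234')]
    fdf = forge_fdf("", fields, [], [], [])
    # Binary mode, since forge_fdf returns bytes on Python 3
    with open("data.fdf", "wb") as fdf_file:
        fdf_file.write(fdf)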
    
  3. Then you run pdftk to merge and flatten:

    pdftk form.pdf fill_form data.fdf output output.pdf flatten
    

    and a filled-out, flattened (meaning that there are no longer editable form fields) PDF will be in output.pdf.
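Steps 2 and 3 can also be chained from Python. This is only a rough sketch, not part of fdfgen: the helper names `pdftk_fill_command` and `fill_pdf` are made up for illustration, and it assumes that pdftk is on the PATH and that `forge_fdf` returns bytes.

```python
import subprocess


def pdftk_fill_command(form_pdf, fdf_path, output_pdf, flatten=True):
    """Build the pdftk argument list for filling form_pdf from fdf_path."""
    cmd = ["pdftk", form_pdf, "fill_form", fdf_path, "output", output_pdf]
    if flatten:
        # "flatten" bakes the values into the page so the fields stop being editable
        cmd.append("flatten")
    return cmd


def fill_pdf(form_pdf, fields, output_pdf, fdf_path="data.fdf"):
    """Write an FDF file for `fields`, then have pdftk merge it into the form."""
    # Third-party import kept local so the command builder works without fdfgen.
    from fdfgen import forge_fdf

    fdf = forge_fdf("", fields, [], [], [])
    with open(fdf_path, "wb") as fdf_file:  # assumption: forge_fdf returns bytes
        fdf_file.write(fdf)
    # An argument list (no shell) avoids quoting problems with file names;
    # check=True raises CalledProcessError if pdftk exits with an error.
    subprocess.run(pdftk_fill_command(form_pdf, fdf_path, output_pdf), check=True)
```

Calling `fill_pdf("form.pdf", [("name", "John Smith"), ("telephone", "555-1234")], "output.pdf")` should then be equivalent to running the two steps by hand.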

It's a bit complicated, and pdftk can be a pain to install (it requires a Java stack, and there are bugs on Ubuntu 9.10 that have to be worked around), but it's the simplest process I've been able to come up with so far. The workflow is also convenient: our designers can make all the layout changes to the PDF they want, and as long as they don't change the names of the fields, I can drop the new one in and everything keeps working.

I apologize for the lack of docs on fdfgen. forge_fdf() is really the only function you should need, and it has a docstring to explain the arguments. I've just never quite gotten around to doing more with it.

yprez
thraxil
  • When trying out this sample code I get this error: UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-13: ordinal not in range(128) – MikeN Jul 22 '10 at 15:48
  • Try writing your file using utf: "fdf_file = codecs.open("data.fdf","w", "utf-8" )". You'll have to "import codecs" first. I also found a problem with special character encoding which I'll report to the author. – srmark Jan 02 '11 at 06:16
  • +1 and more if I could. Thank you for fdfgen. I managed to get this working to batch fill a ton of forms from a CSV file. – Evan Plaice Jan 10 '13 at 02:13
  • This is great, thank you so much for your contribution, thraxil. A word of warning for others, I couldn't get pdftk to run from Apache2 WSGI website, it would block, because Apache2 would return a blocked signal to the subprocess command. I had to set up Celery to make the call external to Apache2. Then from my calling view method, I had to get the task.delay().get() to wait for the response - synchronous. For more details see: http://stackoverflow.com/questions/7543452/how-to-launch-a-pdftk-subprocess-while-in-wsgi. Thanks again! – Furbeenator Jul 16 '13 at 17:06
  • Hello @thraxil, this contribution is very helpful. I just wanted to know how to select a particular check box from a group of check boxes. For example, in the image http://i.imgur.com/1ar7jyQ.png, I have a field named type_of_ownership which has six check boxes, and I want to select 'Limited Liability'. How do I pass arguments to the forge_fdf function (probably in fdf_data_names)? Kindly reply with the format, as I am stuck here. Thank you in anticipation! – Dharmraj Feb 03 '16 at 14:01
  • Quick and easy example to follow. Thanks. – Jay Modi Aug 03 '16 at 09:52
  • I ran into an issue with this method. The idea I had was to auto-populate some fields with data, while others were to be filled in by the user after auto-population. This, however, caused the result to be a flattened file, even without the `flatten` argument. I believe it has something to do with the digital Adobe signature and _pdftk_ editing the document without one. Just as an FYI to anyone looking to do what I did with it. – Casey May 11 '17 at 15:07
  • This blog has a lot of great info, too. It points at PyPDFtk and PdfJinja near the end. https://yoongkang.com/blog/pdf-forms-with-python/ – phyatt Aug 23 '21 at 16:55

Try reportlab.

Also, take a gander at Outputting PDFs.


Edit

I had another thought (but it won't help if you already have the PDF files, and I like @thraxil's answer better).

Earlier this year I worked on a project where I generated "certificates of completion" for continuing education courses. One of the angles I looked at was trying to generate a PDF directly from an appropriately styled web page (something like a server-side "Print to PDF").

One of the tools I found was wkhtmltopdf. It's a self-contained WebKit browser that will turn a URL into a PDF, and with pretty good results.

The idea is that you use Django's template engine to put together a page containing whatever you want (including images), pass its URL to wkhtmltopdf, grab the output and return it to the user.

I liked the approach because it's really simple to implement (just open a pipe), you don't have to worry about keeping the source PDF files accessible to the server, and you can redesign the PDFs by changing the HTML.
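A rough sketch of the "open a pipe" idea, assuming wkhtmltopdf is installed and on the PATH. The function names here are made up for illustration; the trailing `-` asks wkhtmltopdf to write the PDF to stdout instead of a file.

```python
import subprocess


def wkhtmltopdf_command(url):
    """Build the argument list; "-" as the output file means stdout."""
    return ["wkhtmltopdf", "--quiet", url, "-"]


def render_url_to_pdf(url, run=subprocess.run):
    """Return the PDF bytes for `url`.

    `run` defaults to subprocess.run and is only a parameter so the
    function can be exercised without wkhtmltopdf installed.
    """
    # check=True raises CalledProcessError if wkhtmltopdf exits with an error.
    result = run(wkhtmltopdf_command(url), stdout=subprocess.PIPE, check=True)
    return result.stdout
```

In a Django view the returned bytes could then be sent back with something like `HttpResponse(pdf_bytes, content_type="application/pdf")`.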

Seth
  • +1 for mentioning a great utility, `wkhtmltopdf`. It enables a programmer to design a document in HTML/CSS and convert it to a __very__ good-quality PDF. – gorsky Feb 19 '10 at 01:39
  • I tried using wkhtmltopdf, but it's so difficult to get it up and running. Moreover, the main issue I faced is with the fonts: I cannot get it to use the same fonts I had in the HTML design. It uses the basic fonts to generate the PDF. – Harsh Math Jul 05 '21 at 07:23
  • Well, wkhtmltopdf wasn't terrible to use back in 2009, but these days it's probably not a good idea. It uses an old version of webkit and probably also has a lot of old dependencies, not to mention security issues. You should use reportlab: https://docs.djangoproject.com/en/3.2/howto/outputting-pdf/ – Seth Jul 07 '21 at 18:46

Also look at this code segment, a ready-made solution for creating a PDF view in Django that builds on thraxil's solution above. Thanks to GitHub user zyegfryed.

https://gist.github.com/918403

joshua