
How can I fill a PDF form with data and "flatten" it?

I use pdftk at the moment, but it does not handle national (non-ASCII) characters correctly.

Is there a Python library, or an example, showing how to fill PDF forms and render the result as a non-editable PDF file?

Martin Thoma
Marek Wajdzik
  • I don't understand exactly what you want, but ReportLab is a widely used Python library for PDFs. – Antonis Christofides Jul 19 '13 at 09:08
  • Refer to this answer: http://stackoverflow.com/questions/1890570/how-can-i-auto-populate-a-pdf-form-in-django-python – Aakash Anuj Jul 19 '13 at 09:26
  • @AntonisChristofides, The OP is asking for an easy way to *flatten* (read: merge) form fields with database content. My guess is that he would have the content to print in a Python dictionary and a premade PDF sheet. Anyway, ReportLab - as you suggested - is the way to go, to me at least. – Mathieu Marques Jul 19 '13 at 09:52

4 Answers


Straight from the pypdf docs (added years after the question was asked):

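from pypdf import PdfReader, PdfWriter

reader = PdfReader("form.pdf")
writer = PdfWriter()

# Names of the form fields, useful for building the dict below
fields = reader.get_fields()

# append() copies the pages together with the document's /AcroForm (form)
# dictionary; add_page() alone copies only the page, and filling may then fail
writer.append(reader)

writer.update_page_form_field_values(
    writer.pages[0], {"fieldname": "some filled in text"}
)

# write the filled-in form to filled-out.pdf
with open("filled-out.pdf", "wb") as output_stream:
    writer.write(output_stream)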
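pypdf's PdfReader.get_fields() returns a dict keyed by field name (or None when the document has no form). A minimal sketch of one way to use it, assuming you want a misspelled key in your data to be reported rather than relying on the library to flag it: compare the data keys with the form's field names before filling. The helper name unknown_fields is hypothetical and not part of pypdf.

```python
def unknown_fields(form_field_names, data):
    """Return the keys in `data` that are not field names in the form.

    `form_field_names` is any iterable of names; with pypdf this could be
    `reader.get_fields() or {}` (iterating a dict yields its keys).
    """
    known = set(form_field_names)
    return sorted(key for key in data if key not in known)


if __name__ == "__main__":
    # Hypothetical field names standing in for reader.get_fields()
    fields = {"fieldname": None, "other": None}
    data = {"fieldname": "some filled in text", "fieldnmae": "typo"}
    print(unknown_fields(fields, data))  # ['fieldnmae']
```

An empty list means every key matches a field, so the dict can be passed to update_page_form_field_values as-is.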
Martin Thoma
  • Related issue: https://stackoverflow.com/questions/47288578/pdf-form-filled-with-pypdf2-does-not-show-in-print – Wtower Jan 19 '23 at 10:58
  • This solution worked great for me. PyPDF2 has been recently updated and seems to be an active project. – Trevor Sullivan Feb 06 '23 at 19:19
  • I'm the maintainer of PyPDF2 and pypdf. I moved PyPDF2 back into pypdf. Only pypdf will receive new features / bug fixes in future. – Martin Thoma Feb 06 '23 at 22:23
  • @MartinThoma I realised I need to open and save the PDF after filling in, in order to actually save the form, otherwise it remains unsaved, and the changes are not visible when I open it again or copy it. Is this a bug and is there a way around it? – weasel Jun 21 '23 at 10:20
  • Sounds like a bug, but I also don't quite get it. Do you use the latest version of pypdf? If yes, please open a bug in the bug tracker. If not, please upgrade. We had some improvements regarding forms last week – Martin Thoma Jun 21 '23 at 11:00

Give the fillpdf library a try; it makes this process very simple. Install it with pip install fillpdf, and install its poppler dependency with conda install -c conda-forge poppler.

Basic usage:

from fillpdf import fillpdfs

fields = fillpdfs.get_form_fields("blank.pdf")

# get_form_fields() returns a dictionary of fields.
# Set the values in that dictionary and save it to a variable.
# For radio boxes: 'Off' = not filled, 'Yes' = filled

data_dict = {
    'Text2': 'Name',
    'Text4': 'LastName',
    'box': 'Yes',
}

fillpdfs.write_fillable_pdf('blank.pdf', 'new.pdf', data_dict)

# If you want it flattened:
fillpdfs.flatten_pdf('new.pdf', 'newflat.pdf')
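Since check boxes and radio boxes take the strings 'Yes' or 'Off' rather than Python booleans, it can help to convert a database row before calling write_fillable_pdf. A minimal sketch, assuming the "checked" state of your form is named 'Yes' (some forms use a different export value, so check what get_form_fields returns for your file); to_fillpdf_values is a hypothetical helper, not part of fillpdf.

```python
def to_fillpdf_values(values, on_value="Yes", off_value="Off"):
    """Convert booleans to check box states; pass other values through as str.

    Assumes the form's checked state is named "Yes"; override `on_value`
    if get_form_fields() shows a different name for your form.
    """
    converted = {}
    for name, value in values.items():
        if isinstance(value, bool):
            converted[name] = on_value if value else off_value
        elif value is None:
            converted[name] = ""  # leave the field empty
        else:
            converted[name] = str(value)
    return converted


if __name__ == "__main__":
    row = {"Text2": "Łukasz", "Text4": "LastName", "box": True}
    print(to_fillpdf_values(row))
    # {'Text2': 'Łukasz', 'Text4': 'LastName', 'box': 'Yes'}
```

The result can be passed as data_dict to fillpdfs.write_fillable_pdf('blank.pdf', 'new.pdf', data_dict).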

More info here: https://github.com/t-houssian/fillpdf

Seems to fill very well.

See this answer here for more info: https://stackoverflow.com/a/66809578/13537359

Tyler Houssian

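Strictly speaking, you do not need a library to flatten the PDF. Per the Adobe docs, you can set bit position 1 (ReadOnly) in the field flags (/Ff) of each editable form field to make that field read-only. Note that this locks the fields rather than merging them into the page content. I provided a full solution here, but it uses Django: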

https://stackoverflow.com/a/55301804/8382028

Adobe Docs (page 441):

https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf
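Bit positions in the PDF specification are numbered from 1, starting at the low-order bit, so bit position 1 (ReadOnly) is the integer value 1, bit 2 (Required) is 2 and bit 3 (NoExport) is 4. A minimal pure-Python sketch of setting the ReadOnly bit while keeping any flags already present in a field's /Ff value; the constant and function names are illustrative, not from any library.

```python
def bit_value(position):
    """Integer value of a 1-based PDF flag bit position (bit 1 = low-order bit)."""
    return 1 << (position - 1)


# Field flags common to all field types (PDF 32000-1:2008, section 12.7.3.1)
READ_ONLY = bit_value(1)   # 1
REQUIRED = bit_value(2)    # 2
NO_EXPORT = bit_value(3)   # 4


def make_read_only(ff):
    """Return the /Ff value with ReadOnly set and all other flags preserved."""
    return ff | READ_ONLY


def is_read_only(ff):
    """True if the ReadOnly bit is set in the /Ff value."""
    return bool(ff & READ_ONLY)


if __name__ == "__main__":
    ff = REQUIRED                  # an existing required field
    new_ff = make_read_only(ff)
    print(new_ff, is_read_only(new_ff))  # 3 True
```

OR-ing in the bit, rather than overwriting /Ff with 1, keeps a field's other flags (such as Required) intact.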

Use PyPDF2 to fill the fields, then loop through the annotations to change the bit position:

from io import BytesIO

# Written against the legacy PyPDF2 1.x API (the camelCase names were removed in PyPDF2 3.0)
import PyPDF2
from PyPDF2.generic import BooleanObject, DictionaryObject, NameObject, NumberObject


def set_need_appearances_writer(writer):
    # Set /NeedAppearances so PDF viewers regenerate the field appearances;
    # without it, filled-in values may not be displayed or printed
    try:
        catalog = writer._root_object
        # create an empty /AcroForm dictionary if the writer has none yet
        if "/AcroForm" not in catalog:
            catalog.update({
                NameObject("/AcroForm"): writer._addObject(DictionaryObject())})

        need_appearances = NameObject("/NeedAppearances")
        catalog["/AcroForm"][need_appearances] = BooleanObject(True)
    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))

    return writer


# open the pdf
input_stream = open("YourPDF.pdf", "rb")
pdf_reader = PyPDF2.PdfFileReader(input_stream, strict=False)
if "/AcroForm" in pdf_reader.trailer["/Root"]:
    pdf_reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

pdf_writer = PyPDF2.PdfFileWriter()
# the AcroForm holds the form fields; NeedAppearances fixes display/printing issues
set_need_appearances_writer(pdf_writer)

data_dict = dict()  # this is a dict of your DB form values

pdf_writer.addPage(pdf_reader.getPage(0))
page = pdf_writer.getPage(0)
# update form fields
pdf_writer.updatePageFormFieldValues(page, data_dict)
annots = page['/Annots'] if '/Annots' in page else []
for annot in annots:
    writer_annot = annot.getObject()
    if writer_annot.get('/T') in data_dict:
        # set bit position 1 (ReadOnly), keeping any other field flags
        flags = writer_annot["/Ff"] if "/Ff" in writer_annot else 0
        writer_annot.update({NameObject("/Ff"): NumberObject(int(flags) | 1)})

output_stream = BytesIO()
pdf_writer.write(output_stream)

# output_stream now holds the filled PDF with read-only fields
ViaTech
  • PDF standard document has moved. New location: https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf – bcattle Jun 28 '22 at 00:19
  • @bcattle, thanks updated and it appears the page changed so I reference the correct one now – ViaTech Jun 28 '22 at 21:17

We can also consider using an API instead of importing packages to work with PDFs. This approach has its own advantages and disadvantages, but it gives us a new perspective for enhancing our applications.

One example is to use the PDF.co API to fill PDF forms. You can also consider alternatives such as Adobe API, DocSpring, pdfFiller, etc. The following code snippet demonstrates filling a PDF form using a predefined JSON payload.

import requests  # pip install requests

# The authentication key (API Key).
# Get your own by registering at https://app.pdf.co/documentation/api
API_KEY = "**************************************"

# Base URL for PDF.co Web API requests
BASE_URL = "https://api.pdf.co/v1"

# Local file name for the downloaded result
DESTINATION_FILE = "f1040-filled.pdf"


def main(args=None):
    fillPDFForm()


def fillPDFForm():
    """Fill PDF form using PDF.co Web API"""

    # Prepare request params as a dict; requests serializes it to JSON
    # See documentation: https://apidocs.pdf.co
    payload = {
        "async": False,
        "encrypt": False,
        "name": "f1040-filled",
        "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-form/f1040.pdf",
        "fields": [
            {"fieldName": "topmostSubform[0].Page1[0].FilingStatus[0].c1_01[1]", "pages": "1", "text": "True"},
            {"fieldName": "topmostSubform[0].Page1[0].f1_02[0]", "pages": "1", "text": "John A."},
            {"fieldName": "topmostSubform[0].Page1[0].f1_03[0]", "pages": "1", "text": "Doe"},
            {"fieldName": "topmostSubform[0].Page1[0].YourSocial_ReadOrderControl[0].f1_04[0]", "pages": "1", "text": "123456789"},
            {"fieldName": "topmostSubform[0].Page1[0].YourSocial_ReadOrderControl[0].f1_05[0]", "pages": "1", "text": "Joan B."},
            {"fieldName": "topmostSubform[0].Page1[0].YourSocial_ReadOrderControl[0].f1_06[0]", "pages": "1", "text": "Doe"},
            {"fieldName": "topmostSubform[0].Page1[0].YourSocial_ReadOrderControl[0].f1_07[0]", "pages": "1", "text": "987654321"},
        ],
        "annotations": [
            {
                "text": "Sample Filled with PDF.co API using /pdf/edit/add. Get fields from forms using /pdf/info/fields",
                "x": 10,
                "y": 10,
                "size": 12,
                "pages": "0-",
                "color": "FFCCCC",
                "link": "https://pdf.co",
            }
        ],
        "images": [],
    }

    # Prepare URL for 'Fill PDF' API request
    url = "{}/pdf/edit/add".format(BASE_URL)

    # Execute request (json= also sets Content-Type: application/json)
    response = requests.post(url, json=payload, headers={"x-api-key": API_KEY})
    if response.status_code == 200:
        result = response.json()

        if not result["error"]:
            # Get URL of result file
            result_file_url = result["url"]
            # Download result file
            r = requests.get(result_file_url, stream=True)
            if r.status_code == 200:
                with open(DESTINATION_FILE, "wb") as file:
                    for chunk in r.iter_content(chunk_size=8192):
                        file.write(chunk)
                print(f'Result file saved as "{DESTINATION_FILE}" file.')
            else:
                print(f"Request error: {r.status_code} {r.reason}")
        else:
            # Show service reported error
            print(result["message"])
    else:
        print(f"Request error: {response.status_code} {response.reason}")


if __name__ == '__main__':
    main()
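Instead of writing every field entry by hand, the "fields" list can be generated from a plain mapping of field names to values and then serialized by requests (json=) or json.dumps. A minimal sketch, assuming the key names used in the example payload above ("async", "name", "url", "fields", "fieldName", "pages", "text") and that every field sits on page "1"; check the PDF.co documentation for the full schema. build_fill_payload is a hypothetical helper, and the URL in the usage example is a placeholder.

```python
import json


def build_fill_payload(pdf_url, values, name="filled", page="1"):
    """Build a /pdf/edit/add-style request body from a field-name -> value dict.

    Key names mirror the example payload; using one `page` for all
    fields is a simplifying assumption.
    """
    return {
        "async": False,
        "name": name,
        "url": pdf_url,
        "fields": [
            {"fieldName": field, "pages": page, "text": str(text)}
            for field, text in values.items()
        ],
    }


if __name__ == "__main__":
    payload = build_fill_payload(
        "https://example.com/form.pdf",  # placeholder URL
        {"topmostSubform[0].Page1[0].f1_02[0]": "John A."},
        name="f1040-filled",
    )
    # ensure_ascii=False keeps non-ASCII characters readable in the printout
    print(json.dumps(payload, ensure_ascii=False, indent=2))
```

Building the body as a dict avoids hand-escaped quotes and newlines, which are easy to break when editing a long JSON string literal.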

Hiren