How to fill PDF form in Python?

Question

How can I fill a PDF form with data and "flatten" it?

I use pdftk at the moment, but it does not process national characters correctly.

Is there a Python library, or an example, that shows how to fill PDF forms and render the result to a non-editable PDF file?

I don't understand exactly what you want, but ReportLab is a widely used Python library for PDFs. — Antonis Christofides, Jul 19 '13 at 09:08
Refer to this answer: http://stackoverflow.com/questions/1890570/how-can-i-auto-populate-a-pdf-form-in-django-python — Aakash Anuj, Jul 19 '13 at 09:26
@AntonisChristofides, The OP is asking for an easy way to *flatten* (read merge) form fields with database content. My guess is that he would have the content to print in a Python dictionary and a premade PDF sheet. Anyway, ReportLab - as you suggested - is the way to go, to me at least. — Mathieu Marques, Jul 19 '13 at 09:52

Martin Thoma · Answer 1 · 2023-02-06T22:23:35.370

17

Straight from the pypdf docs (added years after the question was asked):

from pypdf import PdfReader, PdfWriter

reader = PdfReader("form.pdf")
writer = PdfWriter()

page = reader.pages[0]
fields = reader.get_fields()

writer.add_page(page)

writer.update_page_form_field_values(
    writer.pages[0], {"fieldname": "some filled in text"}
)

# write the filled-in form to filled-out.pdf
with open("filled-out.pdf", "wb") as output_stream:
    writer.write(output_stream)

edited Feb 06 '23 at 22:23

answered May 08 '22 at 11:46

Martin Thoma


Related issue: https://stackoverflow.com/questions/47288578/pdf-form-filled-with-pypdf2-does-not-show-in-print – Wtower Jan 19 '23 at 10:58
This solution worked great for me. PyPDF2 has been recently updated and seems to be an active project. – Trevor Sullivan Feb 06 '23 at 19:19

I'm the maintainer of PyPDF2 and pypdf. I moved PyPDF2 back into pypdf. Only pypdf will receive new features / bug fixes in future. – Martin Thoma Feb 06 '23 at 22:23
@MartinThoma I realised I need to open and save the PDF after filling in, in order to actually save the form, otherwise it remains unsaved, and the changes are not visible when I open it again or copy it. Is this a bug and is there a way around it? – weasel Jun 21 '23 at 10:20
Sounds like a bug, but I also don't quite get it. Do you use the latest version of pypdf? If yes, please open a bug in the bug tracker. If not, please upgrade. We had some improvements regarding forms last week – Martin Thoma Jun 21 '23 at 11:00

score 9 · Answer 2 · answered Mar 26 '21 at 20:36


Give the fillpdf library a try; it makes this process very simple. Install it with pip install fillpdf, plus the poppler dependency with conda install -c conda-forge poppler.

Basic usage:

from fillpdf import fillpdfs

fillpdfs.get_form_fields("blank.pdf")

# returns a dictionary of fields
# Set the returned dictionary's values and save it to a variable
# For radio boxes ('Off' = not filled, 'Yes' = filled)

data_dict = {
    'Text2': 'Name',
    'Text4': 'LastName',
    'box': 'Yes',
}

fillpdfs.write_fillable_pdf('blank.pdf', 'new.pdf', data_dict)

# If you want it flattened:
fillpdfs.flatten_pdf('new.pdf', 'newflat.pdf')
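A misspelled key in data_dict is easy to miss, so it can help to compare the keys against the dictionary of fields before writing. The following is only a sketch: it assumes get_form_fields returns a mapping of field name to current value, check_form_data is a hypothetical helper (not part of fillpdf), and the field names are made up.

```python
def check_form_data(form_fields, data_dict):
    """Split data_dict into entries that match a form field and unknown keys.

    form_fields: mapping of field name -> current value (assumed to be the
                 shape of what fillpdfs.get_form_fields returns).
    data_dict:   the values you intend to write into the form.
    """
    known = {k: v for k, v in data_dict.items() if k in form_fields}
    unknown = sorted(k for k in data_dict if k not in form_fields)
    return known, unknown


# Stand-in for fillpdfs.get_form_fields("blank.pdf"); names are illustrative.
form_fields = {"Text2": "", "Text4": "", "box": "Off"}
data_dict = {"Text2": "Name", "Text4": "LastName", "bx": "Yes"}  # "bx" is a typo

known, unknown = check_form_data(form_fields, data_dict)
print(unknown)  # ['bx']
```

Passing only `known` to write_fillable_pdf, and reporting `unknown`, makes a typo visible instead of leaving a field silently empty.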

More info here: https://github.com/t-houssian/fillpdf

Seems to fill very well.

See this answer here for more info: https://stackoverflow.com/a/66809578/13537359


Tyler Houssian



dictionary returns nothing. I could not get the fields of pdf with this code. – Celik May 19 '22 at 09:57
Uses [pdfrw2](https://pypi.org/project/pdfrw2/) and [pymupdf](https://pypi.org/project/PyMuPDF/). I wonder if either of those could do it directly – Martin Thoma Jun 28 '22 at 07:52

This library worked perfectly for me! – Rycliff Jan 19 '23 at 02:25

This one worked for me as well, the solution with PyPDF2 didn't work – Timur Mingulov Jan 29 '23 at 04:35

ViaTech · Answer 3 · 2022-06-28T21:17:10.527

You do not need a library per se to flatten the PDF. Per the Adobe docs, you can set bit position 1 of the editable form fields' flags to make each field ReadOnly. I provided a full solution here, but it uses Django:

https://stackoverflow.com/a/55301804/8382028

Adobe Docs (page 441):

https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf

Use PyPDF2 to fill the fields, then loop through the annotations to change the bit position:

from io import BytesIO
import PyPDF2
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject, NumberObject

# Note: PdfFileReader, PdfFileWriter and the camelCase methods used below are
# the legacy API that was removed in PyPDF2 3.0, so this needs PyPDF2 < 3.0.


def set_need_appearances_writer(writer):
    # basically used to ensure there are no
    # overlapping form fields, which make printing hard
    try:
        catalog = writer._root_object
        # get the AcroForm tree and add the /NeedAppearances attribute
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})

        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))

    return writer


# open the pdf
input_stream = open("YourPDF.pdf", "rb")
pdf_reader = PyPDF2.PdfFileReader(input_stream, strict=False)
if "/AcroForm" in pdf_reader.trailer["/Root"]:
    pdf_reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

pdf_writer = PyPDF2.PdfFileWriter()
set_need_appearances_writer(pdf_writer)
if "/AcroForm" in pdf_writer._root_object:
    # Acro form is form field, set needs appearances to fix printing issues
    pdf_writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

data_dict = dict()  # this is a dict of your DB form values

pdf_writer.addPage(pdf_reader.getPage(0))
page = pdf_writer.getPage(0)
# update form fields
pdf_writer.updatePageFormFieldValues(page, data_dict)
for j in range(0, len(page['/Annots'])):
    writer_annot = page['/Annots'][j].getObject()
    for field in data_dict:
        if writer_annot.get('/T') == field:
            writer_annot.update({
                NameObject("/Ff"): NumberObject(1)    # make ReadOnly
            })
output_stream = BytesIO()
pdf_writer.write(output_stream)
input_stream.close()  # the reader needs the file open until the write is done

# output_stream is your flattened PDF
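One caveat with assigning NumberObject(1) to /Ff: it replaces the whole flags value, so any other flags already set on the field (for example Multiline on a text field, which is bit position 13 in the PDF specification) are cleared. A minimal sketch of setting only the ReadOnly bit while keeping the rest follows; the helper names are hypothetical and the code is plain integer arithmetic, independent of PyPDF2.

```python
READ_ONLY = 1 << 0   # bit position 1 of the /Ff field flags
MULTILINE = 1 << 12  # bit position 13, meaningful for text fields only


def with_read_only(flags):
    """Return the /Ff integer with ReadOnly set and all other bits preserved."""
    return flags | READ_ONLY


def is_read_only(flags):
    """Tell whether the ReadOnly bit is set in an /Ff integer."""
    return bool(flags & READ_ONLY)


existing = MULTILINE             # a multiline text field has /Ff 4096
print(with_read_only(existing))  # 4097: still multiline, now read-only
print(with_read_only(0))         # 1: same value as NumberObject(1)
```

In the loop, the idea would be to read the field's current /Ff value (defaulting to 0 when it is absent) and write back the result of with_read_only instead of a constant 1.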

PDF standard document has moved. New location: https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf — bcattle, Jun 28 '22 at 00:19
@bcattle, thanks updated and it appears the page changed so I reference the correct one now — ViaTech, Jun 28 '22 at 21:17

score -1 · Answer 4 · answered Aug 09 '21 at 04:34

We can also consider using an API instead of importing packages to work with PDFs. This approach has its own advantages and disadvantages, but hey, it gives us a new perspective for enhancing our applications!

One example is the PDF.co API for filling PDF forms. You can also consider alternatives such as Adobe API, DocSpring, pdfFiller, etc. The following code snippet demonstrates filling a PDF form using a predefined JSON payload.

import requests  # pip install requests

# The authentication key (API Key).
# Get your own by registering at https://app.pdf.co/documentation/api
API_KEY = "**************************************"

# Base URL for PDF.co Web API requests
BASE_URL = "https://api.pdf.co/v1"

# Source PDF form, and the local path for the downloaded result (pick any path)
SOURCE_URL = "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-form/f1040.pdf"
DESTINATION_FILE = "f1040-filled.pdf"

# Field names (relative to FIELD_PREFIX) and the text to put into each field
FIELD_PREFIX = "topmostSubform[0].Page1[0]."
FIELD_VALUES = {
    "FilingStatus[0].c1_01[1]": "True",
    "f1_02[0]": "John A.",
    "f1_03[0]": "Doe",
    "YourSocial_ReadOrderControl[0].f1_04[0]": "123456789",
    "YourSocial_ReadOrderControl[0].f1_05[0]": "Joan B.",
    "YourSocial_ReadOrderControl[0].f1_06[0]": "Doe",
    "YourSocial_ReadOrderControl[0].f1_07[0]": "987654321",
}


def main(args=None):
    fillPDFForm()


def fillPDFForm():
    """Fill PDF form using PDF.co Web API"""

    # Prepare request params; requests serializes this dict to JSON
    # See documentation: https://apidocs.pdf.co
    payload = {
        "async": False,
        "encrypt": False,
        "name": "f1040-filled",
        "url": SOURCE_URL,
        "fields": [
            {"fieldName": FIELD_PREFIX + name, "pages": "1", "text": text}
            for name, text in FIELD_VALUES.items()
        ],
        "annotations": [
            {
                "text": "Sample Filled with PDF.co API using /pdf/edit/add. "
                        "Get fields from forms using /pdf/info/fields",
                "x": 10,
                "y": 10,
                "size": 12,
                "pages": "0-",
                "color": "FFCCCC",
                "link": "https://pdf.co",
            }
        ],
        "images": [],
    }

    # Prepare URL for 'Fill PDF' API request
    url = "{}/pdf/edit/add".format(BASE_URL)

    # Execute request and get response as JSON
    response = requests.post(url, json=payload, headers={"x-api-key": API_KEY})
    if response.status_code == 200:
        result = response.json()

        if not result["error"]:
            # Get URL of result file
            result_file_url = result["url"]
            # Download result file
            r = requests.get(result_file_url, stream=True)
            if r.status_code == 200:
                with open(DESTINATION_FILE, 'wb') as file:
                    for chunk in r:
                        file.write(chunk)
                print(f"Result file saved as \"{DESTINATION_FILE}\" file.")
            else:
                print(f"Request error: {r.status_code} {r.reason}")
        else:
            # Show service reported error
            print(result["message"])
    else:
        print(f"Request error: {response.status_code} {response.reason}")


if __name__ == '__main__':
    main()
