8

I have been researching how to edit PDFs with Python and found this article:
How to Populate Fillable PDF's with Python

However, there is a problem: after the program runs and you open the PDF, the document does not appear populated. The data shows only when you click on a field, and it disappears again when you click away. The code below is from that article; someone else wrote it.

#! /usr/bin/python

import os
import pdfrw


INVOICE_TEMPLATE_PATH = 'invoice_template.pdf'
INVOICE_OUTPUT_PATH = 'invoice.pdf'


ANNOT_KEY = '/Annots'
ANNOT_FIELD_KEY = '/T'
ANNOT_VAL_KEY = '/V'
ANNOT_RECT_KEY = '/Rect'
SUBTYPE_KEY = '/Subtype'
WIDGET_SUBTYPE_KEY = '/Widget'


def write_fillable_pdf(input_pdf_path, output_pdf_path, data_dict):
    template_pdf = pdfrw.PdfReader(input_pdf_path)
    annotations = template_pdf.pages[0][ANNOT_KEY]
    for annotation in annotations:
        if annotation[SUBTYPE_KEY] == WIDGET_SUBTYPE_KEY:
            if annotation[ANNOT_FIELD_KEY]:
                key = annotation[ANNOT_FIELD_KEY][1:-1]
                if key in data_dict.keys():
                    annotation.update(
                        pdfrw.PdfDict(V='{}'.format(data_dict[key]))
                    )
    pdfrw.PdfWriter().write(output_pdf_path, template_pdf)


data_dict = {
   'business_name_1': 'Bostata',
   'customer_name': 'company.io',
   'customer_email': 'joe@company.io',
   'invoice_number': '102394',
   'send_date': '2018-02-13',
   'due_date': '2018-03-13',
   'note_contents': 'Thank you for your business, Joe',
   'item_1': 'Data consulting services',
   'item_1_quantity': '10 hours',
   'item_1_price': '$200/hr',
   'item_1_amount': '$2000',
   'subtotal': '$2000',
   'tax': '0',
   'discounts': '0',
   'total': '$2000',
   'business_name_2': 'Bostata LLC',
   'business_email_address': 'hi@bostata.com',
   'business_phone_number': '(617) 930-4294'
}

if __name__ == '__main__':
    write_fillable_pdf(INVOICE_TEMPLATE_PATH, INVOICE_OUTPUT_PATH, data_dict)
martineau
John
  • same problem for me, did you find a solution? – gustavz Aug 09 '19 at 07:57
  • the original article link is broken, but I found a [copy in archive.org](http://web.archive.org/web/20190220050925/https://bostata.com/post/how_to_populate_fillable_pdfs_with_python/) – abu Feb 21 '23 at 10:53

6 Answers

16

I figured out that adding the NeedAppearances parameter to the AcroForm dictionary solves the problem:

template_pdf = pdfrw.PdfReader(TEMPLATE_PATH)
template_pdf.Root.AcroForm.update(pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject('true'))) 
Sergio Sánchez
  • 1
    Not sure why it was down-voted. Your solution fixed the above problem for me @Sergio Sanchez. Thank you! This was also posted by TLK3 here https://github.com/pmaupin/pdfrw/issues/84 – Gopinath S Jan 26 '20 at 08:34
7

Updating the write function to set both the AP and V keys fixed the problem for me in Preview:

pdfrw.PdfDict(AP=data_dict[key], V=data_dict[key])
jsamol
pullyl
3

The error occurs because no appearance stream is associated with the field, but you've created it the wrong way: you've just assigned a stream to the /AP dictionary. What you need to do is assign an indirect XObject to /N in the /AP dictionary, and you need to create that XObject from scratch. The code should be something like the following:

from pdfrw import PdfWriter, PdfReader, IndirectPdfDict, PdfName, PdfDict

INVOICE_TEMPLATE_PATH = 'untitled.pdf'
INVOICE_OUTPUT_PATH = 'untitled-output.pdf'

field1value = 'im field_1 value'

template_pdf = PdfReader(INVOICE_TEMPLATE_PATH)
template_pdf.Root.AcroForm.Fields[0].V = field1value

# this depends on page orientation
rct = template_pdf.Root.AcroForm.Fields[0].Rect
height = round(float(rct[3]) - float(rct[1]), 2)
width = round(float(rct[2]) - float(rct[0]), 2)

# create the XObject
xobj = IndirectPdfDict(
    BBox=[0, 0, width, height],
    FormType=1,
    Resources=PdfDict(ProcSet=[PdfName.PDF, PdfName.Text]),
    Subtype=PdfName.Form,
    Type=PdfName.XObject,
)

#assign a stream to it
xobj.stream = '''/Tx BMC
BT
 /Helvetica 8.0 Tf
 1.0 5.0 Td
 0 g
 (''' + field1value + ''') Tj
ET EMC'''

#put all together
template_pdf.Root.AcroForm.Fields[0].AP = PdfDict(N = xobj)

#output to new file
PdfWriter().write(INVOICE_OUTPUT_PATH, template_pdf)

Note: /Type, /FormType, and /Resources are optional (/Resources is strongly recommended).
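One thing to watch with the stream above: the field value is pasted straight into a PDF literal string, so a value containing parentheses or a backslash (for example the phone number '(617) 930-4294' from the question) would produce a broken stream. A rough sketch of helpers that escape the value and compute the box size follows. The function names are my own and hypothetical, the font and offsets are simply copied from the code above, and the code does not deal with text encoding beyond the escaping.

```python
def escape_pdf_text(text):
    """Escape the characters that are special inside a PDF literal string."""
    return (
        text.replace('\\', '\\\\')
            .replace('(', '\\(')
            .replace(')', '\\)')
    )


def rect_size(rect):
    """Return (width, height) for a /Rect given as four corner coordinates."""
    x0, y0, x1, y1 = (float(v) for v in rect)
    # abs() keeps the result positive whichever diagonal corners are listed.
    return round(abs(x1 - x0), 2), round(abs(y1 - y0), 2)


def build_text_appearance(value, font_size=8.0):
    """Build the content stream for a single-line text field appearance."""
    return (
        '/Tx BMC\n'
        'BT\n'
        ' /Helvetica {size} Tf\n'
        ' 1.0 5.0 Td\n'
        ' 0 g\n'
        ' ({text}) Tj\n'
        'ET EMC'
    ).format(size=font_size, text=escape_pdf_text(value))


# Intended use with the code above:
#   width, height = rect_size(template_pdf.Root.AcroForm.Fields[0].Rect)
#   xobj.stream = build_text_appearance(field1value)
```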

Strayhorn
1

To expand on Sergio's answer above, the following line:

template_pdf.Root.AcroForm.update(pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject('true')))

Should be put after this line in the example code from OP:

template_pdf = pdfrw.PdfReader(input_pdf_path)
Mike
    Answers should be self contained, not add-on to other answers. Additionally, the information you are trying to add here is pretty much already in the original answer. Since that original answer already has significant approval, meaning its intent was probably clear to begin with, your addition is not only misplaced, but somewhat redundant. – Amitai Irron Jun 05 '20 at 15:35
1

In case someone has dropdown fields on the form they want to populate with data, the code below can be used. (It might save someone the hassle I went through.)

if key in data_dict.keys():
    #see if its a dropdown
    if('/I' in annotation.keys()):
        #field is a dropdown
        #Check if value is in preset list of dropdown, and at what value
        if data_dict[key] in annotation['/Opt']:
            #Value is in dropdown list,select value from list
            annotation.update(pdfrw.PdfDict(I='[{}]'.format(annotation['/Opt'].index(data_dict[key]))))
        else:
            #Value is not in dropdown list, add as 'free input'
            annotation.update(pdfrw.PdfDict(I='{}'.format(None)))
            annotation.update(pdfrw.PdfDict(V='{}'.format(data_dict[key])))
    else:
        #update the textfieldvalue
        annotation.update(pdfrw.PdfDict(V='{}'.format(data_dict[key])))

Also note that the OP's code only works for the first page, because of:

template_pdf.pages[0]
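A possible way around that is to loop over every page instead. The generator below is only a sketch: iter_widget_fields is a name I made up, and it assumes each page and annotation supports a one-argument .get() the way pdfrw's dictionaries (and plain dicts) do.

```python
ANNOT_KEY = '/Annots'
ANNOT_FIELD_KEY = '/T'
SUBTYPE_KEY = '/Subtype'
WIDGET_SUBTYPE_KEY = '/Widget'


def iter_widget_fields(pages):
    """Yield (field_name, annotation) for each named widget on every page."""
    for page in pages:
        # Pages without any annotations have no /Annots entry.
        for annotation in page.get(ANNOT_KEY) or []:
            if annotation.get(SUBTYPE_KEY) != WIDGET_SUBTYPE_KEY:
                continue
            raw_name = annotation.get(ANNOT_FIELD_KEY)
            if raw_name:
                # Drop the surrounding parentheses, as the OP's code does.
                yield raw_name[1:-1], annotation


# Intended use in place of the pages[0] loop:
#   for key, annotation in iter_widget_fields(template_pdf.pages):
#       if key in data_dict:
#           annotation.update(pdfrw.PdfDict(V='{}'.format(data_dict[key])))
```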
0

I had this problem, where the field values would appear if you opened it in Acrobat, but many of them would be invisible if you opened it in Chrome or Preview, unless you clicked on them. And they would not be accessible programmatically (using pdfplumber).

Converting the file to PDF/A in Acrobat (not Preview) solved the issue for most files.

larapsodia