4

I have a PDF form created using Adobe LiveCycle Designer ES 10.4. I need to fill it using Python to reduce manual labor. I searched the web and read some articles; most of them focused on the pdfrw library. I tried it and extracted some information from the PDF form, as shown below.

Code

from pdfrw import PdfReader
pdf = PdfReader('sample.pdf')
print(pdf.keys())
print(pdf.Info)
print(pdf.Root.keys())
print('PDF has {} pages'.format(len(pdf.pages)))

Output

['/Root', '/Info', '/ID', '/Size']
{'/CreationDate': "(D:20180822164509+05'30')", '/Creator': '(Adobe LiveCycle Designer ES 10.4)', '/ModDate': "(D:20180822165611+05'30')", '/Producer': '(Adobe XML Form Module Library)'}
['/AcroForm', '/MarkInfo', '/Metadata', '/Names', '/NeedsRendering', '/Pages', '/Perms', '/StructTreeRoot', '/Type']
PDF has 1 pages

I am not sure how to go further and use pdfrw to access the fillable fields in the PDF form and fill them using Python. Is that possible? Any suggestions would be helpful.

Atinesh
  • 1
    Forms created in Adobe LiveCycle Designer ES 10.4 come in two flavors and your task will be different based on which one you have. Designer can create either a static XFA form, which contains a normal PDF form with fields as well as an XML component for all of the logic and scripting. Alternatively, you might have a dynamic XFA form which doesn't contain a PDF form at all but relies on Adobe Reader (and a few other viewers) to render the XML into a form on the fly. My answer will depend on which type of form you have. – joelgeraci Sep 21 '18 at 19:14
  • @joelgeraci I just have a `PDF form` I don't know how it is created. I can open it in `Acrobat Reader`. – arush1836 Sep 22 '18 at 03:43
  • Can you share the file, I can identify the form type. – joelgeraci Sep 22 '18 at 16:42
  • Possible duplicate of [How can I auto-populate a PDF form in Django/Python?](https://stackoverflow.com/questions/1890570/how-can-i-auto-populate-a-pdf-form-in-django-python) – Gabriel Devillers Nov 14 '18 at 22:38
  • @joelgeraci In my PDF it is showing Producer as 'Adobe XML Form Module Library'. Can you please help me out with this question please. https://stackoverflow.com/questions/62760343/can-this-fillable-pdf-be-automated –  Jul 07 '20 at 11:57
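
The static/dynamic distinction from the comments can be checked roughly from Python. This is a minimal sketch, assuming the usual heuristic that an /XFA entry under /AcroForm marks an XFA form and that /NeedsRendering set to true in the document catalog suggests a dynamic one. `classify_form` is a hypothetical helper, not part of pdfrw, and its result is a hint rather than a definitive answer:

```python
def classify_form(root):
    """Guess the form type from a document catalog (e.g. pdfrw's pdf.Root).

    Works on any object exposing AcroForm / NeedsRendering attributes.
    The rules below are a heuristic, not a guarantee.
    """
    acroform = getattr(root, "AcroForm", None)
    if acroform is None:
        return "no form"
    # No /XFA entry: a plain AcroForm
    if getattr(acroform, "XFA", None) is None:
        return "AcroForm"
    # /NeedsRendering true asks the viewer to render the XML on opening,
    # which is typical of dynamic XFA forms
    if str(getattr(root, "NeedsRendering", None)) == "true":
        return "dynamic XFA"
    return "static XFA"
```

With pdfrw this could be called as `classify_form(PdfReader('sample.pdf').Root)`. The output in the question lists /NeedsRendering among the Root keys, so checking its value there would be a reasonable first step.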

5 Answers

6

You can find the form fields here:

pdf.Root.AcroForm.Fields

or here

pdf.Root.Pages.Kids[page_index].Annots

This is a PdfArray object, which behaves like a Python list. The name of a field is found here:

pdf.Root.AcroForm.Fields[field_index].T

Other keys include the value, .V. Display information such as the font is under .AP.N.Resources.

However, if you update the value of a field and write out the PDF file, the viewer might display the new value only when the field has focus, i.e. when it is clicked on.

I haven't figured out how to fix that yet.
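
As a rough illustration of the attributes above, here is a sketch that walks a field list and collects names and values. `pdf_text`, `iter_fields` and `dump_fields` are hypothetical helper names, not pdfrw functions. The sketch assumes that kids carrying their own /T are child fields (their names are joined with dots) and that pdfrw's string objects expose `to_unicode()`:

```python
def pdf_text(value):
    """Return a plain string for a PDF string object, or None."""
    if value is None:
        return None
    # pdfrw's PdfString is assumed to provide to_unicode(); otherwise
    # just strip the parentheses of a literal string such as "(Name)"
    to_unicode = getattr(value, "to_unicode", None)
    if callable(to_unicode):
        return to_unicode()
    text = str(value)
    if text.startswith("(") and text.endswith(")"):
        return text[1:-1]
    return text


def iter_fields(fields, prefix=""):
    """Yield (qualified name, value) for each terminal field."""
    for field in fields or []:
        partial = pdf_text(getattr(field, "T", None))
        name = ".".join(p for p in (prefix, partial) if p)
        kids = getattr(field, "Kids", None) or []
        # Kids without /T are only widgets of this field, not child fields
        children = [k for k in kids if getattr(k, "T", None) is not None]
        if children:
            yield from iter_fields(children, name)
        else:
            yield name, pdf_text(getattr(field, "V", None))


def dump_fields(path):
    """Print every field name and value of the PDF at path (needs pdfrw)."""
    from pdfrw import PdfReader  # third-party; imported only when used

    pdf = PdfReader(path)
    acroform = pdf.Root.AcroForm
    for name, value in iter_fields(acroform.Fields if acroform else None):
        print(name, "=", value)
```

Calling `dump_fields('sample.pdf')` would then list the fields. Button values such as /Yes or /Off are returned unchanged.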

Eddie
4

I wrote a library called fillpdf, built on pdfrw, pdf2image, Pillow, and PyPDF2. Install it with pip install fillpdf; it also needs poppler, which can be installed with conda install -c conda-forge poppler.

Basic usage:

from fillpdf import fillpdfs

fillpdfs.get_form_fields("blank.pdf")

# Returns a dictionary of fields.
# Set the values in the returned dictionary and save it to a variable.
# For radio boxes ('Off' = not filled, 'Yes' = filled)

data_dict = {
'Text2': 'Name',
'Text4': 'LastName',
'box': 'Yes',
}

fillpdfs.write_fillable_pdf('blank.pdf', 'new.pdf', data_dict)

# If you want it flattened:
fillpdfs.flatten_pdf('new.pdf', 'newflat.pdf')

More info here: https://github.com/t-houssian/fillpdf

If some fields don't fill, you can use fitz (pip install PyMuPDF) and PyPDF2 (pip install PyPDF2) to draw directly onto the page, as in the following, altering the coordinates as needed:

import fitz
from PyPDF2 import PdfFileReader

file_handle = fitz.open('blank.pdf')
pdf = PdfFileReader(open('blank.pdf','rb'))
box = pdf.getPage(0).mediaBox
w = box.getWidth()
h = box.getHeight()

# For images
image_rectangle = fitz.Rect((w/2)-200,h-255,(w/2)-100,h-118)
pages = pdf.getNumPages() - 1
last_page = file_handle[pages]
last_page._wrapContents()
last_page.insertImage(image_rectangle, filename='image.png')

# For text
last_page.insertText(fitz.Point((w/2)-247, h-478), 'John Smith', fontsize=14, fontname="times-bold")
file_handle.save('newpdf.pdf')
Tyler Houssian
  • I'm trying to conda install poppler like you mentioned, and it just keeps hanging. Did you have any trick for it? I'm trying directly in the anaconda prompt. – misterducky Apr 07 '21 at 16:41
  • @misterducky Interesting. There are a few different install commands on this page to try: [https://anaconda.org/conda-forge/poppler](https://anaconda.org/conda-forge/poppler). Look here as well: [https://stackoverflow.com/questions/57330485/unable-to-install-poppler-on-windows-using-conda](https://stackoverflow.com/questions/57330485/unable-to-install-poppler-on-windows-using-conda) – Tyler Houssian Apr 12 '21 at 18:08
1

Use this to fill every field with its index (annotation index and page index), so each field can be identified in the output.

from pdfrw import PdfDict, PdfReader, PdfWriter

template = PdfReader('template.pdf')
for page_c, page in enumerate(template.Root.Pages.Kids):  # loop through pages
    # A page without fields has no Annots entry, so fall back to an empty list
    for annot_c, annot in enumerate(page.Annots or []):  # loop through fields
        annot.update(PdfDict(V=str(annot_c) + '-' + str(page_c)))
PdfWriter().write('output.pdf', template)
Asif Alam
0

For AcroForm-based forms, using the PDFix SDK:

def SetFormFieldValue(email, key, open_path, save_path):
    pdfix  = GetPdfix()
    if pdfix is None:
        raise Exception('Pdfix Initialization fail')
    if not pdfix.Authorize(email, key):
        raise Exception('Authorization fail : ' + pdfix.GetError())
    doc = pdfix.OpenDoc(open_path, "")
    if doc is None:
        raise Exception('Unable to open pdf : ' + pdfix.GetError())
    field = doc.GetFormFieldByName("Text1")
    if field is not None:
        value = field.GetValue()
        value = "New Value"
        field.SetValue(value)
    if not doc.Save(save_path, kSaveFull):
        raise Exception(pdfix.GetError())
    doc.Close()
    pdfix.Destroy()
paolo
0

A full solution was provided here: How to edit editable pdf using the pdfrw library?

The key part is:

template_pdf.Root.AcroForm.update(pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject('true'))) 
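
To show where that line fits, here is a hedged sketch of a complete fill step for plain text fields. `set_text_values` and `fill_pdf` are hypothetical names, not pdfrw functions. The sketch assumes each widget annotation carries its own /T, and it does not handle checkboxes or radio buttons, whose values are PDF names rather than strings:

```python
def set_text_values(annotations, data):
    """Set /V on each annotation whose /T matches a key in data.

    Returns the number of fields that were filled.
    """
    filled = 0
    for annot in annotations or []:
        raw_name = getattr(annot, "T", None)
        if raw_name is None:
            continue
        name = str(raw_name)
        # Field names are stored as PDF literal strings such as "(Text2)"
        if name.startswith("(") and name.endswith(")"):
            name = name[1:-1]
        if name in data:
            annot.V = data[name]
            filled += 1
    return filled


def fill_pdf(src, dst, data):
    """Fill text fields of src from data and write the result to dst."""
    # third-party; imported here so set_text_values works without pdfrw
    from pdfrw import PdfDict, PdfObject, PdfReader, PdfWriter

    template = PdfReader(src)
    for page in template.pages:
        set_text_values(page.Annots, data)
    # Ask the viewer to rebuild field appearances, so the new values are
    # shown even when a field does not have focus
    if template.Root.AcroForm is not None:
        template.Root.AcroForm.update(
            PdfDict(NeedAppearances=PdfObject("true"))
        )
    PdfWriter().write(dst, template)
```

A call such as `fill_pdf('blank.pdf', 'new.pdf', {'Text2': 'Name'})` would then write the filled copy. Whether the values actually appear still depends on the viewer honoring NeedAppearances.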
Asensio