Fill out a PDF form in Python that is encrypted

Question

My project is to automatically fill in the PDF form that the German railway company (Deutsche Bahn) provides for delayed trains: https://www.bahn.de/wmedia/view/mdb/media/intern/fahrgastrechteformular.pdf

When you open the link in Google Chrome, you can easily edit the document, so it should also be possible to do this in Python.

I tried multiple things:

1. Using PyPDF2

I used the approach suggested in the second answer to the Stack Overflow question "Batch fill PDF forms from python or bash":

# -*- coding: utf-8 -*-

from collections import OrderedDict
from PyPDF2 import PdfFileWriter, PdfFileReader


def _getFields(obj, tree=None, retval=None, fileobj=None):
    """
    Extracts field data if this PDF contains interactive form fields.
    The *tree* and *retval* parameters are for recursive use.

    :param fileobj: A file object (usually a text file) to write
        a report to on all interactive form fields found.
    :return: A dictionary where each key is a field name, and each
        value is a :class:`Field<PyPDF2.generic.Field>` object. By
        default, the mapping name is used for keys.
    :rtype: dict, or ``None`` if form data could not be located.
    """
    fieldAttributes = {'/FT': 'Field Type', '/Parent': 'Parent', '/T': 'Field Name', '/TU': 'Alternate Field Name',
                       '/TM': 'Mapping Name', '/Ff': 'Field Flags', '/V': 'Value', '/DV': 'Default Value'}
    if retval is None:
        retval = OrderedDict()
        catalog = obj.trailer["/Root"]
        # get the AcroForm tree
        if "/AcroForm" in catalog:
            tree = catalog["/AcroForm"]
        else:
            return None
    if tree is None:
        return retval

    obj._checkKids(tree, retval, fileobj)
    for attr in fieldAttributes:
        if attr in tree:
            # Tree is a field
            obj._buildField(tree, retval, fileobj, fieldAttributes)
            break

    if "/Fields" in tree:
        fields = tree["/Fields"]
        for f in fields:
            field = f.getObject()
            obj._buildField(field, retval, fileobj, fieldAttributes)

    return retval


def get_form_fields(infile):
    infile = PdfFileReader(open(infile, 'rb'))
    fields = _getFields(infile)
    return OrderedDict((k, v.get('/V', '')) for k, v in fields.items())


if __name__ == '__main__':
    from pprint import pprint

    pdf_file_name = '2PagesFormExample.pdf'

    pprint(get_form_fields(pdf_file_name))

However, the program fails because the PDF has not been decrypted:

  File "c:\Users\User1\iCloudDrive\fahrgastrechte\fahrgastrechte.py", line 94, in <module>
    pprint(get_form_fields(pdf_file_name))
  File "c:\Users\User1\iCloudDrive\fahrgastrechte\fahrgastrechte.py", line 62, in get_form_fields
    fields = _getFields(infile)
  File "c:\Users\User1\iCloudDrive\fahrgastrechte\fahrgastrechte.py", line 32, in _getFields
    catalog = obj.trailer["/Root"]
  File "C:\Program Files\Python36\lib\site-packages\PyPDF2\generic.py", line 516, in __getitem__
    return dict.__getitem__(self, key).getObject()
  File "C:\Program Files\Python36\lib\site-packages\PyPDF2\generic.py", line 178, in getObject
    return self.pdf.getObject(self).getObject()
  File "C:\Program Files\Python36\lib\site-packages\PyPDF2\pdf.py", line 1617, in getObject
    raise utils.PdfReadError("file has not been decrypted")
PyPDF2.utils.PdfReadError: file has not been decrypted

I don't understand why decryption is necessary at all, since for now I only want to read data. I could understand it if I were writing data. Then again, Google Chrome, for example, lets me type into the PDF's fields without asking for a password.
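One thing that may be worth trying here, as a rough sketch rather than a confirmed fix: in an encrypted PDF the strings and streams themselves are encrypted, so a library has to decrypt the file even to read field values. Assuming this form is protected only by an owner (permissions) password and has an empty user password, which would explain why Chrome opens it without prompting, calling `decrypt('')` on the reader before accessing the trailer might be enough. The helper name `decrypt_if_needed` is made up for illustration; `isEncrypted` and `decrypt()` are the `PdfFileReader` members it relies on. `decrypt()` may still raise `NotImplementedError` if the file uses an encryption algorithm that PyPDF2 does not support.

```python
def decrypt_if_needed(reader, password=""):
    """Try to unlock *reader* with *password* (empty string by default).

    *reader* is any object exposing PyPDF2's ``isEncrypted`` attribute and
    ``decrypt()`` method, e.g. a ``PdfFileReader``. This helper is a
    hypothetical convenience wrapper, not part of PyPDF2.

    Returns True if the document is readable afterwards, False if the
    password was rejected.
    """
    if not reader.isEncrypted:
        # Nothing to do for unencrypted documents.
        return True
    # PyPDF2's decrypt() returns 0 when the password does not match,
    # 1 for a user-password match and 2 for an owner-password match.
    return reader.decrypt(password) != 0


# Possible use inside get_form_fields() from the script above
# (left as comments because it needs PyPDF2 and the PDF file):
#
#     reader = PdfFileReader(open(infile, 'rb'))
#     if not decrypt_if_needed(reader):
#         raise ValueError("an empty password was not accepted")
#     fields = _getFields(reader)
```

Taking the reader as a parameter keeps the check separate from file handling, so it can be called right after `PdfFileReader(...)` is constructed and before `_getFields` touches `obj.trailer`.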

2. Using pypdftk

In the beginning I just wanted to read the data of the form:

import pypdftk

pdf_file_name = './fahrgastrechteformular.pdf'
data = pypdftk.dump_data_fields(pdf_file_name)

Currently my system (Windows 10) does not recognize the pdftk.exe that the Python module calls, so I called it directly in bash:

pdftk.exe fahrgastrechteformular.pdf dum_data_fields

I also got an encryption error back:

Error: Failed to open PDF file:
   fahrgastrechteformular.pdf
   OWNER PASSWORD REQUIRED, but not given (or incorrect)
Error: Unable to find file.
Error: Failed to open PDF file:
   dum_data_fields
Done.  Input errors, so no output created.
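Two separate problems seem to be mixed in this output. The pdftk operation is spelled `dump_data_fields`; the last two errors look like pdftk treating the misspelled `dum_data_fields` as a second input file. The owner-password error is independent of that and would presumably remain. As a minimal sketch for the "pdftk.exe is not recognized" part, the snippet below checks whether the executable can be found on `PATH` and then calls it with the correctly spelled operation. The function names are invented for illustration, and the sketch does not address the password error.

```python
import shutil
import subprocess


def build_dump_command(pdf_path, pdftk_exe="pdftk"):
    """Return the argument list for `pdftk <pdf> dump_data_fields`."""
    return [pdftk_exe, pdf_path, "dump_data_fields"]


def dump_data_fields(pdf_path, pdftk_exe="pdftk"):
    """Run pdftk and return its field dump as text.

    Raises FileNotFoundError if the executable is not on PATH and
    RuntimeError (carrying pdftk's own message) if pdftk reports an error.
    """
    # shutil.which() also resolves "pdftk" to "pdftk.exe" on Windows.
    resolved = shutil.which(pdftk_exe)
    if resolved is None:
        raise FileNotFoundError("%r was not found on PATH" % pdftk_exe)

    completed = subprocess.run(
        build_dump_command(pdf_path, resolved),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    if completed.returncode != 0:
        raise RuntimeError(completed.stderr.strip() or completed.stdout.strip())
    return completed.stdout
```

If `dump_data_fields('./fahrgastrechteformular.pdf')` raises `FileNotFoundError`, the directory containing pdftk.exe is probably missing from `PATH`; passing the full path as `pdftk_exe` is another option.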

So, to begin with, I just want to read the form fields of the PDF. For example, after filling in the first field with "Berlin Central Station" in Google Chrome, I want to read that value back with the Python scripts mentioned above. The next step would be to actually edit the fields' contents. I hope this is clear; please ask if anything is not.
