I am starting a project that will take specific pages out of each PDF in a folder and merge those pages into a single file. When I run the code below, I get the error shown after it, which complains about the length of encrypted data, and I don't know where I would need to address that.

from PyPDF2 import PdfFileMerger
import glob

files = glob.glob('C:/Users/Jake/Documents/UPLOAD/test_merge/*.pdf')

merger = PdfFileMerger()

for file in files:
    merger.append(file)
merger.write("merged.pdf")
merger.close()

ERROR

Traceback (most recent call last):
  File "C:\Users\Jake\Documents\Work Projects\Python\Contract Merger\Merger .02", line 10, in <module>
    merger.write("merged.pdf")
  File "C:\Users\Jake\Anaconda3\lib\site-packages\PyPDF2\_merger.py", line 312, in write
    my_file, ret_fileobj = self.output.write(fileobj)
  File "C:\Users\Jake\Anaconda3\lib\site-packages\PyPDF2\_writer.py", line 838, in write
    self.write_stream(stream)
  File "C:\Users\Jake\Anaconda3\lib\site-packages\PyPDF2\_writer.py", line 811, in write_stream
    self._sweep_indirect_references(self._root)
  File "C:\Users\Jake\Anaconda3\lib\site-packages\PyPDF2\_writer.py", line 960, in _sweep_indirect_references
    data = self._resolve_indirect_object(data)
  File "C:\Users\Jake\Anaconda3\lib\site-packages\PyPDF2\_writer.py", line 1005, in _resolve_indirect_object
    real_obj = data.pdf.get_object(data)
  File "C:\Users\Jake\Anaconda3\lib\site-packages\PyPDF2\_reader.py", line 1187, in get_object
    retval = self._encryption.decrypt_object(
  File "C:\Users\Jake\Anaconda3\lib\site-packages\PyPDF2\_encryption.py", line 747, in decrypt_object
    return cf.decrypt_object(obj)
  File "C:\Users\Jake\Anaconda3\lib\site-packages\PyPDF2\_encryption.py", line 185, in decrypt_object
    obj[dictkey] = self.decrypt_object(value)
  File "C:\Users\Jake\Anaconda3\lib\site-packages\PyPDF2\_encryption.py", line 179, in decrypt_object
    data = self.strCrypt.decrypt(obj.original_bytes)
  File "C:\Users\Jake\Anaconda3\lib\site-packages\PyPDF2\_encryption.py", line 87, in decrypt
    d = aes.decrypt(data)
  File "C:\Users\Jake\Anaconda3\lib\site-packages\Crypto\Cipher\_mode_cbc.py", line 246, in decrypt
    raise ValueError("Data must be padded to %d byte boundary in CBC mode" % self.block_size)
ValueError: Data must be padded to 16 byte boundary in CBC mode
[Finished in 393ms]

I wrote a basic program from a YouTube video and tried to run it, but I got an error saying that PyCryptodome is a dependency of PyPDF2. After installing it, I now get an error about the data length for encryption when writing the PDF. Googling that error led me to this solution. I am a bit of a novice, and I don't really understand why any kind of encryption is being applied in the first place, other than what I assume is necessary for the PDF reader/writer to operate, so I don't know where I would need to apply that solution in this code.

After writing up this question, I was led to this solution. I tried running the code below, but I received the same error.

from PyPDF2 import PdfFileMerger, PdfFileReader
import glob

merger = PdfFileMerger()

files = glob.glob('C:/Users/Jake/Documents/UPLOAD/test_merge/*.pdf')

for filename in files:
    with open(filename, 'rb') as source:
        tmp = PdfFileReader(source)
        merger.append(tmp)

merger.write('Result.pdf')

ERROR

Traceback (most recent call last):
  File "C:\Users\Jake\Documents\Work Projects\Python\Contract Merger\Merger .03.py", line 13, in <module>
    merger.write('Result.pdf')
  File "C:\Users\Jake\Anaconda3\lib\site-packages\PyPDF2\_merger.py", line 312, in write
    my_file, ret_fileobj = self.output.write(fileobj)
  File "C:\Users\Jake\Anaconda3\lib\site-packages\PyPDF2\_writer.py", line 838, in write
    self.write_stream(stream)
  File "C:\Users\Jake\Anaconda3\lib\site-packages\PyPDF2\_writer.py", line 811, in write_stream
    self._sweep_indirect_references(self._root)
  File "C:\Users\Jake\Anaconda3\lib\site-packages\PyPDF2\_writer.py", line 960, in _sweep_indirect_references
    data = self._resolve_indirect_object(data)
  File "C:\Users\Jake\Anaconda3\lib\site-packages\PyPDF2\_writer.py", line 1005, in _resolve_indirect_object
    real_obj = data.pdf.get_object(data)
  File "C:\Users\Jake\Anaconda3\lib\site-packages\PyPDF2\_reader.py", line 1187, in get_object
    retval = self._encryption.decrypt_object(
  File "C:\Users\Jake\Anaconda3\lib\site-packages\PyPDF2\_encryption.py", line 747, in decrypt_object
    return cf.decrypt_object(obj)
  File "C:\Users\Jake\Anaconda3\lib\site-packages\PyPDF2\_encryption.py", line 185, in decrypt_object
    obj[dictkey] = self.decrypt_object(value)
  File "C:\Users\Jake\Anaconda3\lib\site-packages\PyPDF2\_encryption.py", line 179, in decrypt_object
    data = self.strCrypt.decrypt(obj.original_bytes)
  File "C:\Users\Jake\Anaconda3\lib\site-packages\PyPDF2\_encryption.py", line 87, in decrypt
    d = aes.decrypt(data)
  File "C:\Users\Jake\Anaconda3\lib\site-packages\Crypto\Cipher\_mode_cbc.py", line 246, in decrypt
    raise ValueError("Data must be padded to %d byte boundary in CBC mode" % self.block_size)
ValueError: Data must be padded to 16 byte boundary in CBC mode
[Finished in 268ms]

My thinking is that something else has gone wrong, but I am at a loss as to what that could be.

What have I done wrong with this build to get this error, and how can I correct it?

Martin Thoma

1 Answer

It turns out this is an issue with PyPDF2 itself. Until it is patched, there is a 3-line fix that can be applied to correct the error.
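
As for why encryption shows up at all: the traceback goes through `_reader.py` and `decrypt_object`, which suggests that at least one of the input PDFs is itself encrypted, so PyPDF2 has to decrypt its objects while copying them into the output. A rough sketch of one way to keep the merge running in the meantime is to set the encrypted inputs aside and merge only the rest. The helper names below (`looks_encrypted`, `split_by_encryption`, `merge_plain_pdfs`) are made up for illustration, the `/Encrypt` byte scan is only a heuristic and not a real PDF parse, and the sketch assumes a PyPDF2 version that provides `PdfMerger`.

```python
import glob


def looks_encrypted(pdf_path):
    """Heuristic only: an encrypted PDF names an /Encrypt entry in its trailer.

    This scans the raw bytes, so it can give a false positive if the text
    "/Encrypt" happens to appear elsewhere in an unencrypted file.
    """
    with open(pdf_path, "rb") as fh:
        return b"/Encrypt" in fh.read()


def split_by_encryption(paths):
    """Return (plain, encrypted) lists of paths, preserving input order."""
    plain, encrypted = [], []
    for path in paths:
        (encrypted if looks_encrypted(path) else plain).append(path)
    return plain, encrypted


def merge_plain_pdfs(pattern, output_path, pages=None):
    """Merge the PDFs matching `pattern` that do not look encrypted.

    Returns the list of paths that were skipped so they can be handled
    separately. `pages` is passed straight through to PdfMerger.append.
    """
    # Imported here so the two helpers above work without PyPDF2 installed.
    from PyPDF2 import PdfMerger

    plain, encrypted = split_by_encryption(sorted(glob.glob(pattern)))
    merger = PdfMerger()
    for path in plain:
        merger.append(path, pages=pages)
    merger.write(output_path)
    merger.close()
    return encrypted
```

Calling `merge_plain_pdfs('C:/Users/Jake/Documents/UPLOAD/test_merge/*.pdf', 'merged.pdf', pages=(0, 2))` would take the first two pages of each unencrypted file and return the paths it skipped. The skipped files are left out of the merged output, so this only sidesteps the error rather than fixing it.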