
There are 5 PDF files in my cache folder. I want to read them in and merge them so that I end up with a single PDF file consisting of the 5 files in that folder.

My problem is that both the PyPDF2 PdfFileMerger and the PyPDF2 PdfWriter (I have tried both) give me an output that contains the first file 5 times.

When I save the read PDF files individually back to my hard drive, they are neatly stored without any problems. So I don't seem to have any problems reading in the files. Is my loop wrong when merging? Do I not understand the PyPDF2 documentation correctly?

Unfortunately I can't find my mistake and would appreciate your help.

Many thanks

import os

from PyPDF2 import PdfFileMerger, PdfReader

pdf_files = [f for f in os.listdir(CACHE_FOLDER_PATH) if f.endswith('.pdf')]
pdf_files.sort()

merger = PdfFileMerger()

for element in pdf_files:
    with open(os.path.join(CACHE_FOLDER_PATH, element), 'rb') as f:
        merger.append(PdfReader(f))

with open(os.path.join(CACHE_FOLDER_PATH, 'output.pdf'), 'wb') as f:
    merger.write(f)
Paul
  • For your information: the solution mentioned in [PyPDF2.PdfFileMerger only writes first pdf](https://stackoverflow.com/questions/70861245/pypdf2-pdffilemerger-only-writes-first-pdf) has the same effect for me. The first PDF is written as many times as there are files in my folder. – Paul Sep 05 '22 at 11:19

1 Answer


Solved it.

I upgraded PyPDF2 from version 2.4.1 to the latest version, 2.10.5, and the merge now works. Apparently it was a bug in the PyPDF2 version I was using.
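In case it helps anyone who runs into the same thing, here is a rough sketch of how to check which PyPDF2 version is installed before debugging the merge loop. The helper names (`parse_version`, `installed_version`, `needs_upgrade`) are my own and not part of PyPDF2. The threshold 2.10.5 is only the version that worked for me; I don't know which release actually contained the fix. The parser assumes plain numeric versions such as 2.4.1 and would not handle pre-release tags.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a plain dotted version such as '2.10.5' into a tuple of ints.

    Assumption: only numeric parts, no pre-release suffixes like 'rc1'.
    """
    return tuple(int(part) for part in text.split("."))


def installed_version(package):
    """Return the installed version string of `package`, or None if missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def needs_upgrade(installed, minimum):
    """True if `installed` is missing or older than `minimum`."""
    if installed is None:
        return True
    # Compare as integer tuples; comparing the raw strings would sort
    # '2.10.5' before '2.4.1', which is the wrong order.
    return parse_version(installed) < parse_version(minimum)


if __name__ == "__main__":
    found = installed_version("PyPDF2")
    # 2.10.5 is the version that worked in my case, not a documented minimum.
    if needs_upgrade(found, "2.10.5"):
        print(f"PyPDF2 {found} found; consider: pip install --upgrade PyPDF2")
    else:
        print(f"PyPDF2 {found} is at least 2.10.5")
```

The versions are compared as tuples of integers because a plain string comparison would treat 2.10.5 as older than 2.4.1.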

Thank you guys anyway.

Paul