I have two folders, each containing a different set of PDFs. Each PDF in the first folder needs to be combined with a specific, correspondingly named PDF in the second folder. For example, "PID-01.pdf" from the first folder needs to be combined with "FNN-PID-01.pdf" from the second folder, "PID-02.pdf" with "FNN-PID-02.pdf", and so on. I am using the Python module PyPDF2. Could anyone give an example using PyPDF2?

exd-as

2 Answers

Did you mean "merged" when you said "combined"?

If so, let's say folder1 contains "PID-01.pdf" and folder2 contains "FNN-PID-01.pdf".

import os
from PyPDF2 import PdfFileMerger, PdfFileReader

folder1 = "/your/path/to/folder1/"
folder2 = "/your/path/to/folder2/"
merged_folder = "/your/path/to/merged/folder/"

f1_files = os.listdir(folder1)  # ['PID-01.pdf', 'PID-02.pdf', ...]
f2_files = os.listdir(folder2)  # ['FNN-PID-01.pdf', 'FNN-PID-02.pdf', ...]

def pdf_merger(f1, f2):
    merger = PdfFileMerger()
    # open() replaces the Python 2-only file() builtin; the inputs must stay
    # open until the merged file has been written
    with open(os.path.join(folder1, f1), 'rb') as fd1, \
         open(os.path.join(folder2, f2), 'rb') as fd2:
        f1_content = PdfFileReader(fd1)
        f2_content = PdfFileReader(fd2)
        merger.append(f1_content)  # pages of f1 come first
        merger.append(f2_content)  # then the pages of f2
        out = os.path.join(merged_folder, f"merged-{f1}")
        merger.write(out)

# The loop below checks, for each file in folder1, whether its name is a
# substring of a folder2 file name (e.g. "PID-01.pdf" in "FNN-PID-01.pdf").
# When it matches, the two files are merged and saved to merged_folder.

for file1 in f1_files:
    for file2 in f2_files:
        if file1 in file2:
            pdf_merger(file1, file2)

You can iterate over the files and write your own matching pattern, using a regex for more advanced cases.
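As a rough sketch of what such a regex-based matcher could look like: the two patterns below are assumptions based on the names in the question ("PID-01.pdf" and "FNN-PID-01.pdf"), and `pair_by_id` is a hypothetical helper, not part of PyPDF2. It only pairs file names; each pair would then be passed to `pdf_merger`.

```python
import re

# Assumed naming scheme, taken from the question's examples.
# Adjust both patterns to your real file names.
FIRST_RE = re.compile(r"^PID-(\d+)\.pdf$")
SECOND_RE = re.compile(r"^FNN-PID-(\d+)\.pdf$")


def pair_by_id(first_names, second_names):
    """Return (first, second) name pairs that share the same numeric ID."""
    # Index the second folder's names by the ID captured in the pattern.
    second_by_id = {}
    for name in second_names:
        match = SECOND_RE.match(name)
        if match:
            second_by_id[match.group(1)] = name

    pairs = []
    for name in sorted(first_names):
        match = FIRST_RE.match(name)
        if match and match.group(1) in second_by_id:
            pairs.append((name, second_by_id[match.group(1)]))
    return pairs


pairs = pair_by_id(
    ["PID-02.pdf", "PID-01.pdf", "notes.txt"],
    ["FNN-PID-01.pdf", "FNN-PID-02.pdf", "FNN-PID-09.pdf"],
)
print(pairs)
```

Names that do not fit either pattern, or that have no partner with the same ID, are simply skipped.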

RG_RG
  • Thank you so much for helping me out. I made some changes to your code as follows:

        f1_content = PdfFileReader(open(os.path.join(folder1,f1),'rb'))
        f2_content = PdfFileReader(open(os.path.join(folder2,f2),'rb'))

        for file1 in f1_files:
            for file2 in f2_files:
                if str(f"FNN-{file1}") == file2:
                    pdf_merger(file1,file2)

    – exd-as Aug 04 '21 at 17:28
  • You don't need to wrap it in str(), as in str(f"xyz"), because an f-string is already a string. – RG_RG Aug 06 '21 at 08:27
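The exact-name comparison from the comments above can also be done without the nested loop, by building the expected partner name and looking it up in a set. This is only a sketch: `exact_pairs` is a hypothetical helper, and the "FNN-" prefix is assumed from the question's examples.

```python
def exact_pairs(f1_files, f2_files, prefix="FNN-"):
    """Pair each name in f1_files with prefix + name, if that file exists."""
    available = set(f2_files)  # set membership avoids the inner loop
    pairs = []
    for name in sorted(f1_files):
        partner = f"{prefix}{name}"  # already a str, no str() needed
        if partner in available:
            pairs.append((name, partner))
    return pairs


f1_files = ["PID-01.pdf", "PID-02.pdf", "PID-03.pdf"]
f2_files = ["FNN-PID-02.pdf", "FNN-PID-01.pdf", "XFNN-PID-03.pdf"]
pairs = exact_pairs(f1_files, f2_files)
print(pairs)
```

Unlike the substring test `file1 in file2`, this does not pair "PID-03.pdf" with "XFNN-PID-03.pdf", because only the exact name "FNN-PID-03.pdf" would count as its partner.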

Here is a pedagogical example:

from PyPDF4 import PdfFileReader, PdfFileWriter
#from PyPDF2 import PdfFileReader, PdfFileWriter


def concatenate(pdf_out, *pdfs):
    # initialize a writer instance
    pdf_w = PdfFileWriter()
    open_files = []  # keep the inputs open until the output is written

    for pdf in pdfs:
        fd_in = open(pdf, 'rb')  # pass a binary file object to the PDF reader
        open_files.append(fd_in)
        pdf_w.appendPagesFromReader(PdfFileReader(fd_in))

    with open(pdf_out, 'wb') as fd:  # 'wb': the PDF output is binary
        pdf_w.write(fd)     # write the binary stream to the destination

    for fd_in in open_files:
        fd_in.close()


pdf1 = 'dir1/PID-01.pdf'
pdf2 = 'dir2/FNN-PID-01.pdf'
pdf_out = '?/?.pdf' # choose where to save the merged file

concatenate(pdf_out, pdf1, pdf2)
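
To merge every pair from two folders:

import os

# assumes both folders hold the same number of files and that sorting
# their names puts matching files at the same position
dir_1 = '?'  # first folder
dir_2 = '?'  # second folder
dir_target = '?'  # folder for the merged files
names_1 = sorted(os.listdir(dir_1))  # os.listdir gives no guaranteed order
names_2 = sorted(os.listdir(dir_2))
for counter, (pdf1, pdf2) in enumerate(zip(names_1, names_2), start=1):
    pdf_new_path = os.path.join(dir_target, 'PID-PNN-{}.pdf'.format(counter)) # or choose another filename pattern

    # os.listdir returns bare names, so prepend the folder paths
    concatenate(pdf_new_path, os.path.join(dir_1, pdf1), os.path.join(dir_2, pdf2))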

Remark

PyPDF2 and PyPDF4 are almost(?) backward compatible, so switching between them only requires changing the import.

The function is order-sensitive: pdf1 comes first, then pdf2, in the final document.

cards