
This code already creates the PDFs. After each PDF is created, it is copied into its own folder. What I am trying to do is merge the PDFs in each folder: merge everything in the first folder, then move on to the next folder and merge there, and so on. But when I run it, each merged file contains only the last PDF rather than all the PDFs in the folder.

import os
import shutil
import time
from PyPDF2 import PdfFileMerger
from reportlab.pdfgen.canvas import Canvas

path = input("Paste the folder path of which all the PDFs are located to begin the automation.\n")
# Only allowed to use your C or H drive.
while True:
    if "C" in path[0]:
        break
    elif "H" in path[0]:
        break
    else:
        print("Sorry you can only use your C drive or H drive\n")
    path = input("Paste the folder path of which all the PDFs are located to begin the automation.\n")

moving_path = path + "\\Script"
new_path = moving_path + "\\1"
folder_name = {}

# List all directories or files in the specific path
list_dir = ["040844_135208_3_192580_Sample_010.pdf", "040844_135208_3_192580_Sample_020.pdf",
            "040844_135208_3_192580_Sample_050.pdf", "058900_84972_3_192163_Sample_010.pdf",
            "058900_84972_3_192163_Sample_020.pdf", "058900_84972_3_192163_Sample_030.pdf"]


# Pauses the program
def wait(num):
    time.sleep(num)


# Change and make directory
def directory():
    os.chdir(path)

    for i in list_dir:
        canvas = Canvas(i)
        canvas.drawString(72, 72, "Hello, World")
        canvas.save()
    os.makedirs("Script")
    os.chdir(path + "\\Script")
    os.makedirs("1")
    os.makedirs("Merge")
    os.chdir(new_path)


def main():
    match = []
    for i in list_dir:
        search_zero = i.split("_")[2]
        if search_zero != "0":
            match.append((i.split("_", 3)[-1][:6]))

        else:
            match.append((i.split("_", 0)[-1][:6]))

    new_match = []
    for i, x in enumerate(match):
        if "_" in match[i]:
            new_match.append(x[:-1])
        else:
            new_match.append(x)

    for i in list_dir:
        key = i.split("_", 3)[-1][:6]

        if key in folder_name:
            folder_name[key].append(i)
        else:
            folder_name[key] = [i]

    for i, x in enumerate(list_dir):
        # Skips over the error that states that you can't make duplicate folder name
        try:
            os.makedirs((new_match[i]))
        except FileExistsError:
            pass

        # Moves the file that doesn't contain "PDFs" into the "1" folder and the one that does in the "Merge" folder
        if "PDFs" not in list_dir[i]:
            shutil.copy(f"{path}\\{list_dir[i]}", f"{new_path}\\{new_match[i]}")
            os.chdir(f"{new_path}\\{new_match[i]}")
            merger = PdfFileMerger(x)
            merger.append(x)
            merger.write(f"{new_match[i]}.pdf")
            merger.close()
            os.chdir(new_path)

        else:
            shutil.copy(f"{path}\\{list_dir[i]}", f"{moving_path}\\Merge\\{x}")


directory()
wait(0.7)
main()
print("Done!")
wait(2)
  • Does this answer your question? [How to append PDF pages using PyPDF2](https://stackoverflow.com/questions/22795091/how-to-append-pdf-pages-using-pypdf2) You need to declare your merger object _outside_ the loop and write from it _outside_ the loop; as is, you redeclare merger every time through the loop and write to the same file (essentially overwriting), and so only the last merger (of the last PDF) is kept. At least, I’m pretty sure that’s what’s happening. – Zach Young Jun 02 '22 at 14:29
  • When I do that, I get "RuntimeError: close() was called and thus the writer cannot be used anymore". Placing the Merge.close() outside should fix the issue, but now it doesn't go into each folder and merge the files for just that folder. It takes a combination of both folders pdfs and merge them. – Mike Liar Jun 02 '22 at 19:44
  • If you want to merge PDFs, you need to create some kind of _merge_ object, call `append(pdf)` for all PDFs you want merged (probably a loop), write the final merged PDF, then close the merge object. If you need to do that in another loop for some other logic, then that's for you to work out. Have you tried just statically listing a few PDFs to merge, and seen that work? Then, build out from there. – Zach Young Jun 02 '22 at 19:48
  • Yes, I did just that. The issue is that it's not treating each folder separately; it simply merges the PDFs from all of those folders together. And yes, I statically listed a few PDFs to merge and it worked. But what I want is a separate merge for each folder. A for loop should do just that, but it seems you can't use close() inside the for loop. – Mike Liar Jun 02 '22 at 20:12

2 Answers


I have these 4 PDFs:

pg1.pdf (Pg 1), pg2.pdf (Pg 2), pg3.pdf (Pg 3), pg4.pdf (Pg 4)

Here's a starter-script to merge Pg1 and Pg2 into one PDF, and Pg3 and Pg4 into another:

from PyPDF2 import PdfMerger

# Create merger object
merger = PdfMerger()

for pdf in ["pg1.pdf", "pg2.pdf"]:
    merger.append(pdf)

merger.write("merged_1-2.pdf")
merger.close()

# Re-create merger object
merger = PdfMerger()

for pdf in ["pg3.pdf", "pg4.pdf"]:
    merger.append(pdf)

merger.write("merged_3-4.pdf")
merger.close()

Now we extend that idea and wrap up the data so it will drive a loop that does the same thing:

page_sets = [
    # Individual PDFs      , final merged PDF
    [["pg1.pdf", "pg2.pdf"], "merged_1-2.pdf"],
    [["pg3.pdf", "pg4.pdf"], "merged_3-4.pdf"],
]

for pdfs, final_pdf in page_sets:
    merger = PdfMerger()

    for pdf in pdfs:
        merger.append(pdf)

    merger.write(final_pdf)
    merger.close()

I get the following for either the straight-down script, or the loop-y script:

merged_1-2.pdf merged_3-4.pdf

As best I understand your larger intent, that loop corresponds to you merging groups of PDFs into one merged PDF per group (in separate directories?), and the structure of:

  1. create merger object
  2. append to merger object
  3. write merger object
  4. close merger object
  5. Back to Step 1

works, and as far as I can tell is the way to approach this problem.
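The five steps above can be sketched as a small helper. This is only a sketch under assumptions: `merge_sets`, `default_merger`, and the `make_merger` parameter are hypothetical names, and the factory argument is there only so the loop can be tried with a stand-in object instead of real PDFs. By default it creates a `PdfMerger` from PyPDF2, as in the scripts above.

```python
def default_merger():
    # Imported lazily so the helper can also be tried with a stand-in merger.
    from PyPDF2 import PdfMerger
    return PdfMerger()


def merge_sets(page_sets, make_merger=default_merger):
    """Write one merged PDF per (pdfs, final_pdf) pair in page_sets."""
    written = []
    for pdfs, final_pdf in page_sets:
        merger = make_merger()          # 1. create a fresh merger for this set
        try:
            for pdf in pdfs:
                merger.append(pdf)      # 2. append every PDF in the set
            merger.write(final_pdf)     # 3. write the merged PDF
        finally:
            merger.close()              # 4. close, even if a step above failed
        written.append(final_pdf)       # 5. back to step 1 for the next set
    return written
```

Because a new merger is created at the top of every pass, calling `close()` inside the loop does not affect the next set.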

As an aside from getting the merge itself working: try creating the on-disk folder structure first, then build a data structure like page_sets that represents that on-disk structure, and only then pass the data to the merge loop. That should also make debugging easier:

  1. "Do I have the on-disk folders correct?", "Yes", then move on to
  2. "Do I have page_sets correct?", "Yes", then move on to
  3. the actual appending/writing

And if the answer to 1 or 2 is "No", you can inspect your file system or just look at a print-out of page_sets and spot any disconnects. From there, merging the PDFs should be really trivial.
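As a rough illustration of step 2, page_sets could be built by scanning the folders once they exist on disk. This is a sketch, not the asker's code: `build_page_sets` is a hypothetical helper, and it assumes each subfolder of the root holds one group of PDFs and that the merged file is named after its folder (as the question's script appears to do).

```python
from pathlib import Path


def build_page_sets(root):
    """Return [[pdf paths], merged output path] for each subfolder of root."""
    page_sets = []
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        merged = folder / f"{folder.name}.pdf"
        # Sort for a stable page order; skip a merged file left by an earlier run.
        pdfs = sorted(p for p in folder.glob("*.pdf") if p != merged)
        if pdfs:
            page_sets.append([[str(p) for p in pdfs], str(merged)])
    return page_sets
```

Printing the result of `build_page_sets(...)` for the folder that holds the groups gives the print-out to inspect before any merging happens.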

Once that's working correctly, if you want to go back and refactor to try and get folders/data/merge in one pass for each set of PDFs, then go for it, but at least you have a working example to fall back on and start to ask where you missed something if you run into problems.

Zach Young

Whenever you end up with something that only contains the last value from a loop, check your loop logic. In this case, your merger loop looks like this:

for i, x in enumerate(list_dir):
    ...

    if "PDFs" not in list_dir[i]:
        ...
        merger = PdfFileMerger(x)
        merger.append(x)
        merger.write(f"{new_match[i]}.pdf")
        merger.close()

So for each file in list_dir you create a new merger, add that one file, and write out the PDF. Unsurprisingly, each PDF file you write contains exactly one input PDF.

Move the merger creation and merger.write out of the innermost loop, so that all of the files to be merged are appended together and written out as a single PDF. Your naming logic is a bit convoluted, but it seems that you want to be looping over the variable folder_name, and merging the corresponding files. So, maybe like this:

for key in folder_name:
    # One merger per group, so each output holds only that group's files
    merger = PdfFileMerger()
    for x in folder_name[key]:
        merger.append(x)
    merger.write(key + ".pdf")
    merger.close()

You'll need to add your own path and naming logic; I won't try to guess what you intended.
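For what it's worth, here is one way the grouping and path logic might be laid out before any merging happens. It is a guess at the intent, not a definitive version: `group_by_key` and `plan_merges` are hypothetical helpers, the key is the same `split("_", 3)[-1][:6]` slice the question uses for folder_name, and the layout (one folder per key, merged file named after the key) is an assumption.

```python
import os
from collections import defaultdict

list_dir = [
    "040844_135208_3_192580_Sample_010.pdf", "040844_135208_3_192580_Sample_020.pdf",
    "040844_135208_3_192580_Sample_050.pdf", "058900_84972_3_192163_Sample_010.pdf",
    "058900_84972_3_192163_Sample_020.pdf", "058900_84972_3_192163_Sample_030.pdf",
]


def group_by_key(filenames):
    """Group file names by the six characters after the third underscore."""
    groups = defaultdict(list)
    for name in filenames:
        key = name.split("_", 3)[-1][:6]
        groups[key].append(name)
    return dict(groups)


def plan_merges(filenames, base_dir):
    """Map each merged output path to the source PDF paths that feed it."""
    plan = {}
    for key, names in group_by_key(filenames).items():
        folder = os.path.join(base_dir, key)   # assumed layout: one folder per key
        output = os.path.join(folder, key + ".pdf")
        plan[output] = [os.path.join(folder, n) for n in names]
    return plan
```

Each entry of the returned dictionary can then drive the loop above: create a merger, append every source path, write to the output path, and close.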

alexis