
I want to create a script (preferably in Python) that takes the PDF documents in a folder, in order, and puts a big number on the first page of each one (the first document in the folder should be numbered 1, the next one 2, and so on).

Are there any libraries that could help me implement this?

I've seen some Python libraries that let you number the pages of a document, but none that number separate documents (kind of like a watermark).

amayito37
Does this answer your question? https://stackoverflow.com/questions/65994088/adding-text-to-a-pdf-via-python – Nick ODell Apr 03 '23 at 16:36

2 Answers


You can use PyPDF2, a pure-Python PDF toolkit for manipulating existing PDF documents. PyPDF2 can merge one page onto another, but it cannot draw new text onto a page by itself, so this example uses reportlab to render the document number on a blank page and then merges that page onto the first page of each PDF as a watermark (install both with pip install PyPDF2 reportlab):

import io
import os

from PyPDF2 import PdfReader, PdfWriter  # the old PdfFileReader/PdfFileWriter names no longer work in PyPDF2 3.0+
from reportlab.pdfgen import canvas  # PyPDF2 cannot draw text, so reportlab renders the number

# Folder containing the input PDFs, and a separate folder for the numbered copies
pdf_folder_path = '/path/to/pdf/folder'
output_folder_path = '/path/to/output/folder'
os.makedirs(output_folder_path, exist_ok=True)

# Create a list of PDF files in the folder, sorted by filename
pdf_files = sorted(f for f in os.listdir(pdf_folder_path) if f.lower().endswith('.pdf'))


def make_number_overlay(number, width, height):
    """Return a single page of the given size with `number` drawn in the bottom-right corner."""
    buffer = io.BytesIO()
    c = canvas.Canvas(buffer, pagesize=(width, height))
    c.setFont('Helvetica-Bold', 72)
    c.drawRightString(width - 36, 36, str(number))  # 36 pt (half an inch) margins
    c.save()
    buffer.seek(0)
    return PdfReader(buffer).pages[0]


# Loop through the PDF files; the first document gets number 1
for i, pdf_file in enumerate(pdf_files, start=1):
    reader = PdfReader(os.path.join(pdf_folder_path, pdf_file))
    writer = PdfWriter()
    for page_num, page in enumerate(reader.pages):
        if page_num == 0:
            # Merge the number overlay onto the first page only
            width = float(page.mediabox.width)
            height = float(page.mediabox.height)
            page.merge_page(make_number_overlay(i, width, height))
        writer.add_page(page)
    # Save to a new file: writing into the file that is still being read would corrupt it
    with open(os.path.join(output_folder_path, pdf_file), 'wb') as out:
        writer.write(out)
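One caveat about "in order": the sorted() call above compares filenames character by character, so doc10.pdf lands before doc2.pdf and the numbers would not follow the intended sequence. Below is a minimal natural-sort key, assuming the order you want is given by the digits embedded in the filenames; the helper name natural_key is hypothetical, not part of any library:

```python
import re


def natural_key(filename):
    """Split a filename into text and integer chunks so 'doc2.pdf' sorts before 'doc10.pdf'.

    re.split with a capturing group keeps the digit runs, and text chunks always
    land at even positions and digit chunks at odd ones, so keys stay comparable.
    """
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r'(\d+)', filename)]


files = ['doc10.pdf', 'doc2.pdf', 'doc1.pdf']
print(sorted(files))                   # ['doc1.pdf', 'doc10.pdf', 'doc2.pdf']
print(sorted(files, key=natural_key))  # ['doc1.pdf', 'doc2.pdf', 'doc10.pdf']
```

To use it, pass it as the key when building the file list, e.g. sorted(..., key=natural_key). Zero-padding the filenames (01.pdf, 02.pdf, ...) achieves the same result without any code change.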

And here is a solution using PyMuPDF.

import os
import fitz  # import PyMuPDF

# Get the path to the folder containing the PDF documents
pdf_folder_path = '/path/to/pdf/folder'

# Create a list of PDF files in the folder, sorted by filename
pdf_files = sorted([f for f in os.listdir(pdf_folder_path) if f.endswith('.pdf')])

for i, f in enumerate(pdf_files, start=1):
    filename = os.path.join(pdf_folder_path, f)
    doc = fitz.open(filename)
    page = doc[0]  # load page 1
    page.wrap_contents()  # guard against coordinate sloppiness
    prect = page.rect  # page rectangle
    # text insertion point
    point = fitz.Point(prect.width - 50, prect.height - 36)
    # insert text, fontsize=25, 30% fill opacity
    page.insert_text(point, str(i), fill_opacity=0.3, fontsize=25)
    doc.saveIncr()  # incremental save back into the same file
    doc.close()
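If you want to move the number, keep in mind that PyMuPDF measures positions from the top-left corner of the page with y growing downward, while PDF user space (the coordinates a page's content stream uses) starts at the bottom-left with y growing upward. That is why the point above uses prect.height - 36 to land near the bottom. Here is a minimal conversion sketch, assuming an unrotated page whose mediabox starts at (0, 0); the function name is hypothetical:

```python
def pymupdf_to_pdf_point(x, y, page_height):
    """Convert a point from PyMuPDF's top-left-origin coordinates to
    PDF user space (bottom-left origin, y grows upward).

    Assumes an unrotated page whose mediabox starts at (0, 0).
    """
    return x, page_height - y


# A US Letter page is 612 x 792 points.
width, height = 612, 792
# The point used above, (width - 50, height - 36), is 36 pt above the bottom edge.
print(pymupdf_to_pdf_point(width - 50, height - 36, height))  # (562, 36)
```

The x coordinate is unchanged; only y is flipped against the page height.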
Jorj McKie