
I am using qpdf to merge all PDF files in a directory, and I would like to merge only the first page of each input file. According to the qpdf documentation on page selection, this should be possible. I have tried a couple of variants without luck:

qpdf --empty --pages *.pdf 1-1 -- "output.pdf"
qpdf --empty --pages *.pdf 1 -- "output.pdf"

What can I do?

Human

2 Answers


As explained in this qpdf issue, the shell expands *.pdf in the command qpdf --empty --pages *.pdf 1 -- "output.pdf" before qpdf ever runs; that is, it replaces *.pdf with the list of PDF files in the current directory. Assuming you have the following PDF files in the current directory:

  • file1.pdf
  • file2.pdf
  • file3.pdf

the command becomes:

qpdf --empty --pages file1.pdf file2.pdf file3.pdf 1 -- "output.pdf"

so the page selector applies only to the last PDF (file3.pdf), while the files without a page range contribute all of their pages. On macOS or Linux you can script the command to add a 1 after each PDF file name, which takes the first page of each file and merges them, like so:

qpdf --empty --pages $(for i in *.pdf; do echo $i 1; done) -- output.pdf
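The one-liner above relies on the shell splitting the loop output into words, so it breaks on file names that contain spaces. If that is a concern, the same argument list can be built in a small script instead. The following is only a sketch in Python, not something from the qpdf documentation: the function names build_qpdf_args and merge_first_pages are made up for illustration, and it assumes qpdf is on your PATH and that the inputs are the *.pdf files in the current directory.

```python
import glob
import subprocess


def build_qpdf_args(pdf_files, output):
    """Return a qpdf argument list that selects page 1 of each input file."""
    args = ["qpdf", "--empty", "--pages"]
    for name in pdf_files:
        # Each file name is followed by its own page range.
        args += [name, "1"]
    args += ["--", output]
    return args


def merge_first_pages(output="output.pdf"):
    # Assumption: inputs are the *.pdf files in the current directory,
    # skipping an output file left over from a previous run.
    pdf_files = sorted(f for f in glob.glob("*.pdf") if f != output)
    # Passing a list (no shell involved) keeps names with spaces intact.
    subprocess.run(build_qpdf_args(pdf_files, output), check=True)
```

Calling merge_first_pages() runs the equivalent of qpdf --empty --pages file1.pdf 1 file2.pdf 1 file3.pdf 1 -- output.pdf for the three example files.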
Human

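The following Python script, which uses PyPDF2, worked well for me. It takes the first page of every PDF in the current directory and writes those pages to a single file.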

import os
from PyPDF2 import PdfReader, PdfWriter

OUTPUT = "CombinedFirstPages.pdf"

# Get all PDF documents in the current directory,
# skipping the output file left over from a previous run
pdf_files = [
    name for name in os.listdir(".")
    if name.lower().endswith(".pdf") and name != OUTPUT
]
pdf_files.sort(key=str.lower)

# Take the first page from each PDF
pdf_writer = PdfWriter()
for filename in pdf_files:
    reader = PdfReader(filename)
    pdf_writer.add_page(reader.pages[0])

# Write the collected pages to a single file
with open(OUTPUT, "wb") as fp:
    pdf_writer.write(fp)
Jordan