15

Does anybody have experience merging two pages of a PDF file into one using the Python library PyPDF2? When I try page1.mergePage(page2), the result is page2 overlaid on page1. How can I add page2 to the bottom of page1 instead?

Martin Thoma
Valentin Melnikov
  • The author is probably looking for something like this: http://www.pdfdu.com/pdf-pages-merge.aspx – S.A. Jun 22 '18 at 12:18

5 Answers

31

While searching the web for a Python PDF merging solution, I noticed that there's a general misconception about merging versus appending.

Most people call the appending action a merge but it's not. What you're describing in your question is really the intended use of mergePage which should be called applyPageOnTopOfAnother but that's a little long. What you are (were) looking for is really appending two files/pages into a new file.
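
To make the distinction concrete, here is a toy model, not PyPDF2 code. It assumes a "page" is just a list of drawing operations and a "document" is a list of pages; the function names append_pages and merge_page are made up for illustration.

```python
# Toy model only: a "page" is a list of drawing operations and a
# "document" is a list of pages. This is not the PyPDF2 API.

def append_pages(doc_a, doc_b):
    """Appending: the result has every page of both documents, in order."""
    return doc_a + doc_b


def merge_page(base_page, overlay_page):
    """Merging: one page whose content is base_page followed by
    overlay_page, so the overlay is drawn last, i.e. on top."""
    return base_page + overlay_page


doc1 = [["draw background"]]
doc2 = [["draw stamp"]]

appended = append_pages(doc1, doc2)
merged = [merge_page(doc1[0], doc2[0])]

print(len(appended))  # 2 pages
print(len(merged))    # 1 page
print(merged[0])      # ['draw background', 'draw stamp']
```

Appending changes the number of pages, while merging changes what is drawn on a single page.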

Appending PDF files

Using the PdfFileMerger class and its append method.

Identical to the merge() method, but assumes you want to concatenate all pages onto the end of the file instead of specifying a position.

Here's one way to do it, taken from "pypdf Merging multiple pdf files into one pdf":

from PyPDF2 import PdfFileMerger, PdfFileReader

# ...

merger = PdfFileMerger()

# file() exists only in Python 2; open() works in both Python 2 and 3
merger.append(PdfFileReader(open(filename1, 'rb')))
merger.append(PdfFileReader(open(filename2, 'rb')))

merger.write("document-output.pdf")

Appending specific PDF pages

And to append specific pages of different PDF files, use the PdfFileWriter class with the addPage method.

Adds a page to this PDF file. The page is usually acquired from a PdfFileReader instance.

from PyPDF2 import PdfFileReader, PdfFileWriter

# open() replaces the Python 2-only file(); the inputs stay open
# until the output has been written
file1 = PdfFileReader(open(filename1, "rb"))
file2 = PdfFileReader(open(filename2, "rb"))

output = PdfFileWriter()

output.addPage(file1.getPage(specificPageIndex))
output.addPage(file2.getPage(specificPageIndex))

with open("document-output.pdf", "wb") as outputStream:
    output.write(outputStream)

Merging two pages into one page

Using mergePage

Merges the content streams of two pages into one. Resource references (i.e. fonts) are maintained from both pages. The mediabox/cropbox/etc of this page are not altered. The parameter page’s content stream will be added to the end of this page’s content stream, meaning that it will be drawn after, or “on top” of this page.

from PyPDF2 import PdfFileReader, PdfFileWriter

# open() replaces the Python 2-only file()
file1 = PdfFileReader(open(filename1, "rb"))
file2 = PdfFileReader(open(filename2, "rb"))

output = PdfFileWriter()

# Draw the page from file2 on top of the page from file1
page = file1.getPage(specificPageIndex)
page.mergePage(file2.getPage(specificPageIndex))

output.addPage(page)

with open("document-output.pdf", "wb") as outputStream:
    output.write(outputStream)
MasterOdin
Emile Bergeron
  • 2
    No, you misunderstood me. What I really needed was to merge 2 pages into one, one beneath the other. – Valentin Melnikov Apr 27 '15 at 10:51
  • You should really clarify that in your question. You mean like, page 1 on the top half and page 2 on the bottom half? – Emile Bergeron Apr 27 '15 at 11:10
  • @ValentinMelnikov Still, it's not merging, it's appending the content of both pages onto a new page. – Emile Bergeron May 22 '15 at 03:15
  • 2
    But, this answer is very useful for me :) (my task is to combine foreground + background and get the output) – xwild Jan 26 '16 at 10:28
  • 1
    "Append" has special implications when talking about PDF, given that PDF allows changing the content and presentation of a document just by appending bytes, by means of writing a new tree for the new version at the end of the document. That is especially important when trying to keep the past versions of the document digitally signed by means of incremental updates. Check the document "Digital Signatures in a PDF - Adobe" (https://www.adobe.com/devnet-docs/acrobatetk/tools/DigSig/Acrobat_DigitalSignatures_in_PDF.pdf), Figure 5. – yucer Sep 20 '16 at 14:57
  • Can you change the name of the question to "How to append the content of one PDF page to another using PyPDF2" ? I need to use the name of the question to ask a way to "append PDF pages using PyPDF2" (the real append) – yucer Sep 20 '16 at 14:59
  • @yucer you're commenting on an answer, I have no way to change the question other than editing it, like you. If you want to ask a question, go ahead, this is what SO is for. – Emile Bergeron Sep 20 '16 at 15:01
  • You are right, maybe it is better to ask another question. I have done that here: http://stackoverflow.com/questions/39597772/how-to-append-content-to-a-pdf-using-pypdf2-and-preserve-the-past-digital-signat – yucer Sep 20 '16 at 15:14
4

If the 2 PDFs do not exist on your local machine, and are instead normally accessed or downloaded via a URL (e.g. http://foo/bar.pdf and http://bar/foo.pdf), we can fetch both PDFs from their remote locations and merge them in memory in one fell swoop.

This eliminates the assumed step of downloading the PDF to begin with, and allows us to generalize beyond the simple case of both PDFs existing on disk. Specifically, it generalizes the solution to any HTTP-accessible PDF.

The example:

    import io

    import requests
    from flask import make_response
    from PyPDF2 import PdfFileMerger, PdfFileReader

    pdf_content_1 = requests.get('http://foo/bar.pdf').content
    pdf_content_2 = requests.get('http://bar/foo.pdf').content

    # Wrap the downloaded bytes in in-memory file-like buffers.
    # (Chaining .write() onto the constructor would keep write()'s
    # return value instead of the buffer.)
    pdf_buffer_1 = io.BytesIO(pdf_content_1)
    pdf_buffer_2 = io.BytesIO(pdf_content_2)
    pdf_merged_buffer = io.BytesIO()

    merger = PdfFileMerger()
    merger.append(PdfFileReader(pdf_buffer_1))
    merger.append(PdfFileReader(pdf_buffer_2))
    merger.write(pdf_merged_buffer)

    # Option 1 (inside a Flask view function):
    # Return the content of the buffer in an HTTP response
    response = make_response(pdf_merged_buffer.getvalue())
    # Set headers so the web browser knows to treat the result as a PDF
    response.headers['Content-Type'] = 'application/pdf'
    response.headers['Content-Disposition'] = \
        'attachment; filename=%s.pdf' % 'Merged PDF'
    return response

    # Option 2: Write to disk (binary mode, since the content is bytes)
    with open("merged_pdf.pdf", "wb") as fp:
        fp.write(pdf_merged_buffer.getvalue())
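
A note on building the in-memory buffers, as a minimal sketch assuming Python 3, where downloaded content is bytes and io.BytesIO is the matching buffer type. The data value is a placeholder standing in for a downloaded PDF. Chaining .write() onto a freshly constructed buffer keeps the return value of write(), not the buffer, so a reader handed that value has nothing to seek or read.

```python
import io

# Placeholder bytes standing in for requests.get(...).content
data = b"%PDF-1.4 placeholder bytes"

# Chaining .write() onto the constructor keeps write()'s return value
# (the number of bytes written), not the buffer itself.
not_a_buffer = io.BytesIO().write(data)

# Passing the bytes to the constructor yields a buffer positioned at
# offset 0, which is what a reader calling seek()/read() expects.
buffer = io.BytesIO(data)

print(type(not_a_buffer).__name__)  # int
print(buffer.read(4))               # b'%PDF'
```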
The Aelfinn
  • This brings nothing new in regard to the question at hand and the rest is out of scope here. – Emile Bergeron Jul 14 '17 at 21:27
  • Yes, the answer here is a specific way to solve a more specific problem than the OP. It is still relevant as an alternative approach to the OP. If you believe this answer does not add value, please downvote or flag rather than leaving opinionated comments. – The Aelfinn Jul 15 '17 at 02:10
  • This question is about how to merge/append PDF files with python. Downloading files from the internet is irrelevant. It isn't an alternative approach since you're using the same `PdfFileMerger` technique. – Emile Bergeron Jul 17 '17 at 14:34
  • This uses the same approach (PdfFileMerger), but does not assume both PDFs to be available on your local disk, and instead generalizes to using remote PDFs. Again, please downvote or flag rather than starting an opinionated comment war on StackOverflow. – The Aelfinn Jul 17 '17 at 14:41
  • @TheAelfinn When I try your approach I always get the `AttributeError: 'int' object has no attribute 'seek'` with `merger.append(PdfFileReader(pdf_buffer_1))`. – Mazze May 03 '22 at 08:22
3

Did it this way:

import PyPDF2

reader = PyPDF2.PdfFileReader(open("input.pdf", 'rb'))

NUM_OF_PAGES = reader.getNumPages()

# Use the first page's size for every page
page0 = reader.getPage(0)
h = page0.mediaBox.getHeight()
w = page0.mediaBox.getWidth()

# One tall blank page with room for all pages stacked vertically
newpdf_page = PyPDF2.pdf.PageObject.createBlankPage(None, w, h*NUM_OF_PAGES)
for i in range(NUM_OF_PAGES):
    next_page = reader.getPage(i)
    # PDF y coordinates grow upwards, so the first page gets the largest offset
    newpdf_page.mergeScaledTranslatedPage(next_page, 1, 0, h*(NUM_OF_PAGES-i-1))

writer = PyPDF2.PdfFileWriter()
writer.addPage(newpdf_page)

with open('output.pdf', 'wb') as f:
    writer.write(f)

It works when every page has the same height and width. Otherwise, it needs some modifications.
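
One possible modification for pages of different sizes, as a sketch rather than a tested solution. It assumes each page's media box starts at the origin and that pages should be left-aligned. The helper names stack_layout and stack_pages are made up here; stack_pages uses only the PyPDF2 1.x-style calls already shown in this answer, so it may need adjusting for other releases.

```python
def stack_layout(sizes):
    """Given [(width, height), ...] in top-to-bottom order, return the
    canvas size and the (x, y) offset of each page's lower-left corner.

    PDF coordinates start at the bottom-left, so the first page gets
    the largest y offset.
    """
    canvas_w = max(w for w, _ in sizes)
    canvas_h = sum(h for _, h in sizes)
    offsets = []
    remaining = canvas_h
    for _, h in sizes:
        remaining -= h
        offsets.append((0, remaining))
    return (canvas_w, canvas_h), offsets


def stack_pages(input_path, output_path):
    """Hypothetical helper: stack all pages of input_path on one page."""
    import PyPDF2  # imported lazily so stack_layout works without it

    with open(input_path, "rb") as src:
        reader = PyPDF2.PdfFileReader(src)
        pages = [reader.getPage(i) for i in range(reader.getNumPages())]
        sizes = [(float(p.mediaBox.getWidth()), float(p.mediaBox.getHeight()))
                 for p in pages]
        (w, h), offsets = stack_layout(sizes)
        canvas = PyPDF2.pdf.PageObject.createBlankPage(None, w, h)
        for page, (tx, ty) in zip(pages, offsets):
            canvas.mergeScaledTranslatedPage(page, 1, tx, ty)
        writer = PyPDF2.PdfFileWriter()
        writer.addPage(canvas)
        # Write while the source file is still open
        with open(output_path, "wb") as dst:
            writer.write(dst)


canvas, offsets = stack_layout([(612, 792), (595, 842)])
print(canvas)   # (612, 1634)
print(offsets)  # [(0, 842), (0, 0)]
```

The layout is computed separately from the PDF calls so the arithmetic can be checked on plain numbers first. With equal page sizes it reduces to the h*(NUM_OF_PAGES-i-1) offsets used above.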

Emile Bergeron's solution may be better; I didn't try it.

adsurbum
2

The pdfrw library can do this. There is a 4up example in the examples directory that places 4 input pages on every output page, and a booklet example that takes 8.5x11 input and creates 11x17 output. Disclaimer -- I am the pdfrw author.

Patrick Maupin
-2

The code posted at the following link accomplishes your objective.

Using PyPDF2 to merge files into multiple output files

I believe the trick is:

merger.append(input)

Community
user3482598