I want to write a script that reads all the PDF files in a directory, copies the second page of each one, and writes those pages to a single output PDF (containing all the second pages).
I have already written some code, but it gives me a PDF with blank pages. That is strange, because I have another script that takes the second page of each PDF and creates a new PDF for each one, and that script works. I think my problem may be related to addPage().
I am using the PyPDF2 library to work with the PDF files.

import pathlib
from PyPDF2 import PdfFileReader, PdfFileWriter

files_list = [file for file in pathlib.Path(__file__).parent.iterdir() if (file.is_file() and not str(file).endswith(".py"))]
total = len(files_list)    
writer = PdfFileWriter()    
for file in files_list:
    with open(file, 'rb') as infile:
        reader = PdfFileReader(infile)
        reader.decrypt("")
        writer.addPage(reader.getPage(1))            
with open('Output.pdf', 'wb') as outfile:
    writer.write(outfile)    
print('Done.')
Ender Look
  • Why aren't you using the code that does what you want, then? Because it does not combine the pages? – Patrick Artner Dec 31 '17 at 16:03
  • @PatrickArtner, that code doesn't combine anything; it just makes a copy of each old PDF containing only the second page, but it doesn't combine them into **one** PDF. – Ender Look Dec 31 '17 at 16:06
  • Added a (modified) example from another answer to my answer. Credit goes to that answer, linked below. – Patrick Artner Dec 31 '17 at 16:37
  • Not a strict duplicate, but this special case is answered by an answer to https://stackoverflow.com/questions/22795091/how-to-append-pdf-pages-using-pypdf2 – Patrick Artner Dec 31 '17 at 16:40

2 Answers


Have a look at PdfFileMerger.append: it allows you to merge pages from multiple PDFs into one result file.

append(fileobj, bookmark=None, pages=None, import_bookmarks=True)

Identical to the merge() method, but assumes you want to concatenate all pages onto the end of the file instead of specifying a position.

Parameters:
fileobj                  A File Object or an object that supports the standard read
                         and seek methods similar to a File Object. Could also be a
                         string representing a path to a PDF file.
bookmark (str)           Optionally, you may specify a bookmark to be applied at the
                         beginning of the included file by supplying the text of
                         the bookmark.
pages                    Can be a Page Range or a (start, stop[, step]) tuple to merge
                         only the specified range of pages from the source document into
                         the output document.
import_bookmarks (bool)  You may prevent the source document’s bookmarks
                         from being imported by specifying this as False.

This seems better suited to your task than using PdfFileWriter.

from PyPDF2 import PdfFileMerger, PdfFileReader

# ...

merger = PdfFileMerger()

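# open() replaces the Python 2-only file(); pages must be a (start, stop) tuple,
# so (1, 2) selects only the second page (index 1).
merger.append(PdfFileReader(open(filename1, 'rb')), None, (1, 2))
merger.append(PdfFileReader(open(filename2, 'rb')), None, (1, 2))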

merger.write("document-output.pdf")

Example adapted from answer: https://stackoverflow.com/a/29871560/7505395
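
A hedged sketch of how this could be applied to every PDF in a directory, assuming the PyPDF2 1.x API quoted above (`PdfFileMerger.append` accepting a path string and a `(start, stop)` tuple for `pages`). The helper names `find_pdfs` and `merge_second_pages` are hypothetical and not part of PyPDF2. The sketch does not handle encrypted files or PDFs with fewer than two pages.

```python
import pathlib


def find_pdfs(directory):
    """Return the PDF files directly inside `directory`, sorted by path."""
    return sorted(
        p for p in pathlib.Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def merge_second_pages(pdf_paths, output_path):
    """Append page index 1 (the second page) of each PDF to one output file."""
    # Imported here so find_pdfs() stays usable without PyPDF2 installed.
    # Assumption: the PyPDF2 1.x API used in this thread.
    from PyPDF2 import PdfFileMerger

    merger = PdfFileMerger()
    for path in pdf_paths:
        # pages is a (start, stop) tuple with an exclusive stop,
        # so (1, 2) selects only the second page.
        merger.append(str(path), pages=(1, 2))
    merger.write(str(output_path))
    merger.close()
```

A call such as `merge_second_pages(find_pdfs("."), "Output.pdf")` would then write one file containing the second page of each input. Writing the output into a different directory (or filtering it out of the input list) avoids picking it up as an input on the next run.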

Patrick Artner
  • Your answer seems useful, but I am having two problems. 1) The `file()` function doesn't work on my computer, but I fixed that with these 3 lines: `file = PdfFileReader(infile)`, `file.decrypt("")` and `merger.append(file, None, [2])`. 2) If the `pages` argument is `[2]`, I get `TypeError: "pages" must be a tuple of (start, stop[, step])`; if the argument is a tuple range such as `(0,2,1)`, I get `PyPDF2.utils.PdfReadError: file has not been decrypted`; **but** if I omit the argument, "it works": it appends all the pages of the PDF, but at least it doesn't raise an error. – Ender Look Dec 31 '17 at 19:21

Have you tried the code at the following: https://www.randomhacks.co.uk/how-to-split-a-pdf-every-2-pages-using-python/

from PyPDF2 import PdfFileWriter, PdfFileReader  # PyPDF2 keeps the pyPdf class names and runs on Python 3
import glob

pdfs = glob.glob("*.pdf")

for pdf in pdfs:
    # open() replaces the Python 2-only file(); keep the input open until all parts are written
    with open(pdf, "rb") as infile:
        inputpdf = PdfFileReader(infile)

        for i in range(inputpdf.numPages // 2):
            # Each output file holds pages 2*i and 2*i + 1 of the input
            output = PdfFileWriter()
            output.addPage(inputpdf.getPage(i * 2))

            if i * 2 + 1 < inputpdf.numPages:
                output.addPage(inputpdf.getPage(i * 2 + 1))

            newname = pdf[:7] + "-" + str(i) + ".pdf"

            with open(newname, "wb") as outputStream:
                output.write(outputStream)
steve
  • I am sorry, but your answer doesn't answer my question. You are splitting the PDF every two pages; I asked how to make a new PDF with only the **second** page of each PDF. I'm sorry. – Ender Look Dec 31 '17 at 19:23