Split specific pages of PDF and save it with Python

Question

I am trying to split a single 20-page PDF file into five separate PDF files: the 1st contains pages 1-3, the 2nd contains only page 4, the 3rd contains pages 5-10, the 4th contains pages 11-17, and the 5th contains pages 18-20. I need working code in Python. The code below splits the entire PDF file into single pages, but I want the pages grouped.

    # Older PyPDF2 (< 3.0) API: writes every page to its own file (page1.pdf, page2.pdf, ...)
    from PyPDF2 import PdfFileWriter, PdfFileReader

    inputpdf = PdfFileReader(open("input.pdf", "rb"))
    for i in range(inputpdf.numPages):
        j = i + 1  # 1-based page number for the file name
        output = PdfFileWriter()
        output.addPage(inputpdf.getPage(i))
        with open("page%s.pdf" % j, "wb") as outputStream:
            output.write(outputStream)

Daweo · Accepted Answer · 2019-04-11T10:54:57.007

3

To me this looks like a task for pdfrw. Based on this example from GitHub, I wrote the following example code:

from pdfrw import PdfReader, PdfWriter

pages = PdfReader('inputfile.pdf').pages
parts = [(3,6),(7,10)]  # half-open (start, stop) page ranges, like range()
for part in parts:
    outdata = PdfWriter(f'pages_{part[0]}_{part[1]}.pdf')
    for pagenum in range(*part):
        outdata.addpage(pages[pagenum-1])  # the pages list is 0-based
    outdata.write()

This creates two files, pages_3_6.pdf and pages_7_10.pdf, each with 3 pages: 3, 4, 5 and 7, 8, 9 respectively, because range() excludes its stop value. Note the pagenum-1 in the code: the -1 is needed because PDF page numbers start at 1, while the Python list of pages is indexed from 0. I also used f-strings to build the output file names. In my opinion they are a slick method, but they require Python 3.6 or later (I tested my code in 3.6.7) and are not available in Python 2, so you can use the older formatting method instead if you wish, e.g. 'pages_{}_{}.pdf'.format(*part). Remember to adjust the file names and ranges to your needs.
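The five groups in the question are inclusive, 1-based ranges (1-3, 4, 5-10, 11-17, 18-20), whereas the loop above expects half-open (start, stop) tuples. A minimal sketch of that conversion, assuming pdfrw is installed for the writing step; `to_half_open` and `split_with_pdfrw` are hypothetical helper names, and the pdfrw calls are the same ones used in the answer.

```python
# Hypothetical helpers; the pdfrw calls mirror the answer's code.

def to_half_open(groups):
    """Convert inclusive 1-based (first, last) page groups into the
    half-open (start, stop) tuples that range(*part) expects."""
    parts = []
    for first, last in groups:
        if first < 1 or last < first:
            raise ValueError(f"invalid page group: {(first, last)}")
        parts.append((first, last + 1))  # +1 so range() includes `last`
    return parts


def split_with_pdfrw(path, groups):
    """Write one PDF per group using pdfrw, as in the answer."""
    from pdfrw import PdfReader, PdfWriter  # imported lazily; needs `pip install pdfrw`

    pages = PdfReader(path).pages
    for start, stop in to_half_open(groups):
        outdata = PdfWriter(f"pages_{start}_{stop - 1}.pdf")
        for pagenum in range(start, stop):
            outdata.addpage(pages[pagenum - 1])  # page numbers are 1-based, the list is 0-based
        outdata.write()


if __name__ == "__main__":
    # The five groups asked for in the question.
    wanted = [(1, 3), (4, 4), (5, 10), (11, 17), (18, 20)]
    print(to_half_open(wanted))  # [(1, 4), (4, 5), (5, 11), (11, 18), (18, 21)]
    # split_with_pdfrw("inputfile.pdf", wanted)
```

Writing the groups inclusively and converting once avoids the off-by-one mix-ups that come from typing half-open tuples by hand.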

edited Apr 11 '19 at 10:54

answered Apr 10 '19 at 12:07

Daweo


`parts = [(1,3),(4),(5,10),(11,17),(18,20)] for part in parts: outdata = PdfWriter(f'pages_{part[0]}_{part[1]}.pdf') for pagenum in range(*part): outdata.addpage(pages[pagenum-1]) outdata.write()` The split code is not working for the above case, kindly help. – Sutirtha Thakur Apr 11 '19 at 06:01
@SutirthaThakur: `parts` has to be a `list` of 2-`tuple`s, so `(4)` is not legal (it is just the integer 4). Use `(4,5)` instead. Also keep in mind that `(1,3)` means pages 1 and 2, and `(4,5)` means page 4. – Daweo Apr 11 '19 at 07:06
`parts = [(1,4),(4,5),(5,10),(10,20)]`: when I enter this, I get `IndexError: list index out of range`. – Sutirtha Thakur Apr 11 '19 at 09:20
@SutirthaThakur: please check whether your .pdf file actually has that many pages; I do not see any other possible reason for the `IndexError`. – Daweo Apr 11 '19 at 10:39
It contains 20 pages only – Sutirtha Thakur Apr 11 '19 at 10:41
Please add the line `print(len(pages))` below `pages = PdfReader...`; this will show how many pages were actually read. – Daweo Apr 11 '19 at 11:05
It says 12, but it should be 20. – Sutirtha Thakur Apr 11 '19 at 12:02
Then this means `PdfReader` for some reason did not load the whole .pdf; solving that is beyond my capability. – Daweo Apr 11 '19 at 12:30
`input_file = PyPDF2.PdfFileReader('input.pdf')` works fine. – Sutirtha Thakur Apr 12 '19 at 08:58
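According to the comments, pdfrw read only 12 of the 20 pages, while PyPDF2 reportedly opened the same file fine. A minimal sketch, assuming the pypdf package (the maintained successor of PyPDF2; PyPDF2 3.x exposes the same `PdfReader`, `PdfWriter`, `pages` and `add_page` names), that prints the page count and checks the groups before writing, so a short read surfaces as a clear error instead of an `IndexError`. `check_groups` and `split_pdf` are hypothetical helpers, and the output file names are an arbitrary choice.

```python
# Sketch using pypdf; `check_groups` and `split_pdf` are hypothetical
# helpers, not part of pypdf or PyPDF2.

def check_groups(groups, num_pages):
    """Raise a readable error instead of an IndexError when a group
    refers to pages the reader did not find."""
    for first, last in groups:
        if not 1 <= first <= last <= num_pages:
            raise ValueError(f"group {(first, last)} is outside pages 1-{num_pages}")


def split_pdf(path, groups, prefix="part"):
    """Write each inclusive 1-based (first, last) group to its own file."""
    from pypdf import PdfReader, PdfWriter  # imported lazily; needs `pip install pypdf`

    reader = PdfReader(path)
    num_pages = len(reader.pages)
    print(f"{path}: {num_pages} pages read")  # the check suggested in the comments
    check_groups(groups, num_pages)

    for n, (first, last) in enumerate(groups, start=1):
        writer = PdfWriter()
        for index in range(first - 1, last):  # 0-based indices, including page `last`
            writer.add_page(reader.pages[index])
        with open(f"{prefix}{n}_pages_{first}-{last}.pdf", "wb") as out:
            writer.write(out)


if __name__ == "__main__":
    groups = [(1, 3), (4, 4), (5, 10), (11, 17), (18, 20)]
    check_groups(groups, 20)  # passes for a 20-page file
    try:
        check_groups(groups, 12)  # the page count pdfrw reported in the comments
    except ValueError as exc:
        print(exc)
    # split_pdf("input.pdf", groups)
```

Validating up front means nothing is written when the page count is wrong, rather than leaving some output files behind before the loop fails.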

score -1 · Answer 2 · answered Apr 10 '19 at 11:20


If you have Python 3, you can use tika, according to the following answer:

How to extract text from a PDF file?


Sa'ad


I want to split the pages first, then do the extraction. – Sutirtha Thakur Apr 10 '19 at 11:22
@SutirthaThakur [Here](https://stackoverflow.com/questions/490195/split-a-multi-page-pdf-file-into-multiple-pdf-files-with-python) is what you are looking for. – FrainBr33z3 Apr 10 '19 at 12:06
I want to split it page-wise; only selected pages are required. If possible, please share the code. – Sutirtha Thakur Apr 11 '19 at 06:04

thrinadhn · Answer 3 · 2022-10-27T16:57:23.680

How to extract specific pages (or split specific pages) from a PDF file and save those pages as a separate PDF using Python.

pip install PyPDF2 # to install module/package

import os

# PyPDF2 >= 3.0 names; older versions used PdfFileReader/PdfFileWriter, getPage and addPage
from PyPDF2 import PdfReader, PdfWriter

pdf_file_path = 'Unknown.pdf'
file_base_name = os.path.splitext(pdf_file_path)[0]  # strips only the final '.pdf'

pdf = PdfReader(pdf_file_path)

pages = [0, 2, 4] # page 1, 3, 5 (indices are 0-based)
pdfWriter = PdfWriter()

for page_num in pages:
    pdfWriter.add_page(pdf.pages[page_num])

with open('{0}_subset.pdf'.format(file_base_name), 'wb') as f:
    pdfWriter.write(f)  # the with-block closes f, so no f.close() is needed

Credit: How to extract PDF pages and save as a separate PDF file using Python
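The `pages` list in this answer holds 0-based indices, so page 1 is index 0. A small sketch of a hypothetical `page_indices` helper (not part of PyPDF2) that builds that list from 1-based page numbers and inclusive (first, last) ranges, so a selection such as pages 1-3 plus page 5 does not have to be converted by hand.

```python
# `page_indices` is a hypothetical helper, not part of PyPDF2 or pypdf.

def page_indices(selection):
    """Turn 1-based page numbers and inclusive (first, last) ranges
    into the 0-based indices that reader.pages expects."""
    indices = []
    for item in selection:
        if isinstance(item, tuple):
            first, last = item
            if first < 1 or last < first:
                raise ValueError(f"invalid page range: {item}")
            indices.extend(range(first - 1, last))  # includes page `last`
        else:
            if item < 1:
                raise ValueError(f"invalid page number: {item}")  # 0 would silently pick the last page
            indices.append(item - 1)
    return indices


print(page_indices([1, 3, 5]))    # [0, 2, 4], the list used in the answer
print(page_indices([(1, 3), 4]))  # [0, 1, 2, 3]
```

The result can replace the hard-coded list, e.g. `pages = page_indices([1, 3, 5])`, before the same `add_page` loop.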
