PyPDF2 PdfFileWriter has no attribute stream

Question

I am trying to split a PDF into its pages and save each page as a new PDF. I tried the method from a previous question and the PyPDF2 split example, both without success. EDIT: I can see in my files that the first page is written successfully; the PDF for the second page is then created but is empty.

Here is the code I am trying to run:

from PyPDF2 import PdfFileWriter, PdfFileReader

inputpdf = PdfFileReader(open("my_pdf.pdf", "rb"))

for i in range(inputpdf.numPages):
    output = PdfFileWriter()
    output.addPage(inputpdf.getPage(i))
    with open("document-page%s.pdf" % i, "wb") as outputStream:
        output.write(outputStream)

Here is the full error message:

Traceback (most recent call last):
  File "pdf_functions.py", line 9, in <module>
    output.write(outputStream)
  File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 572, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 572, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 557, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 572, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 575, in _sweepIndirectReferences
    if data.pdf.stream.closed:
AttributeError: 'PdfFileWriter' object has no attribute 'stream'

I also tried this and confirmed that I can indeed extract a single page.

from PyPDF2 import PdfFileWriter, PdfFileReader
inputpdf = PdfFileReader(open("/home/ubuntu/inputs/cityshape/form5.pdf", "rb"))

#for i in range(inputpdf.numPages):
output = PdfFileWriter()
output.addPage(inputpdf.getPage(2))
with open("document-page2.pdf", "wb") as outputStream:
    output.write(outputStream)

working fine on ubuntu too [img](https://postimg.org/image/rujz7kqyx/) — Hisham Karam, Oct 21 '16 at 04:47
Did a clean install and works fine now, weird that it all worked fine outside of the for loop. Thanks for your help @Hisham. — pope, Oct 21 '16 at 05:02

score 12 · Answer 1 · edited Oct 09 '18 at 10:27

The same thing happened to me.

I was able to solve it by moving the following line inside the loop:

inputpdf = PdfFileReader(open("/home/ubuntu/inputs/cityshape/form5.pdf", "rb"))

I believe some versions of PyPDF2 have a bug in which calling the PdfFileWriter.write method interferes with the PdfFileReader instance. Re-creating the PdfFileReader instance after each write works around the bug.

The following code should work (untested):

from PyPDF2 import PdfFileWriter, PdfFileReader

pdf_in_file = open("my_pdf.pdf",'rb')

inputpdf = PdfFileReader(pdf_in_file)
pages_no = inputpdf.numPages

for i in range(pages_no):
    inputpdf = PdfFileReader(pdf_in_file)
    output = PdfFileWriter()
    output.addPage(inputpdf.getPage(i))
    with open("document-page%s.pdf" % i, "wb") as outputStream:
        output.write(outputStream)

pdf_in_file.close()
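
A variant of the same workaround, as a minimal sketch: read the file into memory once and give every page its own PdfFileReader built on a fresh io.BytesIO, so no reader is reused after a write. The names split_pdf and page_output_names are made up for illustration, and this assumes a PyPDF2 release that still ships the old PdfFileReader/PdfFileWriter API used in this thread. I have not run it against an affected version.

```python
import io


def page_output_names(num_pages, pattern="document-page%s.pdf"):
    """Return one output file name per page index (hypothetical helper)."""
    return [pattern % i for i in range(num_pages)]


def split_pdf(path, pattern="document-page%s.pdf"):
    """Write each page of `path` to its own file, using a new reader per page."""
    # Imported lazily so the naming helper works without PyPDF2 installed.
    # Assumption: a PyPDF2 release with the legacy API from this thread.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    # Read the source once; every reader below gets its own in-memory stream.
    with open(path, "rb") as f:
        data = f.read()

    num_pages = PdfFileReader(io.BytesIO(data)).numPages
    names = page_output_names(num_pages, pattern)
    for i, name in enumerate(names):
        reader = PdfFileReader(io.BytesIO(data))  # new reader for every write
        writer = PdfFileWriter()
        writer.addPage(reader.getPage(i))
        with open(name, "wb") as out:
            writer.write(out)
    return names
```

Calling split_pdf("my_pdf.pdf") should produce document-page0.pdf, document-page1.pdf, and so on, the same names as the loop in the question.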

Works for me! Thanks. Maybe the `PdfFileWriter.write` method consumes the stream? Not sure. — Wesley Cheek, Apr 08 '22 at 06:00

score 1 · Answer 2 · answered Apr 14 '22 at 17:33

I solved the error "AttributeError: 'PdfFileWriter' object has no attribute 'stream'" by re-creating the PdfFileReader on every iteration of the loop.

My old code:

from PyPDF2 import PdfFileMerger, PdfFileReader, PdfFileWriter

pdf = PdfFileReader('arq.pdf')
pagi = 14
pagf = 20
dic = PdfFileMerger()

for i in range(pagi -1, pagf):

  pag = PdfFileWriter()
  pag.addPage(pdf.getPage(i))

  with open('pag.pdf', 'wb') as split:

    pag.write(split)

  pag = PdfFileReader('pag.pdf')
  dic.append(pag)

with open(f'PDF ({pagi} - {pagf}).pdf', 'wb') as split:

  dic.write(split)

!rm pag.pdf

My new code:

from PyPDF2 import PdfFileMerger, PdfFileReader, PdfFileWriter

pdf = PdfFileReader('arq.pdf')
pagi = 14
pagf = 20
dic = PdfFileMerger()

for i in range(pagi - 1, pagf):

  pag = PdfFileWriter()
  pag.addPage(pdf.getPage(i))

  with open('pag.pdf', 'wb') as split:

    pdf = PdfFileReader('arq.pdf') # Adding pdf again
    pag.write(split)

  pag = PdfFileReader('pag.pdf')
  dic.append(pag)

with open(f'PDF ({pagi} - {pagf}).pdf', 'wb') as split:

  dic.write(split)

!rm pag.pdf
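
The loop above turns a 1-based, inclusive page range (14 to 20) into 0-based indices with range(pagi - 1, pagf). Here is a minimal sketch of that arithmetic as a helper, together with a version that adds all pages to one PdfFileWriter and writes once, so no temporary pag.pdf is needed. The names page_indices and extract_range are hypothetical, and the sketch assumes the legacy PyPDF2 API used in this thread. The thread does not confirm whether writing only once avoids the error on affected versions.

```python
def page_indices(first_page, last_page):
    """Convert a 1-based, inclusive page range to 0-based page indices."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    return list(range(first_page - 1, last_page))


def extract_range(src, dst, first_page, last_page):
    """Copy pages first_page..last_page of `src` into a single file `dst`."""
    # Imported lazily so page_indices works without PyPDF2 installed.
    # Assumption: a PyPDF2 release with the legacy API from this thread.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    # Keep the source open until the output has been written.
    with open(src, "rb") as f:
        reader = PdfFileReader(f)
        writer = PdfFileWriter()
        for i in page_indices(first_page, last_page):
            writer.addPage(reader.getPage(i))
        with open(dst, "wb") as out:
            writer.write(out)
```

For the example above this would be extract_range('arq.pdf', 'PDF (14 - 20).pdf', 14, 20).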

Hugs!

score 0 · Answer 3 · answered Apr 11 '22 at 08:33

I ran into this problem today. I found plenty of code just like mine that runs without errors, so I suspected a version issue. I was using PyPDF2 1.27.3; downgrading to 1.25.0 fixed the error.

pip install pypdf2==1.25.0

s gong


score 0 · Answer 4 · answered Oct 25 '22 at 10:11

This bug was fixed in version 1.27.8
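
To check whether an installed copy is at least that release, something like the following could be used. It is only a sketch: parse_version, includes_fix, and installed_pypdf2_version are names invented here, and the comparison assumes plain three-part version strings such as 1.27.8. Note that a False result only means the version predates 1.27.8, not that it is affected (1.25.0 is reported as working in another answer).

```python
import re

FIXED_IN = (1, 27, 8)  # release named in this answer


def parse_version(text):
    """Turn a release string such as '1.27.8' into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % text)
    return tuple(int(part) for part in match.groups())


def includes_fix(version_text):
    """True if the given version is 1.27.8 or newer."""
    # Tuples compare numerically, so 1.27.12 correctly sorts after 1.27.8.
    return parse_version(version_text) >= FIXED_IN


def installed_pypdf2_version():
    """Return the installed PyPDF2 version string, or None if not installed."""
    from importlib import metadata  # Python 3.8+

    try:
        return metadata.version("PyPDF2")
    except metadata.PackageNotFoundError:
        return None
```

If includes_fix(installed_pypdf2_version()) is False, upgrading with pip install --upgrade PyPDF2 is the obvious next step.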

Дмитрий Мельников

