
I tried to print the text of the pages of a PDF document:

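import PyPDF2

FILE_PATH = 'my.pdf'
with open(FILE_PATH, mode='rb') as f:
    # PyPDF2 1.x API; later versions renamed these to PdfReader, reader.pages[i] and extract_text()
    reader = PyPDF2.PdfFileReader(f)
    page = reader.getPage(0)  # I also tried other pages, e.g. 1, 2, ...
    print(page.extractText())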

But I only get a lot of blank space and no error message. Could it be that the PDF version of my.pdf is not supported by PyPDF2?
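To make "a lot of blank space" precise across a whole document, you can check whether each extracted string is empty once whitespace is stripped. A minimal sketch; `blank_pages` is a hypothetical helper that takes already-extracted page strings (for example, the result of `extractText()` for each page), so it does not depend on which library produced them:

```python
def blank_pages(page_texts):
    """Return the 0-based indices of pages whose extracted text is empty or only whitespace."""
    return [index for index, text in enumerate(page_texts) if not text.strip()]


# Example: the first and third "pages" contain no visible text
print(blank_pages(["  \n\n  ", "Hello", ""]))  # → [0, 2]
```

If every index comes back, the extractor found no usable text anywhere in the file, which points at the document rather than at one particular page.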

This solved it (it prints all pages of the document). Thanks!

from pdfreader import SimplePDFViewer

with open("my.pdf", "rb") as fd:
    viewer = SimplePDFViewer(fd)
    # Pages are numbered from 1, so the range must run to (number of pages + 1);
    # my.pdf has 15 pages
    for page_number in range(1, 16):
        viewer.navigate(page_number)
        viewer.render()
        # canvas.strings holds the decoded text fragments of the rendered page
        page_text = "".join(viewer.canvas.strings)
        print(page_text)
Maksym Polshcha
rob
2 Answers

If the output is blank, the PDF is probably being read, but its format can't be handled by PyPDF2, so it just outputs blank space. Maybe use the absolute file path instead of a relative one. If all else fails, try different PDFs, and if there is a version that works and yours doesn't, you might need to convert yours to that working type.
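One way to act on the absolute-path suggestion is with `pathlib`. Note that a wrong relative path would normally make `open()` raise `FileNotFoundError` rather than produce blank text, so this mainly confirms which file is actually being read (a relative path is resolved against the current working directory). A minimal sketch; `resolve_pdf_path` is a hypothetical helper name:

```python
from pathlib import Path


def resolve_pdf_path(file_path):
    """Return the absolute path of file_path, raising if it is not an existing file."""
    # expanduser() expands "~"; resolve() makes the path absolute relative to the cwd
    path = Path(file_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    return path
```

Calling `print(resolve_pdf_path(FILE_PATH))` before opening the file shows exactly which PDF the script will read.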

Kirby Forsberg
  • Did you find a difference between yours and the working one? Find patterns – Kirby Forsberg Apr 21 '20 at 20:09
  • The only thing I found is that PDF files with version PDF-1.5 work, while 1.3 and 1.6 do not seem to. That seems strange to me, since I would assume that if 1.5 works, 1.3 or 1.6 should work too – rob Apr 21 '20 at 20:15
  • Read the PyPDF2 docs for more details, and dig deeper into the PDF you are trying to read. – Kirby Forsberg Apr 21 '20 at 20:17
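To check the version pattern described in the comments across several files, you can read the version from each file's header, which starts with `%PDF-` followed by the version number. A minimal sketch using only the standard library; the helper names are hypothetical. Keep in mind that from PDF 1.4 on, a `/Version` entry in the document catalog can override the header version, and that text extraction often depends more on how the page's fonts map to characters (or on whether the page is a scanned image) than on the version number itself.

```python
import re
from typing import Optional

# The header looks like b"%PDF-1.5". Some files have a few junk bytes before it,
# so search the first 1024 bytes instead of requiring it at offset 0.
_HEADER_RE = re.compile(rb"%PDF-(\d+\.\d+)")


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version from a PDF header such as b"%PDF-1.5", or None if absent."""
    match = _HEADER_RE.search(data[:1024])
    return match.group(1).decode("ascii") if match else None


def read_pdf_version(file_path) -> Optional[str]:
    """Read the start of a file and return its header version."""
    with open(file_path, "rb") as f:
        return pdf_header_version(f.read(1024))
```

For example, `read_pdf_version("my.pdf")` returns a string such as `"1.5"`, which makes it easy to tabulate working and failing files side by side.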

Try pdfreader

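from pdfreader import SimplePDFViewer

with open("my.pdf", "rb") as fd:
    viewer = SimplePDFViewer(fd)
    viewer.render()  # renders the current page, i.e. the first page right after opening

    # Page content as it appears in the PDF: decoded text mixed with PDF operators
    page_0_content = viewer.canvas.text_content
    # Just the text: the decoded text fragments joined together
    page_0_text = "".join(viewer.canvas.strings)
    print(page_0_text)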
Maksym Polshcha