
I'm trying to read this PDF file (https://www.accessdata.fda.gov/cdrh_docs/pdf14/K141693.pdf) and am following the suggestions in this SO question:

Opening pdf urls with pyPdf

I have actually downloaded the file locally and am running the following code

import PyPDF2
pdf_file = open("K141693.pdf")
pdf_read = PyPDF2.PdfFileReader(pdf_file)

but it hangs indefinitely. I'm running Python 2.7, and here is the stack trace from when I interrupted it:

Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('C:/PoC/pdf_reader.py', wdir='C:/PoC')

  File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)

  File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "C:/PoC/pdf_reader.py", line 13, in <module>
    pdf_read = PyPDF2.PdfFileReader(pdf_file)

  File "C:\ProgramData\Anaconda2\lib\site-packages\PyPDF2\pdf.py", line 1084, in __init__
    self.read(stream)

  File "C:\ProgramData\Anaconda2\lib\site-packages\PyPDF2\pdf.py", line 1697, in read
    line = self.readNextEndLine(stream)

  File "C:\ProgramData\Anaconda2\lib\site-packages\PyPDF2\pdf.py", line 1938, in readNextEndLine
    x = stream.read(1)

KeyboardInterrupt

I came across another post here, "PyPDF2 hangs on processing", but it doesn't have an answer either.

Krishna

1 Answer


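You need to open the file in binary ('rb') mode. A PDF is a binary format, and PyPDF2 locates its structures by byte offset. In text mode on Windows, Python 2 translates line endings as it reads, so those offsets no longer match the file, which is most likely why the read loop in your traceback never finishes. The following works in Python 3: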

import PyPDF2
pdf_file = open("K141693.pdf", "rb")
read_pdf = PyPDF2.PdfFileReader(pdf_file)
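
To see why the mode matters, here is a minimal sketch using only the standard library, assuming Python 3. The helper names `make_sample` and `read_sizes` and the tiny fake PDF are made up for illustration; they are not part of PyPDF2. It writes a few PDF-like lines ending in "\r\n" and compares how many characters come back in binary mode versus text mode:

```python
import os
import tempfile


def make_sample(path):
    """Write a tiny PDF-like file (hypothetical stand-in for a real PDF)."""
    with open(path, "wb") as f:
        f.write(b"%PDF-1.4\r\n1 0 obj\r\n<<>>\r\nendobj\r\n%%EOF\r\n")


def read_sizes(path):
    """Return (binary_length, text_length) for the same file."""
    # Binary mode returns the bytes exactly as stored on disk.
    with open(path, "rb") as f:
        binary_length = len(f.read())
    # latin-1 maps every byte to a character, so decoding never fails,
    # but default newline handling still rewrites "\r\n" as "\n".
    with open(path, "r", encoding="latin-1") as f:
        text_length = len(f.read())
    return binary_length, text_length


if __name__ == "__main__":
    sample = os.path.join(tempfile.mkdtemp(), "sample.pdf")
    make_sample(sample)
    binary_length, text_length = read_sizes(sample)
    print("binary mode:", binary_length, "text mode:", text_length)
```

The text-mode read comes back shorter than the file on disk because each "\r\n" collapses to a single "\n". Any offset computed from the real file then points at the wrong place, so a parser that seeks by byte position needs the stream opened with "rb".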
PythonSherpa