
I'm trying to parse text from a PDF file. While following a PyPDF2 tutorial, I got the following error. I searched for an answer but couldn't find one. Any help would be greatly appreciated.

```
Traceback (most recent call last):
  File "D:/text_recognizer/main.py", line 4, in <module>
    inputStream = PyPDF2.PdfFileReader(input)
  File "D:\KimKanna's Class\python27\lib\site-packages\PyPDF2\pdf.py", line 1084, in __init__
    self.read(stream)
  File "D:\KimKanna's Class\python27\lib\site-packages\PyPDF2\pdf.py", line 1689, in read
    stream.seek(-1, 2)
IOError: [Errno 22] Invalid argument
```

Here is the full code:

```python
import PyPDF2

with open(".\\pdf\\test_sample.pdf","rb") as input:
    inputStream = PyPDF2.PdfFileReader(input)
```
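The last frame of the traceback is `stream.seek(-1, 2)`: PyPDF2 seeks to one byte before the end of the file (whence `2` means "relative to the end of the file"). If the file is empty, the target position would be -1, and the operating system rejects a negative offset with `EINVAL`, which is Errno 22. The sketch below reproduces that without PyPDF2. The helper name `seek_errno` is hypothetical, and a temporary zero-byte file stands in for the problematic PDF.

```python
import errno
import os
import tempfile


def seek_errno(path):
    """Hypothetical helper: return None if seek(-1, 2) succeeds, else the errno raised."""
    with open(path, "rb") as stream:
        try:
            stream.seek(-1, 2)  # offset -1, whence 2 = relative to end of file
        except (IOError, OSError) as exc:  # IOError on Python 2, OSError on Python 3
            return exc.errno
    return None


# Create a zero-byte file, like the empty .pdf described in the answer.
fd, empty_path = tempfile.mkstemp(suffix=".pdf")
os.close(fd)

# True if the failure is the same "Invalid argument" (Errno 22) as in the traceback.
print(seek_errno(empty_path) == errno.EINVAL)
os.remove(empty_path)
```

On a non-empty file the same seek succeeds, so the error points at the file's contents (here: none) rather than at the code.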
Ganesh Tata
Kanna Kim
  • I think I found why it didn't work: the PDF file I had has a different PDF structure. Take a look at this [link](https://stackoverflow.com/questions/11384591/parsing-a-pdf-with-no-root-object-using-pdfminer). I think that's why this code works on some PDF files but not on others. – Kanna Kim Sep 12 '17 at 23:28
  • It would be great if you could share the problematic pdf. – Ganesh Tata Nov 14 '17 at 12:37

1 Answer


In my case, the .pdf I wanted to open was empty and had not been closed by a previous Python script run from PowerShell (command prompt). When I tried to delete those files, Windows said "Close the file and try again". (That was my "aha" moment.)

So I stopped py.exe from Windows Task Manager and deleted those empty, still-open files. Then I ran the same code with other files, and it worked fine. :)
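Following this answer, here is a minimal sketch of a pre-check that skips empty files before they reach `PyPDF2.PdfFileReader`. The function name `looks_like_pdf` is hypothetical, and the `%PDF-` header test is an extra assumption (well-formed PDF files normally begin with those bytes) that goes beyond what the answer describes.

```python
import os


def looks_like_pdf(path):
    """Hypothetical pre-check: reject zero-byte files and files without a %PDF- header."""
    # A zero-byte file is what makes PyPDF2's seek(-1, 2) fail with Errno 22.
    if os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as f:
        # Assumption: a well-formed PDF starts with the bytes "%PDF-".
        return f.read(5) == b"%PDF-"
```

Call it on the path from the question and only open the file with PyPDF2 when it returns `True`. It will not detect every malformed file (for example the structural problem mentioned in the comments on the question), but it does catch the zero-byte case that triggers Errno 22.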

darla_sud