I need to parse a remote PDF file. With PyPDF2, I would expect this to work as PdfReader(f), where f = urllib.request.urlopen("some-url").read(). However, PdfReader cannot use f directly, and it seems that f has to be decoded first. What argument should I pass to decode(), or is a different method needed?

Martin Thoma

Tom Liu
1 Answer
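Start by reading the response body:
f = urllib.request.urlopen("some-url").read()
In Python 3, read() returns bytes, and PdfReader expects a file path or a file-like object. The bytes should not be decoded; wrap them in an in-memory binary stream instead. Add these lines after the line above:
from io import BytesIO
f = BytesIO(f)
(On Python 2, the equivalent was from StringIO import StringIO followed by f = StringIO(f).)
Then pass the stream to PdfReader:
reader = PdfReader(f)
See also: Opening pdf urls with pyPdf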
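
Putting the steps together, here is a minimal sketch for Python 3. The helper names fetch_pdf_stream and read_remote_pdf and the timeout value are my own choices for illustration, not part of PyPDF2, and the sketch assumes a PyPDF2 version that provides PdfReader.

```python
from io import BytesIO
from urllib.request import urlopen


def fetch_pdf_stream(url, timeout=30):
    """Download the resource at `url` and return it as a seekable binary stream.

    Hypothetical helper; `timeout` (in seconds) is an arbitrary default.
    """
    with urlopen(url, timeout=timeout) as response:
        data = response.read()  # bytes in Python 3; do not call decode() on it
    # PdfReader needs a seekable file-like object, which BytesIO provides.
    return BytesIO(data)


def read_remote_pdf(url):
    """Return a PyPDF2 PdfReader for the PDF at `url` (hypothetical helper)."""
    # Imported here so fetch_pdf_stream stays usable without PyPDF2 installed.
    from PyPDF2 import PdfReader

    return PdfReader(fetch_pdf_stream(url))
```

With this in place, reader = read_remote_pdf("some-url") should give a reader object, and len(reader.pages) should give the page count. The whole file is held in memory, so for very large PDFs it may be better to download to a temporary file and pass its path to PdfReader.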


Nitin Bhojwani