
I need to parse a remote PDF file. With PyPDF2, I expected this to work as PdfReader(f), where f = urllib.request.urlopen("some-url").read(). However, PdfReader cannot use this f directly, and it seems that f has to be decoded first. What argument should be passed to decode(), or should some other method be used?

Martin Thoma
Tom Liu

1 Answer


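urlopen(...).read() returns the raw bytes of the PDF. They should not be decoded: a PDF is binary data, and PdfReader expects a file path or a file-like object rather than a bytes value. Start with:

f = urllib.request.urlopen("some-url").read()

Then wrap the bytes in an in-memory binary stream. On Python 3, which urllib.request implies, use io.BytesIO; the StringIO module exists only on Python 2:

from io import BytesIO

f = BytesIO(f)

and then read it with PdfReader:

reader = PdfReader(f)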

See also: Opening pdf urls with pyPdf
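
Putting the steps together, a minimal sketch might look like the following. It assumes Python 3 and a PyPDF2 version that provides PdfReader; the helper names fetch_as_stream and read_remote_pdf and the timeout value are made up for illustration and are not part of PyPDF2 or urllib.

```python
import io
import urllib.request


def fetch_as_stream(url, timeout=30):
    """Download url and return its body as a seekable in-memory binary stream.

    Hypothetical helper: the name and the timeout default are illustrative.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()  # bytes, not str; no decode() is needed
    # BytesIO gives PdfReader the seekable file-like object it expects.
    return io.BytesIO(data)


def read_remote_pdf(url):
    """Return a PyPDF2 PdfReader for the PDF at url (hypothetical helper)."""
    # Imported here so fetch_as_stream stays usable without PyPDF2 installed.
    from PyPDF2 import PdfReader

    return PdfReader(fetch_as_stream(url))
```

With a real URL in place of "some-url", reader = read_remote_pdf("some-url") followed by len(reader.pages) should give the page count. The whole file is held in memory, so for very large PDFs downloading to a temporary file first may be the better choice.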

Community
Nitin Bhojwani