Python: How to parse an online PDF file when the URL doesn't have a .pdf extension

Asked Sep 10 '18 at 08:24

Active Sep 10 '18 at 11:24

Viewed 82 times

I'm trying to extract data from an online PDF file. I tried to apply the code below to this URL, but I got a urlopen error. I noticed that the URL doesn't have a .pdf extension. Any suggestions?

Error

Traceback (most recent call last):
  File "C:/Users/Danial/Desktop/pdf.py", line 7, in <module>
    op = urllib2.urlopen(Request(url)).read()
  File "C:\Python27\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 431, in open
    response = self._open(req, data)
  File "C:\Python27\lib\urllib2.py", line 449, in _open
    '_open', req)
  File "C:\Python27\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 1240, in https_open
    context=self._context)
  File "C:\Python27\lib\urllib2.py", line 1197, in do_open
    raise URLError(err)
URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)>

Code

import urllib2
from urllib2 import Request
from StringIO import StringIO

url = 'https://nycprop.nyc.gov/nycproperty/StatementSearch?bbl=3068690056&stmtDate=20180824&stmtType=SOA'

op = urllib2.urlopen(Request(url)).read()
memoryFile = StringIO(op)

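# Note: PDFParser is not imported in this snippet. If pdfminer is the intended
# library, the import would be: from pdfminer.pdfparser import PDFParser
parser = PDFParser(memoryFile)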
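
The traceback ends in CERTIFICATE_VERIFY_FAILED, so the request fails during TLS certificate verification, before any PDF parsing starts. The following is a minimal sketch of one way to approach that, assuming Python 3 (`urllib.request` and `io.BytesIO` instead of `urllib2` and `StringIO`). The helper names `make_context` and `fetch_to_memory` are hypothetical, and whether an extra CA bundle is actually needed for this server is an assumption; the post does not say why verification failed.

```python
import ssl
import urllib.request
from io import BytesIO


def make_context(cafile=None):
    """Return a verifying SSL context, optionally trusting an extra CA bundle.

    cafile is a hypothetical path to a PEM file holding the CA chain for the
    server; with cafile=None the system's default trust store is used.
    """
    # create_default_context() keeps certificate and hostname checks enabled.
    return ssl.create_default_context(cafile=cafile)


def fetch_to_memory(url, cafile=None, timeout=30):
    """Download url and return the response body as an in-memory binary file."""
    request = urllib.request.Request(url)
    context = make_context(cafile)
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        # PDF data is binary, so BytesIO (not a text buffer) is used here.
        return BytesIO(response.read())
```

A call such as `fetch_to_memory(url, cafile="/path/to/ca-bundle.pem")` (the path is a placeholder) would return a file-like object that can be handed to a PDF parser without saving anything to disk. Keeping verification enabled and supplying the right CA chain is generally preferable to switching verification off.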

edited Sep 10 '18 at 11:24

Muhammad Danial

Post your error too! – Sushant Sep 10 '18 at 08:25

Download the file and extract the data locally :) – Jones1220 Sep 10 '18 at 08:25
@Jones1220 any other way except downloading? – Muhammad Danial Sep 10 '18 at 08:45
@ThatBird the error and code have been added – Muhammad Danial Sep 10 '18 at 11:24
Your problem has absolutely nothing to do with the "pdf extension" (which FWIW is just a bunch of letters and has nothing to do with the actual file type). Reading the error message and googling for it would have saved everyone's time. – bruno desthuilliers Sep 10 '18 at 12:10
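
To illustrate the point in the last comment that the URL's extension says nothing about the file type, here is a small Python 3 sketch that decides whether a response is a PDF from its content instead. The function names are hypothetical. The 1024-byte search window is an assumption made for leniency, since a well-formed PDF starts with the `%PDF-` marker but some files carry a few stray leading bytes.

```python
def looks_like_pdf(data, search_window=1024):
    """Return True if the raw bytes carry a PDF header near the start."""
    # A PDF file begins with the marker b"%PDF-" followed by a version number.
    return b"%PDF-" in data[:search_window]


def is_pdf_content_type(content_type):
    """Return True if a Content-Type header value names the PDF media type."""
    # Drop parameters such as "; charset=..." before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    return media_type == "application/pdf"
```

Either check can be run on the downloaded bytes or on the response's Content-Type header before handing the data to a parser, which also catches the case where the server returns an HTML error page instead of the document.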
