First I will say that I am not very knowledgeable in any language. I took a few classes in college and never really followed up. So bear with me.
I am using Python 2.7 on Windows 7.
So I'm trying to write something basic to scrape, parse, and analyze data from a particular website. I got as far as realizing I should be using requests, BeautifulSoup, and lxml.
It is a secured website. The website uses TLS 1.0. The connection is encrypted using AES_256_CBC with SHA-1 for message authentication, and RSA as the key exchange mechanism. None of this really means anything to me. Is any of this prohibitive?
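From what I've read, handshake failures like the one below often come down to the OpenSSL build Python itself is linked against rather than the cipher details, and the standard library can report that. This is purely a diagnostic, not a fix, and nothing in it is specific to my site:

```python
import ssl

# Report the OpenSSL build that Python's ssl module was compiled against.
# Very old builds can fail to complete handshakes with some servers.
print(ssl.OPENSSL_VERSION)

# Protocol constants available in this build (the names vary by version).
print([name for name in dir(ssl) if name.startswith('PROTOCOL_')])
```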
The code that's giving me problems:
from BeautifulSoup import BeautifulSoup
import requests
results = requests.get(url)
The traceback this gives is as follows:
Traceback (most recent call last):
File "C:/Users/User/PycharmProjects/untitled/jkh.py", line 6, in <module>
results = requests.get(url)
File "C:\Python27\lib\site-packages\requests\api.py", line 69, in get
return request('get', url, params=params, **kwargs)
File "C:\Python27\lib\site-packages\requests\api.py", line 50, in request
response = session.request(method=method, url=url, **kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 465, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 573, in send
r = adapter.send(request, **kwargs)
File "C:\Python27\lib\site-packages\requests\adapters.py", line 431, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: EOF occurred in violation of protocol (_ssl.c:590)
I tried adding a verify=False parameter, which didn't work. That resulted in the same traceback.
When I try to connect to the site using OpenSSL and forcing TLS 1.0 as described here, I get:
Verify return code: 20 (unable to get local issuer certificate)
This hangs up for a few minutes before giving me:
read:errno 10054
error in s_client
This code works, but I'm not totally clear on what a Session is or how I'd move forward from here. Credit to Stack Overflow user jasonamyers:
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.poolmanager import PoolManager
import ssl

class MyAdapter(HTTPAdapter):
    def init_poolmanager(self, connections, maxsize, block=False):
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       ssl_version=ssl.PROTOCOL_TLSv1)

import requests
s = requests.Session()
s.mount('https://', MyAdapter())
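If I'm reading the requests docs right, a Session reuses state (cookies, pooled connections) across requests, and mount() just registers a transport adapter for every URL that starts with the given prefix. Here's my attempt at a self-contained sketch of that idea; Tlsv1Adapter is just my own name for the same adapter, example.com is a placeholder, and nothing here touches the network:

```python
import ssl
import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.poolmanager import PoolManager

class Tlsv1Adapter(HTTPAdapter):
    # Same idea as MyAdapter above: force TLSv1 for every connection
    # made through this adapter by configuring the urllib3 pool.
    def init_poolmanager(self, connections, maxsize, block=False):
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       ssl_version=ssl.PROTOCOL_TLSv1)

s = requests.Session()
s.mount('https://', Tlsv1Adapter())

# The session picks whichever mounted adapter's prefix matches the
# request URL; https URLs get our adapter, http URLs keep the default.
print(type(s.get_adapter('https://example.com/')).__name__)
print(type(s.get_adapter('http://example.com/')).__name__)
```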
Help?
EDIT: Looks like I had everything I needed. This code works:
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.poolmanager import PoolManager
import ssl

class MyAdapter(HTTPAdapter):
    def init_poolmanager(self, connections, maxsize, block=False):
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       ssl_version=ssl.PROTOCOL_TLSv1)

import requests
s = requests.Session()
s.mount('https://', MyAdapter())
results = s.get(url)
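And for moving forward from here, my plan is to hand the response body to BeautifulSoup for the actual parsing. A minimal sketch on a canned HTML string standing in for results.text; note that on Python 3 the package is bs4 (on my Python 2 setup the import is from BeautifulSoup import BeautifulSoup), and the tag names below are made up for illustration:

```python
from bs4 import BeautifulSoup

# Canned HTML standing in for results.text from the real site.
html = ("<html><head><title>Quotes</title></head>"
        "<body><p class='row'>42</p></body></html>")

# "html.parser" is the stdlib parser; "lxml" also works if installed.
soup = BeautifulSoup(html, "html.parser")

print(soup.title.string)                        # text of the <title> tag
print(soup.find("p", {"class": "row"}).string)  # text of the matching <p>
```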