import requests
from fake_useragent import UserAgent

ua = UserAgent()
headers = {
    'User-Agent': ua.random,  # ua.random already returns a string
    'Connection': 'close',
}
r = requests.get(url, headers=headers, timeout=5)
I want to scrape some information from a website, but requests.get() raises an exception occasionally (sometimes the request succeeds, sometimes it doesn't). I've tried many things: a random User-Agent, a timeout, time.sleep, and a maximum number of retries, but none of them helped.
Is there something wrong with my code, or is this a fault on the website's side, maybe some anti-scraping system?
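For reference, this is roughly how I wired up the retries (a minimal sketch; MAX_TRIES and RETRY_DELAY are placeholder values, not my exact settings):

import time
import requests
from fake_useragent import UserAgent

url = 'https://www.dy2018.com/i/103887.html'  # the page from the traceback below
MAX_TRIES = 5      # placeholder retry count
RETRY_DELAY = 2    # placeholder delay in seconds between attempts

ua = UserAgent()
r = None
for attempt in range(MAX_TRIES):
    try:
        headers = {'User-Agent': ua.random, 'Connection': 'close'}
        r = requests.get(url, headers=headers, timeout=5)
        break  # the request went through, stop retrying
    except requests.exceptions.RequestException as exc:
        print(f'attempt {attempt + 1} failed: {exc}')
        time.sleep(RETRY_DELAY)  # back off before the next attempt

Even with this loop, every attempt sometimes times out in a row.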
Here is the full exception:
Traceback (most recent call last):
File "d:\AAA临时文档\抢课app\爬虫\run2.py", line 7, in <module>
r=requests.get(url=url,headers=headers,timeout=20)
File "C:\Users\86153\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\86153\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\86153\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\86153\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "C:\Users\86153\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\adapters.py", line 504, in send
raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='www.dy2018.com', port=443): Max retries exceeded with url: /i/103887.html (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x04046A18>, 'Connection to www.dy2018.com timed out. (connect timeout=20)'))