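```python
import requests
from fake_useragent import UserAgent  # assumed: UserAgent comes from the fake_useragent package

url = 'https://www.dy2018.com/i/103887.html'  # the URL from the traceback below
ua = UserAgent()
headers = {
    'user-agent': str(ua.random),
    'Connection': 'close',
}

r = requests.get(url, headers=headers, timeout=5)
```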

I want to scrape some information from a website, but `requests.get()` occasionally raises an exception (sometimes it succeeds, sometimes it doesn't). I've tried several things: a random User-Agent, a timeout, `time.sleep`, and a maximum number of tries, but none of them helped.
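The "max tries" and `time.sleep` ideas can be combined into one retry loop. This is a minimal sketch, assuming the `requests` library: it retries only connection-level errors and waits exponentially longer between attempts (plus a little random jitter) instead of sleeping a fixed time. The name `get_with_retries` and its parameters are hypothetical; the `get` and `sleep` arguments exist only so the loop can be exercised without touching the network.

```python
import random
import time

import requests


def get_with_retries(url, headers=None, max_tries=4, base_delay=1.0,
                     timeout=(5, 20), get=requests.get, sleep=time.sleep):
    """Hypothetical helper: retry a GET on connection errors with exponential backoff."""
    for attempt in range(1, max_tries + 1):
        try:
            return get(url, headers=headers, timeout=timeout)
        except requests.exceptions.ConnectionError:
            # ConnectTimeout is a subclass of ConnectionError, so it is caught here.
            if attempt == max_tries:
                raise  # give up and surface the last error to the caller
            # Wait base_delay * 2**(attempt - 1) seconds, plus up to 1 s of jitter.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 1))
```

Usage would be `r = get_with_retries(url, headers=headers)`. Retries only help with transient failures; if the connection keeps failing, as described here, they just delay the same exception.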

Is there something wrong with my code, or is this a fault on the website's side, or some anti-scraping system?
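One way to separate a network problem from something on the website's side is to time plain TCP connections to the host, with no TLS, no HTTP and no headers involved, and compare the results across the networks you have. This is a minimal sketch assuming port 443 (HTTPS, as in the traceback below); `measure_connect` is a hypothetical helper name. Fast connects on one network and timeouts on another would point at the network path, or at something tied to that network's IP address, rather than at the Python code.

```python
import socket
import time


def measure_connect(host, port=443, attempts=5, timeout=5.0):
    """Hypothetical helper: time raw TCP connects; None marks a failed attempt."""
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            # Only the TCP handshake happens here; no TLS, no HTTP, no User-Agent.
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.perf_counter() - start)
        except OSError:  # covers timeouts, refused connections and DNS errors
            results.append(None)
    return results
```

For example, `print(measure_connect("www.dy2018.com"))` run once on each network gives a list of connect times in seconds, with `None` for each attempt that failed.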

Here is the full exception:

```
Traceback (most recent call last):
  File "d:\AAA临时文档\抢课app\爬虫\run2.py", line 7, in <module>
    r=requests.get(url=url,headers=headers,timeout=20)
  File "C:\Users\86153\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\86153\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\86153\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\86153\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\86153\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\adapters.py", line 504, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='www.dy2018.com', port=443): Max retries exceeded with url: /i/103887.html (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x04046A18>, 'Connection to www.dy2018.com timed out. (connect timeout=20)'))
```
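The last line of the traceback is the informative part: `ConnectTimeout` with `connect timeout=20` is raised while the TCP connection is being opened, before any HTTP request (and therefore before the `user-agent` header) is sent. A random User-Agent therefore cannot influence this particular error, whereas a slow or lossy network path, or filtering by IP address, could. The sketch below, assuming `requests`, passes a `(connect, read)` timeout tuple so the two phases get separate limits, and reports which phase failed; `describe_failure` and `fetch` are hypothetical names and the 5 s / 20 s limits are illustrative.

```python
import requests


def describe_failure(exc):
    """Hypothetical helper: name the phase in which a requests call failed."""
    # Order matters: ConnectTimeout subclasses both ConnectionError and Timeout.
    if isinstance(exc, requests.exceptions.ConnectTimeout):
        return "connect timeout: no TCP connection within the connect limit"
    if isinstance(exc, requests.exceptions.ReadTimeout):
        return "read timeout: connected, but the server was too slow to respond"
    if isinstance(exc, requests.exceptions.ConnectionError):
        return "connection error: e.g. refused, reset, DNS or TLS failure"
    return "other error: " + type(exc).__name__


def fetch(url, headers=None):
    """Hypothetical wrapper: separate 5 s connect and 20 s read limits."""
    try:
        return requests.get(url, headers=headers, timeout=(5, 20))
    except requests.exceptions.RequestException as exc:
        print(describe_failure(exc))
        raise
```

If the failures are always reported as connect timeouts, changing headers is unlikely to help, and the network side is the place to look.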
Edited by Daniel Walker; asked by zhf999.
  • Timeouts are natural; just wrap the call in try/except. From some keywords I guess you want to automatically submit a form to register for courses, and the website may be very busy right now (university websites are often not designed to scale well). – Lei Yang Jul 08 '21 at 02:44
  • @LeiYang I don't think `www.dy2018.com` is the kind of website you describe, though... – Zebartin Jul 08 '21 at 02:57
  • Yeah, it looks like a pirated-video site. It's the Python script's file name that made me think of a student. – Lei Yang Jul 08 '21 at 03:07
  • I cannot reproduce `ConnectTimeout` with the same URL, but I think a `session` and an `HTTPAdapter` with `retries` may help you: https://stackoverflow.com/a/15431343/16354567 – Zebartin Jul 08 '21 at 09:45
  • A few days ago I solved this problem: when I run the script on a wireless network, it runs faster and raises no exceptions. But I don't know why this happens. – zhf999 Jul 17 '21 at 11:10
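Zebartin's suggestion of a session plus an `HTTPAdapter` with retries can be sketched as follows, assuming `requests` and its `urllib3` dependency: a `Session` mounts an `HTTPAdapter` whose `max_retries` is a `urllib3` `Retry` object, so failed connections are retried with backoff inside the adapter. The retry counts, backoff factor and status list are illustrative assumptions, `make_session` is a hypothetical name, and the plain User-Agent string is a placeholder. The `Connection: close` header from the question is deliberately left out, so the session can reuse an open TCP connection across requests instead of opening (and possibly timing out on) a new one each time.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(user_agent="Mozilla/5.0"):
    """Hypothetical helper: a Session that retries failed connections with backoff."""
    retry = Retry(
        total=3,            # at most 3 retries overall
        connect=3,          # retries allowed for connection errors and timeouts
        backoff_factor=1,   # sleep between retries grows exponentially
        status_forcelist=[429, 500, 502, 503, 504],  # also retry these statuses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    session.headers["User-Agent"] = user_agent  # placeholder UA string
    return session
```

Usage would be `r = make_session().get(url, timeout=(5, 20))`; reusing the same session object for every page is what makes connection reuse possible.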

0 Answers