
I built a scraper using Selenium, Python 3.6, and Scrapinghub Crawlera on Ubuntu 18.04 that had been running well until now. I am scraping cars.com and started a few months back; the scraper downloads images for about 60 to 100 cars per hour and stays on each page for a few minutes before moving to the next request. Recently, however, I noticed that this has slowed down, and the cause is the Selenium web driver timing out because the page load time exceeds 600 seconds. I do have a timeout exception handler that catches the timeout and retries the URL, but it is taking longer than 10 minutes to load the images each time:

raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: timeout
(Session info: chrome=79.0.3945.130)
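For reference, the retry logic around driver.get is roughly like this (a simplified sketch; get_with_retry and MAX_RETRIES are hypothetical names, and 600 matches the timeout mentioned above):

    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException

    MAX_RETRIES = 3  # hypothetical retry budget

    driver = webdriver.Chrome()
    driver.set_page_load_timeout(600)  # raises TimeoutException past 600 s

    def get_with_retry(driver, url):
        # Retry the page load a few times before giving up on the URL.
        for attempt in range(1, MAX_RETRIES + 1):
            try:
                driver.get(url)
                return True
            except TimeoutException:
                print("Timeout on attempt %d for %s" % (attempt, url))
        return False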

After some debugging I noticed that the connection shows the HTTPS request as not secure, which is what's causing the slow connection. However, the site is secure, and this warning was not showing before, so I am not sure what has changed. I did upgrade Chrome to version 79 and am under the impression that this is the cause of the issue.

Any help would be greatly appreciated.



1 Answer


Some more details about your use case would have helped us debug your issue better. However, you need to take care of a couple of things:

  • You are using chromedriver=2.41.
  • The Release Notes of chromedriver=2.41 clearly mention the following:

Supports Chrome v67-69

  • You are using chrome=79.0 (as per the session info in your error message).
  • The Release Notes of ChromeDriver v79.0 clearly mention the following:

Supports Chrome version 79

  • Your Selenium client version is unknown to us.

So there is a clear mismatch between ChromeDriver v2.41 and the Chrome browser v79.0.
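You can confirm the browser/driver pairing at runtime from the session capabilities (a sketch; 'browserVersion' is the W3C key, while older Selenium clients report 'version'):

    from selenium import webdriver

    driver = webdriver.Chrome()
    caps = driver.capabilities
    # Print the browser and driver versions reported by the session.
    print("Chrome:      ", caps.get("browserVersion") or caps.get("version"))
    print("ChromeDriver:", caps["chrome"]["chromedriverVersion"])
    driver.quit()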


Solution

Ensure that:

  • ChromeDriver is updated to ChromeDriver v79.0 level, matching your browser.
  • Chrome is kept at Chrome v79 level (as per the ChromeDriver v79.0 release notes).

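As a sketch of the fix: after downloading the ChromeDriver 79 binary that matches your browser, point Selenium at it explicitly (the path below is an example; adjust it to wherever you installed the binary):

    from selenium import webdriver

    # Example path; use wherever you placed the ChromeDriver 79 binary.
    driver = webdriver.Chrome(executable_path="/usr/local/bin/chromedriver")
    driver.get("https://www.cars.com")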

However, there are a couple of other measures which you can incorporate to speed up the execution, and you can find a couple of relevant detailed discussions in:
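One such measure, sketched here as an illustration rather than a quote from those discussions, is the eager pageLoadStrategy, which hands control back at DOMContentLoaded instead of waiting for every image and subresource to finish loading:

    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    caps = DesiredCapabilities.CHROME.copy()
    caps["pageLoadStrategy"] = "eager"  # don't wait for subresources
    driver = webdriver.Chrome(desired_capabilities=caps)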

  • Hello DebanjanB, I have updated ChromeDriver and it is now using the correct version. What I am noticing is that when I call get(url) the website says "your connection is not private", so I have used chrome_options.add_argument('--ignore-certificate-errors') to bypass this, and I believe this is the cause of the issue. – dcarlo56ave Jan 24 '20 at 21:45
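For context, the flag from the comment above would be wired in roughly like this (a sketch; note that the flag only suppresses the browser warning and does not fix whatever is breaking certificate validation):

    from selenium import webdriver

    chrome_options = webdriver.ChromeOptions()
    # Suppresses the "connection is not private" interstitial; it does not
    # repair the underlying certificate problem (e.g. a proxy's certificate
    # not being trusted by the browser).
    chrome_options.add_argument("--ignore-certificate-errors")
    driver = webdriver.Chrome(options=chrome_options)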