I am struggling to figure out how to scrape a website that requires many requests to the same server. I have to scrape 3000 products, which means repeating a sequence of requests (for example, searching for the product, clicking on it, and going back to the home page) 3000 times. I am using Selenium.

If I launch only one instance of my Firefox webdriver I don't get a MaxRetryError, but as the search goes on the webdriver gets slower and slower, and about halfway through the searches it stops responding. From what I read on some forums, this is caused by browser memory issues. So I tried quitting and re-instantiating the webdriver every n seconds (I tried 100, 200 and 300 seconds), but then I get a MaxRetryError, apparently because of too many requests to that URL from the same session. I then tried making the program sleep for a minute when the exception occurs, but that didn't work: I can make only one more search before the exception is thrown again, and so on.

Is there any workaround for this kind of issue? It might be another library, a way of changing the IP or session dynamically, or something like that.

P.S. I would rather keep working with Selenium if possible.

giulio di zio

1 Answer

This error is typically raised when the server detects a high request rate from your client.
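If the request rate really is the trigger, one thing to try before anything else is pacing the requests. The following is only a minimal sketch: the `Throttle` class and its parameters are hypothetical names introduced here, not part of Selenium, and a suitable interval for the target site is an assumption you would have to tune.

```python
import random
import time


class Throttle:
    """Enforce a minimum gap between consecutive calls to wait()."""

    def __init__(self, min_interval, jitter=0.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self.jitter = jitter              # extra random delay, 0..jitter seconds
        self._clock = clock               # injectable so the class can be tested
        self._sleep = sleep
        self._last = None                 # time of the previous request

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        if self._last is not None:
            target = self.min_interval + random.uniform(0, self.jitter)
            remaining = target - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

Calling `throttle.wait()` right before each `driver.get(...)` or search keeps the gap between requests at or above `min_interval`, no matter how long the scraping work in between takes.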

If the server is blocking your IP or session from making further requests, you can work around that with existing tools. Look into Zalenium, and also see here for some other possible approaches.

Another possible (but tedious) way is to use a separate browser instance for each request, as this snippet adapted from another answer illustrates.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

urlArr = ['https://link1', 'https://link2', '...']
driver_path = 'C:/Users/andre/Downloads/chromedriver_win32/chromedriver.exe'

for url in urlArr:
    chrome_options = Options()
    # Selenium 4 style; on Selenium 3 pass executable_path=driver_path instead
    service = Service(driver_path)
    # A fresh browser per URL; leaving the with block calls browser.quit()
    with webdriver.Chrome(service=service, options=chrome_options) as browser:
        browser.get(url)
        # your task
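A middle ground between one long-lived browser and one browser per URL is to recycle the driver after a fixed number of products instead of after a fixed number of seconds, so a restart never lands in the middle of a search. This is only a sketch under assumptions: `scrape_in_batches`, `make_firefox`, `handle_item` and `batch_size` are hypothetical names, and the right batch size depends on how quickly the browser degrades.

```python
import time


def make_firefox():
    # Imported lazily so the helper below can be exercised without a browser.
    from selenium import webdriver
    return webdriver.Firefox()


def scrape_in_batches(items, handle_item, make_driver=make_firefox,
                      batch_size=100, pause=0.0):
    """Process items with a fresh driver for every batch_size items.

    handle_item(driver, item) does the actual scraping and returns a result.
    """
    results = []
    for start in range(0, len(items), batch_size):
        driver = make_driver()
        try:
            for item in items[start:start + batch_size]:
                results.append(handle_item(driver, item))
        finally:
            # Always release the browser, even if a search raised.
            driver.quit()
        time.sleep(pause)  # optional cool-down between batches
    return results
```

Because the old driver is quit only after its batch is finished and a new one is created before the next item is touched, no code path keeps using a driver object that has already been closed.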
AzyCrw4282