
I built a scraper with Selenium that only works when it starts a new Chrome window and sends the request with a Referer header. It does not work in headless mode: every time, I have to actually see a new Chrome window open, navigate to the site, and close. It works fine but is a bit slow. Is there a way to run several instances of the scraper in parallel? Maybe by using multiple remote machines, each opening Chrome? Is there software that helps with that?

```python
import random
import time

from selenium.webdriver import ChromeOptions
from selenium.webdriver.chrome.service import Service
from seleniumwire import webdriver  # request_interceptor comes from selenium-wire

# Raw string so the Windows backslashes are not read as escape sequences
CHROMEDRIVER = r"D:\chromedriver\94\chromedriver.exe"

options = ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--proxy-server={"........"}')
options.add_argument('--window-size=500,250')
options.add_experimental_option('useAutomationExtension', False)

def interceptor(request):
    del request.headers['Referer']  # delete first, otherwise a second Referer is added
    request.headers['Referer'] = 'https://www.*****'
    # request.headers...  (further headers elided in the original)

listurl = ["www...", "www..."]

for url in listurl:
    # A fresh Chrome window per URL, as before
    driver = webdriver.Chrome(service=Service(CHROMEDRIVER), options=options)
    driver.request_interceptor = interceptor
    try:
        driver.get(url)
        # save json info into a csv, ...
        time.sleep(2 * random.random())
    finally:
        driver.quit()  # closes the window and ends the chromedriver process
```
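A minimal sketch of running the loop above in parallel with one thread per browser, assuming selenium-wire (for `request_interceptor`) and that each URL can be scraped independently. `scrape_one`, `run_parallel`, `make_driver`, `max_workers` and `max_delay` are hypothetical names chosen for illustration. The driver factory is passed in as a parameter, so the threading logic does not depend on Chrome being installed, and the Selenium imports live inside `make_driver`.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def scrape_one(url: str, make_driver: Callable, max_delay: float = 2.0) -> Dict[str, str]:
    """Open a fresh browser for one URL, grab the page, and always quit it."""
    driver = make_driver()
    try:
        driver.get(url)
        # Placeholder: replace with the real JSON extraction / CSV row.
        row = {"url": url, "page_source": driver.page_source}
        time.sleep(max_delay * random.random())  # random pause, as in the question
        return row
    finally:
        driver.quit()  # closes the window and stops this thread's chromedriver


def run_parallel(urls: List[str], make_driver: Callable, max_workers: int = 4,
                 max_delay: float = 2.0) -> List[Dict[str, str]]:
    """Scrape URLs concurrently; every worker thread gets its own driver."""
    rows = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(scrape_one, url, make_driver, max_delay): url
                   for url in urls}
        for future in as_completed(futures):
            try:
                rows.append(future.result())
            except Exception as exc:  # one bad page should not stop the rest
                print(f"{futures[future]} failed: {exc}")
    return rows


def make_driver():
    """One headed Chrome with the Referer interceptor (selenium-wire assumed)."""
    from selenium.webdriver import ChromeOptions
    from selenium.webdriver.chrome.service import Service
    from seleniumwire import webdriver

    options = ChromeOptions()
    options.add_argument("--window-size=500,250")
    driver = webdriver.Chrome(
        service=Service(r"D:\chromedriver\94\chromedriver.exe"), options=options)

    def interceptor(request):
        del request.headers["Referer"]  # drop any existing value first
        request.headers["Referer"] = "https://www.*****"

    driver.request_interceptor = interceptor
    return driver
```

Usage would be something like `rows = run_parallel(listurl, make_driver, max_workers=4)`. Rows come back in completion order, not input order, and are collected in the main thread, so the CSV can be written once at the end instead of from several threads at the same time. Each worker is a full Chrome process, so a sensible `max_workers` depends on the machine's RAM and CPU.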

Ben Hendel
  • You can create a thread for each driver/browser pair. Sounds like running a grid may help you too. That would allow you to run multiple machines (aka "nodes") in parallel: https://www.selenium.dev/documentation/grid/ – pcalkins Jan 31 '23 at 23:27
  • Look here: https://stackoverflow.com/a/72574108/8157304 – sound wave Feb 01 '23 at 07:43
  • Check out [seleniumbase](https://github.com/seleniumbase): you can run your script with multiple threads by passing the `-n=THREAD_COUNT` flag when running it from the CLI. – Romek Feb 01 '23 at 13:21
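To spread the browsers over several machines, the Grid suggested in the first comment lets the script swap the local `webdriver.Chrome` for `webdriver.Remote`. A minimal sketch, assuming a Grid is already running at the hypothetical address `GRID_URL` and using plain Selenium 4; `connect_to_grid`, `fetch_title` and `scrape_on_grid` are illustrative names. The selenium-wire interceptor from the question is not included here: selenium-wire runs its proxy on the machine executing the script, so using it with remote nodes needs extra configuration described in its README.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional

GRID_URL = "http://localhost:4444"  # hypothetical: where the Grid hub/router listens


def connect_to_grid(grid_url: str = GRID_URL):
    """Open one Chrome session on whichever Grid node has a free slot."""
    from selenium import webdriver  # imported here so the module loads without Selenium

    options = webdriver.ChromeOptions()
    options.add_argument("--window-size=500,250")
    return webdriver.Remote(command_executor=grid_url, options=options)


def fetch_title(url: str, connect: Callable) -> Optional[str]:
    """Load one URL in its own remote session and return the page title."""
    driver = connect()
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()  # frees the node's slot for the next URL


def scrape_on_grid(urls: List[str], max_sessions: int = 4,
                   connect: Callable = connect_to_grid) -> List[Optional[str]]:
    """Run up to max_sessions browsers at once; the Grid spreads them over nodes."""
    with ThreadPoolExecutor(max_workers=max_sessions) as pool:
        # map() returns results in the same order as urls
        return list(pool.map(lambda u: fetch_title(u, connect), urls))
```

The Grid is typically started with `java -jar selenium-server-<version>.jar hub` on one host and `java -jar selenium-server-<version>.jar node --hub http://<hub-ip>:4444` on each other machine (see the Grid documentation linked above). `max_sessions` should not exceed the total number of browser slots the nodes offer; extra session requests wait in the Grid's queue.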
