
Program Logic

I'm starting multiple Selenium browser threads in Python 3, one per entry in a list. The threads are stored in an array and started like this:

for each_thread in browser_threads:
    each_thread.start()
for each_thread in browser_threads:
    each_thread.join()

Each thread calls a function that starts a Selenium Firefox browser. The function is as follows:

Browser Function

# proxy browser session
def proxy_browser(proxy):
    global arg_pb_timesec
    global arg_proxyurl
    global arg_youtubevideo
    global arg_browsermode

    # recheck proxyurl
    if arg_proxyurl == '':
        arg_proxyurl = 'https://www.duckduckgo.com/'
    # apply proxy to firefox using desired capabilities
    PROX = proxy
    webdriver.DesiredCapabilities.FIREFOX['proxy']={
        "httpProxy":PROX,
        "ftpProxy":PROX,
        "sslProxy":PROX,
        "proxyType":"MANUAL"
    }

    options = Options()
    # for browser mode
    options.headless = False
    if arg_browsermode == 'headless':
        options.headless = True
    driver = webdriver.Firefox(options=options)
    try:
        print(f"{c_green}[URL] >> {c_blue}{arg_proxyurl}{c_white}")
        print(f"{c_green}[Proxy Used] >> {c_blue}{proxy}{c_white}")
        print(f"{c_green}[Browser Mode] >> {c_blue}{arg_browsermode}{c_white}")
        print(f"{c_green}[TimeSec] >> {c_blue}{arg_pb_timesec}{c_white}\n\n")

        driver.get(arg_proxyurl)
        time.sleep(2) # seconds
        # check if redirected to google captcha (for quitting abused proxies)
        if not "google.com/sorry/" in driver.current_url:
            # if youtube view mode
            if arg_youtubevideo:
                delay_time = 5 # seconds
                # if delay time is more than timesec for proxybrowser
                if delay_time > arg_pb_timesec:
                    # increase proxybrowser timesec
                    arg_pb_timesec += 5
                    # wait for the web element to load
                    try:
                        player_elem = WebDriverWait(driver, delay_time).until(EC.presence_of_element_located((By.ID, 'movie_player')))
                        togglebtn_elem = WebDriverWait(driver, delay_time).until(EC.presence_of_element_located((By.ID, 'toggleButton')))
                        time.sleep(2)
                        # click player
                        webdriver.ActionChains(driver).move_to_element(player_elem).click(player_elem).perform()
                        try:
                            # click autoplay button to disable autoplay
                            webdriver.ActionChains(driver).move_to_element(togglebtn_elem).click(togglebtn_elem).perform()
                        except Exception:
                            pass
                    except TimeoutException:
                        print("Loading video control taking too much time!")
        else:
            print(f"{c_red}[Network Error] >> Abused Proxy: {proxy}{c_white}")
            driver.close()
            driver.quit()
            #if proxy not in abused_proxies:
            #   abused_proxies.append(proxy)
    except Exception as e:
        print(f"{c_red}{e}{c_white}")
        driver.close()
        driver.quit()

The function above starts the browser with a proxy, then checks that it was not redirected to the Google reCAPTCHA page (to avoid getting stuck on abused proxies). If the YouTube video argument is passed, it waits for the movie player to load and clicks it to start playback.

Sort of like a viewbot for websites as well as youtube.

Problem

The threads appear to end, but the browsers keep running in the background. The browser windows never quit, and the script exits with all the browser sessions still running forever!

I tried every Stack Overflow solution I could find and various other methods, but nothing works. This is the only related SO question, and it is not very relevant because its OP is spawning os.system processes, which I'm not: python daemon thread exits but process still run in the background

EDIT: Even when the whole page has loaded, the YouTube clicker does not work and no exception is raised. The threads report stopping after a network error, but there is no error?!

Entire Script

As other Stack Overflow users have suggested before, I kept the code here minimal and reproducible. If you need the entire logic, it's here: https://github.com/ProHackTech/FreshProxies/blob/master/fp.py

Here is the screenshot of what is happening:

SCREENSHOT

Andrei Suvorkov
newbieCoder
  • @DebanjanB Yes this is the same issue I think. I tried a lot to make it thread safe by handling exceptions and even adding both driver.close() and driver.quit(). Idk what else can be done now. – newbieCoder Aug 15 '19 at 02:40
  • I think I will try out multiprocessing instead. Or maybe look into Asyncio – newbieCoder Aug 15 '19 at 02:40

1 Answer


You are starting multiple threads and joining them as follows:

for each_thread in browser_threads:
    each_thread.start()
for each_thread in browser_threads:
    each_thread.join()

At this point, it is worth noting that WebDriver is not thread-safe. That said, if you can serialise access to the underlying driver instance, you can share a reference to it across more than one thread. This is not advisable. You can, however, always instantiate one WebDriver instance for each thread.

Strictly speaking, the thread-safety issue isn't in your code but in the browser bindings themselves. They all assume there will be only one command at a time (as with a real user). On the other hand, you can always instantiate one WebDriver instance for each thread, which will launch multiple browsing tabs/windows. Up to this point, your program looks fine.
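A minimal sketch of the one-instance-per-thread pattern, assuming a hypothetical `make_driver` factory (with Selenium this could be something like `lambda: webdriver.Firefox(options=options)`). The names `run_session` and `run_all` are made up for illustration. The `try`/`finally` is one way to make sure `quit()` also runs when nothing goes wrong; in the function shown in the question, `quit()` is only reached on the abused-proxy and exception branches.

```python
import threading


def run_session(make_driver, url, results):
    """Open url in a driver owned by this thread alone, then always quit it."""
    driver = make_driver()  # one instance per thread, never shared
    try:
        driver.get(url)
        results.append(driver.current_url)
    finally:
        # Placed in finally so the browser is shut down on the
        # success path as well as when an exception is raised.
        driver.quit()


def run_all(make_driver, urls):
    """Start one thread per URL and wait for all of them to finish."""
    results = []
    threads = [
        threading.Thread(target=run_session, args=(make_driver, url, results))
        for url in urls
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Passing the factory in, rather than creating the driver outside the thread, keeps each driver local to the thread that uses it.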

Different threads can drive the same WebDriver instance, but then the results will not be what you expect. When you use multithreading to run different tests on different tabs/windows, a little thread-safety code is required; otherwise actions such as click() or send_keys() go to whichever tab/window currently has focus, regardless of which thread you expect to be running. In effect, all the tests run simultaneously on the focused tab/window rather than on the intended one.
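If a driver really has to be shared, the serialised access mentioned above could look roughly like the sketch below. `SerializedDriver` is a hypothetical wrapper, not part of Selenium; it only ensures that a multi-step action (switch to a window, then act on it) is not interleaved with another thread's action. One driver per thread remains the simpler option.

```python
import threading


class SerializedDriver:
    """Hypothetical wrapper: lets one thread at a time use a shared driver."""

    def __init__(self, driver):
        self._driver = driver
        self._lock = threading.RLock()

    def run(self, action):
        # Hold the lock for the whole action, so selecting a window and
        # acting on it happen as one uninterrupted unit.
        with self._lock:
            return action(self._driver)


# Illustrative use with a real Selenium driver (assumed names):
#
#   def click_player(handle):
#       def action(driver):
#           driver.switch_to.window(handle)
#           driver.find_element(By.ID, "movie_player").click()
#       return action
#
#   shared = SerializedDriver(driver)
#   shared.run(click_player(handle))
```

Every thread has to go through `run()` for this to help; any direct call on the underlying driver bypasses the lock.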



undetected Selenium