
I am trying to create a method that creates threads and submits them to a thread pool. How do I stop individual threads after creating them? Edit: This is being used for web scraping and will need to run in the background for days. There will be a dynamic number of processes and a number of other tasks (I only added one for reference). I also do not want the process to end upon completion (it will loop the task); it should end only upon user request.

def Target(web,delay):
    log = ("starting")
    # gives headless option to chromedriver
    op = webdriver.ChromeOptions()
    op.add_argument('headless')
    driver = webdriver.Chrome(options=op)
    # launches driver with desired webpage
    driver.get(web)
    log = ("getting webpage")
    while True:
        try:
            # test to check if on correct page
            # looking for matching key
            log = ("checking stock")
            elem = driver.find_element_by_xpath('//*[@id="viewport"]/div[5]/div/div[2]/div[3]/div[1]/div/div[3]/div[1]/div[2]/button')
            if elem.is_displayed():
                log = ("instock")
                title = driver.title
                url = driver.current_url
                return (title, url)
        except NoSuchElementException:
            print("product is not in stock.... trying again")
            # retry delay
            time.sleep(float(delay))
            driver.get(web)

def multimethodv2(MethodToRun, url, delay,id):
    if __name__ == "__main__":
        pool = ThreadPoolExecutor()
        pool.submit(Target,url,delay)
  • `if __name__ == "__main__"` inside a function is not very idiomatic – Tomerikoo Oct 26 '20 at 17:20
  • If you are only running one task, there is no point here to using a thread pool. If you are running multiple tasks, then see [Python selenium multiprocessing](https://stackoverflow.com/questions/53475578/python-selenium-multiprocessing) for an idea on how to initialize the threads so that you are not re-creating the driver over and over again. In particular, see [my refinement to this](https://stackoverflow.com/questions/53475578/python-selenium-multiprocessing/64513719#64513719), which modifies the accepted answer to ensure that the driver processes are terminated when you are done. – Booboo Oct 26 '20 at 18:28
  • Right now, you have no call to `driver.quit()` at all, and that is not a great thing. – Booboo Oct 26 '20 at 18:30
  • This is being used for web scraping and will need to run in the background for days. There will be a dynamic number of processes and a number of other tasks (I only added one for reference). – Michal Pisarek Oct 26 '20 at 18:34

2 Answers


I always add a variable called running and implement it as follows:

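from threading import Thread

running = True  # shared flag that every worker checks

def do_stuff():
    # without this declaration, the assignment below would make
    # `running` a local name and `while running` would raise UnboundLocalError
    global running
    while running:
        ...
        if running:
            ...
        else:
            break

        # let's say one thread found a solution:
        running = False

threads = 4

for i in range(threads):
    t = Thread(target=do_stuff, daemon=True)
    t.start()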

All threads check the running variable. If one thread has found a solution, or the program needs to stop, the variable is set to False and the threads exit their loops. This works well in most cases.

If this is not suitable for you, you should check out threading events (threading.Event).

See the Stack Overflow question "Python Threading with Event object" for an explanation.
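
A minimal sketch of the event-based approach, assuming a worker that loops with a delay between iterations. The names `worker`, `stop_event`, and `results`, and the 0.01 second delay, are illustrative and not from the question. Giving each thread its own `threading.Event` allows threads to be stopped one at a time, and `Event.wait(delay)` stands in for `time.sleep(delay)` so that a stop request is noticed without waiting out the full delay:

```python
import threading


def worker(stop_event, delay, results):
    """Loop until stop_event is set; wait() doubles as an interruptible sleep."""
    iterations = 0
    while not stop_event.is_set():
        iterations += 1          # placeholder for one unit of real work
        stop_event.wait(delay)   # returns early as soon as the event is set
    results.append(iterations)


stop_event = threading.Event()   # one event per thread (hypothetical setup)
results = []
t = threading.Thread(target=worker, args=(stop_event, 0.01, results), daemon=True)
t.start()

stop_event.set()    # request the stop from the main thread
t.join(timeout=5)   # the worker leaves its loop at its next check
```

The stop is cooperative: the thread only ends when it next checks the event, so a long blocking call inside the loop delays the shutdown by that much.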

TheClockTwister

You cannot, nor would you want to, stop individual threads in a thread pool. They will all terminate when a shutdown of the pool is performed, in one of two ways:

With ThreadPoolExecutor, you either call the shutdown method explicitly or use a context manager (with ThreadPoolExecutor() as pool:), which calls it for you on exit:

def multimethodv2(MethodToRun, url, delay,id):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(Target,url,delay)
        result = future.result() # wait for submitted task to end first before terminating block

Or:

pool = ThreadPoolExecutor(max_workers=1)
future = pool.submit(Target,url,delay)
result = future.result() # wait for submitted task to end first before terminating pool
pool.shutdown()

Or:

pool = ThreadPoolExecutor(max_workers=1)
future = pool.submit(Target,url,delay)
pool.shutdown(wait=False) # return immediately
result = future.result() # once the last pending future completes, the pool's worker threads exit
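
The claim that a task cannot be stopped from outside can be checked with `Future.cancel()`, which only succeeds for tasks that have not started yet. The following is a minimal sketch; `blocking_task` and the two events are made-up names standing in for a long-running job such as `Target`:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

started = threading.Event()
release = threading.Event()


def blocking_task():
    started.set()      # signal that the task is now running
    release.wait(5)    # stand-in for long-running work
    return "done"


with ThreadPoolExecutor(max_workers=1) as pool:
    running = pool.submit(blocking_task)
    queued = pool.submit(blocking_task)    # waits behind the first task
    started.wait(5)
    cancelled_running = running.cancel()   # False: the task is already executing
    cancelled_queued = queued.cancel()     # True: the task was still pending
    release.set()                          # let the running task finish
    result = running.result()
```

Because a running task ignores `cancel()`, a task that loops for days has to check a flag or event itself if it should end on request.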

But you should still arrange to call driver.quit(), or else a driver process will be left running:

import gc
import threading
import time
from concurrent.futures import ThreadPoolExecutor

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException


threadLocal = threading.local()


class Driver:
    def __init__(self):
        log = ("starting")
        # gives headless option to chromedriver
        op = webdriver.ChromeOptions()
        op.add_argument('headless')
        self.driver = webdriver.Chrome(options=op)

    def __del__(self):
        self.driver.quit() # clean up driver when we are cleaned up
        #print('The driver has been "quitted".')

        
def create_driver():
    the_driver = getattr(threadLocal, 'the_driver', None)
    if the_driver is None:
        the_driver = Driver()
        setattr(threadLocal, 'the_driver', the_driver)
    return the_driver.driver



def Target(web,delay):
    driver = create_driver()
    # launches driver with desired webpage
    driver.get(web)
    log = ("getting webpage")
    while True:
        try:
            # test to check if on correct page
            # looking for matching key
            log = ("checking stock")
            elem = driver.find_element_by_xpath('//*[@id="viewport"]/div[5]/div/div[2]/div[3]/div[1]/div/div[3]/div[1]/div[2]/button')
            if elem.is_displayed():
                log = ("instock")
                title = driver.title
                url = driver.current_url
                return (title, url)
        except NoSuchElementException:
            print("product is not in stock.... trying again")
            # retry delay
            time.sleep(float(delay))
            driver.get(web)


# pool is now passed as an argument:
def multimethodv2(pool, MethodToRun, url, delay,id):
    future = pool.submit(Target,url,delay)
    return_value = future.result()

        
        
if __name__ == '__main__':
    N_THREADS = 1 # Put in a more realistic value when you have a more realistic example
    with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
        multimethodv2(pool, MethodToRun, url, delay, id)
    threadLocal = None # clean up drivers
    gc.collect()
Booboo
  • Thank you for the quick reply. The main hurdle is that I am creating these threads/futures dynamically and want the user to be able to terminate a thread upon request. – Michal Pisarek Oct 26 '20 at 18:45
  • @MichalPisarek So, what is the real issue? The threads exist to process new *tasks* as they are submitted with either the `submit` or `map` method of your `ThreadPoolExecutor` instance. Also see my comment to your question concerning efficiently reusing a Chrome driver in these threads. You can avoid creating and quitting the driver for each new URL and do it once per worker thread in your pool. – Booboo Oct 26 '20 at 18:47
  • I've updated the answer with code to show how you might use one Chrome driver per worker thread. – Booboo Oct 26 '20 at 19:18
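
For the requirement raised in the comments (tasks created dynamically, each one ending only when the user asks), one possible sketch combines a pool with one stop event per task. `StoppableTasks`, `poll`, the keys, and the delays are hypothetical names and values introduced for illustration; `poll` stands in for `Target` and would need the same kind of stop check in its retry loop:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StoppableTasks:
    """Hypothetical helper: runs looping tasks in a pool and stops them by key."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._tasks = {}  # key -> (stop_event, future)
        self._lock = threading.Lock()

    def start(self, key, func, *args):
        stop_event = threading.Event()
        future = self._pool.submit(func, stop_event, *args)
        with self._lock:
            self._tasks[key] = (stop_event, future)
        return future

    def stop(self, key, timeout=None):
        with self._lock:
            stop_event, future = self._tasks.pop(key)
        stop_event.set()                       # ask this one task to finish
        return future.result(timeout=timeout)  # wait for it and return its value

    def shutdown(self):
        with self._lock:
            for stop_event, _ in self._tasks.values():
                stop_event.set()
            self._tasks.clear()
        self._pool.shutdown(wait=True)


def poll(stop_event, name, delay):
    """Stand-in for Target: loops until asked to stop."""
    checks = 0
    while not stop_event.is_set():
        checks += 1              # a real task would check the page here
        stop_event.wait(delay)   # interruptible replacement for time.sleep
    return name, checks


tasks = StoppableTasks(max_workers=2)
tasks.start("a", poll, "site-a", 0.01)
tasks.start("b", poll, "site-b", 0.01)
name, checks = tasks.stop("a", timeout=5)   # only task "a" ends
tasks.shutdown()                            # stops "b" and joins the workers
```

Since each looping task occupies a worker until it is stopped, `max_workers` in this sketch has to be at least the number of tasks that should run at the same time; any extra tasks stay queued until a worker is free.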