
I'm querying an API that may return incomplete results. For each complete result, I want to start a new process.

Every few seconds I want to query the API again and check whether I got any new results. If so, I should start a new process (while the previous ones are still running), and so on. From the first query to the API, I know the number of results to expect (which will equal the number of processes I want to run).
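
For context, the API response is XML roughly of this shape (inferred from my parsing code below; the tag names and values here are placeholders):

<tasks>
  <task>
    <description>first-task</description>
  </task>
  <task>
    <description></description>  <!-- incomplete result: not yet filled in -->
  </task>
</tasks>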

Here is some code I'm experimenting with:

from bs4 import BeautifulSoup
import urllib
import time
from multiprocessing import Process

def someFunction(task):
    timeout = time.time() + 60 * 120   # 120 minutes from now
    while True:
        time.sleep(2)
        #do something
        if time.time() > timeout:
            break


if __name__ == '__main__':

    processes_started = []

    tasks = [1]  # dummy value so the while loop runs at least once; replaced on the first query

    while len(processes_started) < len(tasks):

        r = urllib.urlopen(URL).read()  # URL is the API endpoint (defined elsewhere)
        soup = BeautifulSoup(r, "lxml")

        # 'tasks' has the correct length from the first call, but entries may not yet
        # contain all the data needed, e.g. task.description
        tasks = soup.find_all("task")

        for task in tasks:
            if task.description not in processes_started:
                processes_started.append(task.description)

                p = Process(target=someFunction, args=(task,))
                p.start()
                p.join()

        time.sleep(2)

However, the code above just waits for each process to finish and then starts a new one, if possible. What am I doing wrong?
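
For reference, here is a minimal sketch of the behaviour I expected (someFunction and the task list are reduced to placeholders): each worker keeps running while the loop moves on and starts the next one.

from multiprocessing import Process
import time

def someFunction(task):
    time.sleep(5)  # placeholder for the real work

if __name__ == '__main__':
    workers = []
    for task in range(3):  # stand-in for the tasks returned by the API
        p = Process(target=someFunction, args=(task,))
        p.start()          # returns immediately; the worker runs in the background
        workers.append(p)  # keep a handle so the process can be joined later
    # at this point all three workers are running concurrently
    for p in workers:
        p.join()           # only now wait for them to finish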

Stergios