I'm querying an API that may return incomplete results. For each complete result, I want to start a new process.
Every few seconds I want to query the API again and check whether any new results have arrived. If so, I should start a new process for each of them (while the previous ones are still running), and so on. From the first query, I already know how many results to expect in total, which is also the number of processes I want to run.
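My understanding is that Process.start() returns immediately, so starting processes in a loop should let them run side by side. For instance, this toy test (the worker function here is made up purely for illustration) prints both "started" lines before either "finished" line:

    from multiprocessing import Process
    import time

    def worker(name):
        print(name, "started")
        time.sleep(3)  # stand-in for real work
        print(name, "finished")

    if __name__ == '__main__':
        procs = [Process(target=worker, args=(n,)) for n in ("A", "B")]
        for p in procs:
            p.start()  # returns immediately; the child runs in the background
        for p in procs:
            p.join()   # only block here, after both have been started

That is the behaviour I'm trying to get with my real tasks.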
Here is some code I'm experimenting with:
from bs4 import BeautifulSoup
import urllib
import time
from multiprocessing import Process

    def someFunction(task):
        timeout = time.time() + 60 * 120  # 120 minutes from now
        while True:
            time.sleep(2)
            # do something with the task here
            if time.time() > timeout:
                break
    if __name__ == '__main__':
        processes_started = []
        tasks = [1]  # just initialize so the 'while' loop can start; it changes value after the first query
        while len(processes_started) < len(tasks):
            r = urllib.urlopen(URL).read()
            soup = BeautifulSoup(r, "lxml")
            # 'tasks' will be of the correct length from the 1st call, but an entry
            # may not yet contain all the data needed, e.g. 'task.description'
            tasks = soup.find_all("task")
            for task in tasks:
                if task.description not in processes_started:
                    processes_started.append(task.description)
                    p = Process(target=someFunction, args=(task,))
                    p.start()
                    p.join()
            time.sleep(2)
However, the code above just waits for each process to finish and only then starts a new one if possible, so the processes never run in parallel. What am I doing wrong?