I have a script that collects data from a database, filters it, and puts it into a list for further processing. To make the filtering faster, I've split the entries in the database between several processes. Here's the snippet:
from multiprocessing import Process, Queue

MAX_PROCESSES = 16

def get_entry(pN, q, entries_indicies):
    ## collecting and filtering data
    q.put((address, page_text,))
    print("Process %d finished!" % pN)

def main():
    # getting entries
    data = []
    procs = []
    for i in range(MAX_PROCESSES):
        q = Queue()
        p = Process(target=get_entry, args=(i, q, entries_indicies[i::MAX_PROCESSES],))
        procs += [(p, q,)]
        p.start()
    for i in procs:
        i[0].join()
        while not i[1].empty():
            # each process returns tuples of (address, full data,)
            data += [i[1].get()]
    print("Finished processing database!")
    # More tasks
    # ................

if __name__ == "__main__":
    main()
I've run it on Linux (Ubuntu 14.04) and it went totally fine. The problems start when I run it on Windows 7: the script gets stuck on i[0].join() for the 11th process out of 16 (which looks totally random to me). There are no error messages, nothing; it just freezes there. At the same time, print("Process %d finished!" % pN) is displayed for all processes, which means they all run to completion, so there should be no problem with the code of get_entry itself.
I tried commenting out the q.put line in the worker function, and then everything went through fine (though, of course, data ended up empty).
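For what it's worth, I found the "Joining processes that use queues" warning in the multiprocessing docs, which says a process that has put items on a queue will not terminate until all buffered items have been flushed by its feeder thread to the underlying pipe, so joining it before the queue has been consumed can deadlock. Here's a sketch of how I think a drain-before-join version of my loop would look; the SENTINEL marker and the placeholder data generation are my own additions for illustration, not part of the real script:

from multiprocessing import Process, Queue

MAX_PROCESSES = 16
SENTINEL = None  # my own end-of-data marker, not in the original script

def get_entry(pN, q, entries_indicies):
    # placeholder for the real collecting and filtering
    for idx in entries_indicies:
        q.put((idx, "page text for entry %d" % idx))
    q.put(SENTINEL)  # tell the main process this worker is done
    print("Process %d finished!" % pN)

def main():
    entries_indicies = list(range(100))  # placeholder indices
    data = []
    procs = []
    for i in range(MAX_PROCESSES):
        q = Queue()
        p = Process(target=get_entry, args=(i, q, entries_indicies[i::MAX_PROCESSES]))
        procs.append((p, q))
        p.start()
    for p, q in procs:
        # drain the queue *before* join(), so the worker's internal
        # feeder thread can flush its buffer and the process can exit
        for item in iter(q.get, SENTINEL):
            data.append(item)
        p.join()
    print("Finished processing database!")

if __name__ == "__main__":
    main()

Is that the right way around, or am I misreading the docs?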
Does that mean Queue is to blame here? Why does it make join() get stuck? Is it because of the internal Lock within Queue? And if so, and if Queue really renders my script unusable on Windows, is there some other way to pass the data collected by the processes to the data list in the main process?
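To clarify what I mean by "some other way": one alternative I've been looking at is multiprocessing.Pool, which collects the workers' return values for me so I never touch a Queue directly. A sketch of what I have in mind; the placeholder data generation stands in for the real collecting and filtering:

from multiprocessing import Pool

MAX_PROCESSES = 16

def get_entry(entries_slice):
    # placeholder for the real collecting and filtering of one slice;
    # returns a list of (address, page_text) tuples
    return [(idx, "page text for entry %d" % idx) for idx in entries_slice]

def main():
    entries_indicies = list(range(100))  # placeholder indices
    slices = [entries_indicies[i::MAX_PROCESSES] for i in range(MAX_PROCESSES)]
    pool = Pool(MAX_PROCESSES)
    results = pool.map(get_entry, slices)  # one list of tuples per slice
    pool.close()
    pool.join()
    data = [item for sublist in results for item in sublist]  # flatten
    print("Finished processing database!")

if __name__ == "__main__":
    main()

Would that actually avoid the problem on Windows, or does Pool run into the same queue behaviour internally?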