
I've recently started poking around with the multiprocessing module and find the pool.map function very useful for processing a large array very quickly. Is there a way to terminate a pool early, however? Let's say I have a huge list and I want to check each number in it to see whether it's divisible by x, return True if one is, and terminate the rest of the pool early. How might I go about doing this? As a proof of concept, I'm trying to find prime numbers from 3 to infinity (in the least efficient way possible). Here's an example:

import multiprocessing
from functools import partial

def is_devis(x, number):
    # True if the candidate x is divisible by the known prime `number`
    if x % number == 0:
        return True

if __name__ == "__main__":
    # Create the pool under the main guard so worker processes don't
    # try to re-create it when they import this module.
    finders = multiprocessing.Pool(multiprocessing.cpu_count() - 1)
    Primes = [3, 5, 7, 11, 13, 17]  # seed primes
    x = 3
    while True:
        x = x + 2
        func = partial(is_devis, x)
        results = finders.map(func, Primes)
        if True not in results:
            Primes.append(x)

I might not completely grasp how multiprocessing pools or the pool.map function work, but from what I understand, it will split an iterable up evenly for you and spread the chunks out amongst the pool, and the workers will continue until all the processes return or finish. Is there a way to terminate a pool as soon as one process returns a value? I have looked at the documentation on multiprocessing.Pool, but it notes:

Worker processes within a Pool typically live for the complete duration of the Pool’s work queue.

Thanks in advance!

BobserLuck
  • When building an application using multiprocessing, the actual application matters. If I take the example with the primes, I would do it like this: give a max number, which will be the last one to be tested. Create a function which takes a number and tells you whether it is a prime. Map this function on a range up to the max number. Thus the end criterion is not "a process returns a value" but the preset maximum number. If you want advice on how to build your multiprocessing application, please give an explanation of what you are trying to achieve. – Mathieu Nov 03 '18 at 19:23
  • @Mathieu Thanks for the input. I don't currently have a specific project in mind other than that example. I'm actually working on that as a project, just a way to find as many prime numbers as possible and observe how the computer handles it. I would also implement a way to save them all to a file. But the question still remains the same, unfortunately. Would it be possible to terminate a pool based on the result of a process midway through the pool? – BobserLuck Nov 03 '18 at 19:53
  • 1
    Also: https://stackoverflow.com/questions/37691552/how-to-get-all-pool-apply-async-processes-to-stop-once-any-one-process-has-found/37700081#37700081 – noxdafox Nov 03 '18 at 22:11
  • @noxdafox Yes, that's actually exactly what I was looking for. With a little tweaking it has fit the job perfectly. Not sure why I wasn't able to find that answer in my searches before but thanks! – BobserLuck Nov 05 '18 at 06:13

1 Answer


A naive approach would be to have a global flag that a worker can set when it finds an answer. In each of the other workers, you can periodically check the flag and have the worker stop if it is set.

Steven Morad
  • Note that this requires that the separate processes *share* the flag, which means putting it into a Manager object or using the shared-memory primitives. – torek Nov 03 '18 at 19:58
  • I was thinking of something of that sort. Unfortunately, from my understanding, when a new process is created it is an entirely new Python instance which doesn't share any global variables. As @torek mentioned, you can get around this with a specific type of shared-memory variable. I have tried using this approach, but because Pool().map() only accepts an iterable object, using the partial function throws an error: "RuntimeError: Synchronized objects should only be shared between processes through inheritance" – BobserLuck Nov 03 '18 at 20:13
  • 1
    @BobserLuck: Linux `multiprocess` uses `fork()` so that children inherit (copies of) parent settings, including Manager objects. Manager objects detect the fork and connect on a communications channel to share updates. The shared memory primitives are simpler internally and thus more efficient, but harder to use. I think Managers can be used on Windows but I'm not sure of the details. – torek Nov 04 '18 at 00:12