
I'd like to give an indication of the overall progress of the current task. I'm farming work out and would like to know the current progress. So if I send 100 jobs to 10 processors, how can I show the current number of jobs that have returned? I can get the IDs, but how do I count up the number of completed, returned jobs from my map function?

I'm calling my function as follows:

op_list = pool.map(PPMDR_star, list(varg))

And in my function I can print the current process name:

current = multiprocessing.current_process()
print('Running: {} {}'.format(current.name, current._identity))
disruptive
  • You want to be able to check this from within each worker process? – dano Oct 07 '14 at 14:47
  • @dano - I don't mind - just something as the processes are being executed. Pool.map doesn't return until complete, so by then it's too late for any stats - or is there a way? – disruptive Oct 07 '14 at 14:55

1 Answer


If you use pool.map_async you can pull this information out of the MapResult instance that gets returned. For example:

import multiprocessing
import time

def worker(i):
    time.sleep(i)
    return i


if __name__ == "__main__":
    pool = multiprocessing.Pool()
    result = pool.map_async(worker, range(15))
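    # Poll the MapResult while the workers are still running;
    # _number_left is a private attribute that tracks how many
    # chunks of work have not yet come back from the pool.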
    while not result.ready():
        print("num left: {}".format(result._number_left))
        time.sleep(1)
    real_result = result.get()
    pool.close()
    pool.join()

Output:

num left: 15
num left: 14
num left: 13
num left: 12
num left: 11
num left: 10
num left: 9
num left: 9
num left: 8
num left: 8
num left: 7
num left: 7
num left: 6
num left: 6
num left: 6
num left: 5
num left: 5
num left: 5
num left: 4
num left: 4
num left: 4
num left: 3
num left: 3
num left: 3
num left: 2
num left: 2
num left: 2
num left: 2
num left: 1
num left: 1
num left: 1
num left: 1

multiprocessing internally breaks the iterable you pass to map into chunks, and passes each chunk to the child processes. So, the _number_left attribute really keeps track of the number of chunks remaining, not the individual elements in the iterable. Keep that in mind if you see odd-looking numbers when you use large iterables. It uses chunking to improve IPC performance, but if seeing an accurate tally of completed results is more important to you than the added performance, you can use the chunksize=1 keyword argument to map_async to make _number_left more accurate. (The chunksize usually only makes a noticeable performance difference for very large iterables. Try it for yourself to see if it really matters with your use case.)
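As a quick illustration (just a variation of the example above, not a different API), passing chunksize=1 makes _number_left count individual tasks instead of chunks:

import multiprocessing
import time

def worker(i):
    time.sleep(i)
    return i


if __name__ == "__main__":
    pool = multiprocessing.Pool()
    # chunksize=1 sends one task per chunk, so _number_left now
    # tracks individual tasks, at the cost of some extra IPC overhead.
    result = pool.map_async(worker, range(15), chunksize=1)
    while not result.ready():
        print("tasks left: {}".format(result._number_left))
        time.sleep(1)
    real_result = result.get()
    pool.close()
    pool.join()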

As you mentioned in the comments, because pool.map is blocking, you can't really get this information unless you start a background thread that does the polling while the main thread blocks in the map call, but I'm not sure there's any benefit to doing that over the above approach.
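If you did want to keep the blocking pool.map call, one rough way to build such a polling thread (my own sketch, using an assumed shared counter passed through the pool initializer, not something described in this answer) would be:

import multiprocessing
import threading
import time

counter = None

def init_worker(shared_counter):
    # Runs once in every worker process; the counter is inherited
    # through the initializer rather than passed as a task argument.
    global counter
    counter = shared_counter

def worker(i):
    time.sleep(i % 3)
    with counter.get_lock():
        counter.value += 1
    return i

def report_progress(shared_counter, total):
    # Polls the shared counter while the main thread is blocked in map().
    while shared_counter.value < total:
        print("completed: {}/{}".format(shared_counter.value, total))
        time.sleep(1)

if __name__ == "__main__":
    jobs = list(range(15))
    shared_counter = multiprocessing.Value('i', 0)
    pool = multiprocessing.Pool(initializer=init_worker,
                                initargs=(shared_counter,))
    poller = threading.Thread(target=report_progress,
                              args=(shared_counter, len(jobs)))
    poller.daemon = True
    poller.start()
    op_list = pool.map(worker, jobs)  # blocks until every job is done
    pool.close()
    pool.join()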

The other thing to keep in mind is that you're using an internal attribute of MapResult, so it's possible that this could break in future versions of Python.
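If that concerns you, a sketch that sticks to the public API (my suggestion rather than part of the answer) is pool.imap_unordered, which yields each result as soon as it finishes, so you can keep your own tally:

import multiprocessing
import time

def worker(i):
    time.sleep(i % 3)
    return i

if __name__ == "__main__":
    jobs = list(range(15))
    pool = multiprocessing.Pool()
    results = []
    # imap_unordered yields results in completion order, so each
    # iteration of the loop corresponds to one finished task.
    for done, value in enumerate(pool.imap_unordered(worker, jobs), 1):
        results.append(value)
        print("completed: {}/{}".format(done, len(jobs)))
    pool.close()
    pool.join()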

dano
    Thanks. I tried just running with map_async, but got an issue with: 'MapResult' object is not iterable – disruptive Oct 07 '14 at 15:05
    @Navonod I've updated my answer. You need to call `result.get()` on the `MapResult` instance to get the actual list of results. – dano Oct 07 '14 at 15:07
  • I noticed one issue with what seems to be a lot of jobs to map over, in that my numbers are incorrect. I sent 8k jobs over - yes, a lot of files - but I get weird reporting, i.e.: Found # files 8067, Number of files left to process: 253 – disruptive Oct 07 '14 at 15:28
    @Navonod see the paragraph immediately after the example output. You see that because of the chunking that `multiprocessing` does internally. Use `map_async(func, iterable, chunksize=1)` and you should see the number you expect. – dano Oct 07 '14 at 15:29
  • Sorry, it was hiding there. Works fine now! Thanks! – disruptive Oct 07 '14 at 15:38
  • To get number of tasks left (not chunks) one can do: `result._value.count(None)` in place of `result._number_left` – Andrey Sep 08 '20 at 00:47
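To illustrate Andrey's tip: _value is the internal list that MapResult fills in as results arrive, so counting the remaining None placeholders gives a per-task figure even with chunking. It is still a private attribute, and the count is wrong if a task legitimately returns None:

# Drop-in replacement for the print inside the polling loop above;
# assumes no worker actually returns None.
print("tasks left: {}".format(result._value.count(None)))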