There are several answers on SO that address your question, but they do not seem to work with the `map` function, where the main process blocks waiting for all the submitted tasks to complete. This may not be an ideal solution, but it does work:

- Issue a call to `signal.signal(signal.SIGINT, signal.SIG_IGN)` in each process in your process pool so that the workers ignore the interrupt entirely and leave the handling to the main process.
- Use method `Pool.imap` (or `Pool.imap_unordered`) instead of `Pool.map`. These lazily evaluate your iterable argument for submitting tasks and processing results. In this way the main process (a) does not block waiting for all the results and (b) saves memory, since you do not have to create an actual list for `value_n_list` and can use a generator expression instead.
- Have the main process issue print statements periodically and frequently, for example reporting on the progress of the submitted tasks as they complete. This is required for the keyboard interrupt to be perceived. In the code below a `tqdm` progress bar is used, but you could simply print a completion count every N task completions, where N is chosen so that you do not have to wait too long for the interrupt to take effect after Ctrl-c has been entered:
```python
from multiprocessing import Pool
import signal

import tqdm

def init_pool():
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def process_number(number: int):
    import time
    # processes the number
    time.sleep(.001)

if __name__ == '__main__':
    control = 1
    list_size = 100000
    # No reason to create the pool over and over again:
    with Pool(initializer=init_pool) as p:
        try:
            with tqdm.trange(list_size) as progress_bar:
                while True:
                    #value_n_list = (n for n in range(control, control + list_size))
                    value_n_list = range(control, control + list_size)
                    progress_bar.reset()
                    result = []
                    # The iterable returned by `imap` must be iterated.
                    # If you don't care about the return value, don't store it away
                    # and use `imap_unordered` instead:
                    for return_value in p.imap(process_number, value_n_list):
                        progress_bar.update(1)
                        result.append(return_value)
                    control += list_size
        except KeyboardInterrupt:
            print('Ctrl-c entered.')
```
Update
You did not specify which platform you are running under (you should always tag your question with the platform when you tag a question with `multiprocessing`), but I assumed it was Windows. If, however, you are running under Linux, the following simpler solution should work:
```python
from multiprocessing import Pool
import signal

def init_pool():
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def process_number(number: int):
    import time
    # processes the number
    time.sleep(.001)

if __name__ == '__main__':
    control = 1
    list_size = 100000
    # No reason to create the pool over and over again:
    with Pool(initializer=init_pool) as p:
        try:
            while True:
                value_n_list = [n for n in range(control, control + list_size)]
                result = p.map(process_number, value_n_list)
                control += list_size
        except KeyboardInterrupt:
            print('Ctrl-c entered.')
```
See Keyboard Interrupts with python's multiprocessing Pool
Update
If that is all your "worker" function, `process_number`, is doing (squaring a number), your performance will suffer from using multiprocessing. There is overhead from (1) creating and destroying the process pools (and thus the processes) and (2) writing arguments and reading return values from one address space to another (using queues). The following code benchmarks this:

- Function `non_multiprocessing` performs 10 iterations (rather than an infinite loop, for obvious reasons) of looping 100,000 times calling `process_number` and saving all the return values in `result`.
- Function `multiprocessing_1` uses multiprocessing to perform the above, but creates the pool only once (8 logical cores, 4 physical cores).
- Function `multiprocessing_2` re-creates the pool for each of the 10 iterations.
- Function `multiprocessing_3` is included as a "sanity check" and is identical to `multiprocessing_1` except that it has the Linux Ctrl-c checking code.

The timings of each are printed out.
```python
from multiprocessing import Pool
import time
import signal

def init_pool():
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def process_number(number: int):
    # processes the number
    return number * number

N_TRIALS = 10
list_size = 100_000

def non_multiprocessing():
    t = time.time()
    control = 1
    for _ in range(N_TRIALS):
        result = [process_number(n) for n in range(control, control + list_size)]
        print(control, result[0], result[-1])
        control += list_size
    return time.time() - t

def multiprocessing_1():
    t = time.time()
    # No reason to create the pool over and over again:
    with Pool() as p:
        control = 1
        for _ in range(N_TRIALS):
            value_n_list = [n for n in range(control, control + list_size)]
            result = p.map(process_number, value_n_list)
            print(control, result[0], result[-1])
            control += list_size
    return time.time() - t

def multiprocessing_2():
    t = time.time()
    control = 1
    for _ in range(N_TRIALS):
        # Create the pool over and over again:
        with Pool() as p:
            value_n_list = [n for n in range(control, control + list_size)]
            result = p.map(process_number, value_n_list)
            print(control, result[0], result[-1])
            control += list_size
    return time.time() - t

def multiprocessing_3():
    t = time.time()
    # No reason to create the pool over and over again:
    with Pool(initializer=init_pool) as p:
        try:
            control = 1
            for _ in range(N_TRIALS):
                value_n_list = [n for n in range(control, control + list_size)]
                result = p.map(process_number, value_n_list)
                print(control, result[0], result[-1])
                control += list_size
        except KeyboardInterrupt:
            print('Ctrl-c entered.')
    return time.time() - t

if __name__ == '__main__':
    print('non_multiprocessing:', non_multiprocessing(), end='\n\n')
    print('multiprocessing_1:', multiprocessing_1(), end='\n\n')
    print('multiprocessing_2:', multiprocessing_2(), end='\n\n')
    print('multiprocessing_3:', multiprocessing_3(), end='\n\n')
```
Prints:

```
1 1 10000000000
100001 10000200001 40000000000
200001 40000400001 90000000000
300001 90000600001 160000000000
400001 160000800001 250000000000
500001 250001000001 360000000000
600001 360001200001 490000000000
700001 490001400001 640000000000
800001 640001600001 810000000000
900001 810001800001 1000000000000
non_multiprocessing: 0.11899852752685547

1 1 10000000000
100001 10000200001 40000000000
200001 40000400001 90000000000
300001 90000600001 160000000000
400001 160000800001 250000000000
500001 250001000001 360000000000
600001 360001200001 490000000000
700001 490001400001 640000000000
800001 640001600001 810000000000
900001 810001800001 1000000000000
multiprocessing_1: 0.48778581619262695

1 1 10000000000
100001 10000200001 40000000000
200001 40000400001 90000000000
300001 90000600001 160000000000
400001 160000800001 250000000000
500001 250001000001 360000000000
600001 360001200001 490000000000
700001 490001400001 640000000000
800001 640001600001 810000000000
900001 810001800001 1000000000000
multiprocessing_2: 2.4370007514953613

1 1 10000000000
100001 10000200001 40000000000
200001 40000400001 90000000000
300001 90000600001 160000000000
400001 160000800001 250000000000
500001 250001000001 360000000000
600001 360001200001 490000000000
700001 490001400001 640000000000
800001 640001600001 810000000000
900001 810001800001 1000000000000
multiprocessing_3: 0.4850032329559326
```
Even with creating the pool only once, multiprocessing took approximately 4 times longer than a straight non-multiprocessing implementation. But it runs approximately 5 times faster than the version that re-creates the pool for each of the 10 iterations. As expected, the running time of `multiprocessing_3` is essentially identical to that of `multiprocessing_1`, i.e. the Ctrl-c code has no effect on the running behavior.
Conclusions
- The Linux Ctrl-c code should have no significant effect on the running behavior of the program.
- Moving the pool-creation code outside the loop should greatly reduce the running time of the program. As to what effect it has on CPU utilization, however, I cannot venture a guess.
- As is, your problem is not a suitable candidate for multiprocessing.