
I am using a multiprocessing Pool to parallelize some expensive computations. Let's say some of these computations rely on external data, such as files on the hard drive, network connections or subprocesses running scripts in other programming languages.

I can initialize those resources using the initializer argument of multiprocessing.Pool. However, there is no corresponding shutdown function to properly release the resources.

Lifting the example from https://stackoverflow.com/a/28508998:

import multiprocessing

socket = None

def init(address, port):
    global socket
    socket = magic(address, port)  # placeholder for the real connection setup

def job(data):
    global socket
    assert socket is not None
    return send(socket, data)  # placeholder for the real work

pool = multiprocessing.Pool(N, init, (address, port))
pool.map(job, ['foo', 'bar', 'baz'])

In this example, how can I make sure the socket is closed properly? Or, in a more general sense, perform some closing code on each worker?

This answer https://stackoverflow.com/a/13136120/2375130 suggests the following workaround (adapted for this example):

import time

def destroy(x):
    global socket
    socket.close()
    time.sleep(1)  # so this worker does not pick up two elements of the range
    return None

# one cleanup task per worker, chunksize 1, hoping each worker gets exactly one
pool.map_async(destroy, range(N), 1)

This seems like a bit of a hack. Is there a better option that does not involve reimplementing the Pool myself?

Scaatis