I have a Python application which creates a process for each element of a given inputs collection. The inputs collection contains about 8 elements, and the application reads a topic to get those 8 elements periodically.

For each element of the inputs, I create a new process and pass that element to a function.

The function is CPU bound in nature, it performs numerical operations.

My application has a Configuration object which is a dictionary. I load the data into the configuration when the main process starts, and then create a pool with 8 worker processes.

What is the right mechanism to pass the configuration object to each process? I don't want to increase the memory footprint of the processes.

As an example:

from multiprocessing import Pool


# CPU-intensive operation
def cpu_bound(input):
    ...  # complex cpu bound op
    # I want to use config here

    return output


def get_config():
    # create configuration object
    config = {
        "version": 1,
        "disable_existing_loggers": False,
        "loggers": {
            "": {
                "level": "INFO"
            },
            "another.module": {
                "level": "DEBUG"
            }
        }
    }
    return config


def pool_handler(inputs):
    p = Pool(8)  # 8-core machine
    results = p.map(cpu_bound, inputs)
    return results


if __name__ == "__main__":

    config = get_config()
    # get inputs from a topic
    inputs = get_inputs()
    results = pool_handler(inputs)

Question: What is the recommended approach to use the configuration within each process? The configuration is read-only in nature, as I only need to load it once at application boot. There are multiple ways, but what is the recommended approach for this scenario?

Jonathan Hall
InfoLearner
  • What exactly do you mean by read-only? Is there any particular problem with what you have now? – mkrieger1 Jun 06 '20 at 16:13
  • I should have mentioned. Thanks. Read-only as in, I only need to create it once for my application before I query the topic. My question is what is the best practice to share data? – InfoLearner Jun 06 '20 at 16:15
  • 3
    I don’t know, just pass it as an argument? – mkrieger1 Jun 06 '20 at 16:16
  • If you don't know then let others answer – InfoLearner Jun 06 '20 at 16:16
  • 1
    @InfoLearner: If `cpu_bound` needs additional information pass it to the function. In case you would run out of memory, thats not because of that little configuration object. – Maurice Meyer Jun 06 '20 at 16:22
  • Thanks. I want to set up a pattern for the team. The configuration object will grow with API objects and other entities. I have noticed that the memory footprint increases when I pass it as an argument to the method. Is there a way to eliminate that so I don't copy the object around? Should I use Value, Array, Queue etc.? – InfoLearner Jun 06 '20 at 17:27
  • https://stackoverflow.com/a/38135787/594589 – dm03514 Jun 06 '20 at 18:45
  • Thanks for reopening it. It was the right and smart decision – InfoLearner Jun 08 '20 at 19:26

1 Answer


The correct way to share static information with a multiprocessing.Pool is to use the initializer function together with its initargs parameter.

These two parameters are in fact passed to the Pool workers as Process constructor arguments, thus following the recommendations of the multiprocessing programming guidelines:

Explicitly pass resources to child processes

On Unix using the fork start method, a child process can make use of a shared resource created in a parent process using a global resource. However, it is better to pass the object as an argument to the constructor for the child process.

import multiprocessing


variable = None


def initializer(*initargs):
    """The initializer function is executed on each worker process
    once they start.

    """
    global variable

    variable = initargs


def function(*args):
    """The function is executed on each parameter of `map`."""
    print(variable)


if __name__ == "__main__":
    with multiprocessing.Pool(initializer=initializer, initargs=[1, 2, 3]) as pool:
        pool.map(function, (1, 2, 3))
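Applied to the configuration dictionary from the question, the pattern might look like the sketch below. The function and key names (`init_worker`, `"multiplier"`) are illustrative, not from the original code:

```python
import multiprocessing

config = None  # populated in each worker by the initializer


def init_worker(shared_config):
    """Runs once per worker process at start-up; stores the config globally."""
    global config
    config = shared_config


def cpu_bound(value):
    """CPU-bound work that reads (but never mutates) the shared config."""
    # config is available here without being re-sent on every map call
    return value * config["multiplier"]


if __name__ == "__main__":
    cfg = {"multiplier": 3}
    with multiprocessing.Pool(8, initializer=init_worker, initargs=(cfg,)) as pool:
        results = pool.map(cpu_bound, [1, 2, 3])
    print(results)  # [3, 6, 9]
```

With the fork start method (the default on Linux) the dictionary is shared copy-on-write, so workers add little memory as long as they only read it; with spawn, each worker receives exactly one pickled copy at start-up rather than one per task.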
noxdafox