
I have a Scheduler class which holds a list of Client objects, each with its own PyTorch model, parameters and training functions. I am trying to train multiple clients in parallel, as I have multiple GPUs and each Client is assigned its own GPU.

The basic code structure is like this:

import torch.multiprocessing as mp

class Scheduler:
    def __init__(self, num_clients):
        self.clients = [] # Client1, ..., ClientN

    def client_update(self, client):
        print("Client {}".format(client.id))
        client.train()
        client.evaluate(self.dataset.test_dataloader)

    def train(self, num_rounds):
        for round in range(num_rounds):
            processes = []

            for client in self.clients:
                process = mp.Process(target=self.client_update, args=(client, ))
                process.start()
                processes.append(process)

            for process in processes:
                process.join()

The Scheduler class is initialised in the main script and the train function is called there. Inside the if __name__ == '__main__': guard I call mp.set_start_method('spawn', force=True).
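For context, a minimal sketch of that guard (using the standard-library multiprocessing here so it runs anywhere; torch.multiprocessing exposes the same API, and the worker is a stand-in for a client update):

```python
import multiprocessing as mp

def worker(i):
    # Stand-in for a client update; real code would train on cuda:i.
    return i * i

if __name__ == '__main__':
    # 'spawn' starts fresh interpreter processes; force=True overrides
    # any start method set earlier in the program.
    mp.set_start_method('spawn', force=True)
    with mp.Pool(2) as pool:
        print(pool.map(worker, range(4)))  # [0, 1, 4, 9]
```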

This method doesn't seem to work: the spawned Process has to recreate the Client object in the child, and I run into an EOFError: Ran out of input, similar to this. Unfortunately I cannot use the same solution as in that thread.
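My understanding is that under the spawn start method, the Process target and its args are pickled and sent to the child, so any unpicklable attribute on a Client (a live CUDA handle, a lock, an open dataloader) makes that transfer fail, and the child can then hit a truncated pipe. A small illustration, using a threading.Lock as a stand-in for unpicklable state:

```python
import pickle
import threading

class Client:
    def __init__(self, client_id):
        self.id = client_id
        # Stand-in for unpicklable state (CUDA handles, locks, open files...)
        self.lock = threading.Lock()

try:
    pickle.dumps(Client(0))
except TypeError as exc:
    # Raises: cannot pickle '_thread.lock' object
    print("pickling failed:", exc)
```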

I also tried a Pool-based approach, but unfortunately couldn't get that working either:

    ctx = mp.get_context('forkserver')
    pool = ctx.Pool(2)
    pool.map(self.client_update, self.clients)
    pool.close()
    pool.join()

I am unsure what the best approach would be to use the GPUs efficiently and speed up training across the clients.
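One pattern that sidesteps the pickling problem (a sketch under my assumptions, not a definitive fix): keep the Process/Pool target at module level rather than a bound method, pass only plain picklable data (client index, device string), and rebuild heavyweight state inside the worker. Shown with the standard-library multiprocessing; torch.multiprocessing has the same interface, and run_client is a hypothetical worker name:

```python
import multiprocessing as mp

def run_client(args):
    # Module-level function: picklable under 'spawn'. Only plain data
    # crosses the process boundary; a real worker would construct the
    # model here and move it to `device` before training.
    client_id, device = args
    return f"client {client_id} finished on {device}"

if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    jobs = [(i, f"cuda:{i}") for i in range(2)]
    with ctx.Pool(processes=2) as pool:
        results = pool.map(run_client, jobs)
    print(results)
```

Anything the parent needs back (e.g. updated weights) could be returned as CPU state_dicts, which pickle cleanly.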
