I am trying to parallelize operations on objects which are attributes of another object by using a simple top-level script to access methods contained within a module.

I have four classes in two modules: Host_Population and Host, contained in Host_Within_Population; and Vector_Population and Vector, contained in Vector_Within_Population. Host_Population.hosts is a list of Host objects, and Vector_Population.vectors is a list of Vector objects.

The top-level script looks something like this:

import Host_Within_Population
import Vector_Within_Population

host_pop = Host_Within_Population.Host_Population()
vect_pop = Vector_Within_Population.Vector_Population()


for time in range(5):
    host_pop.host_cycle(time)
    vect_pop.vector_cycle(time)

host_pop.calculate_variance()

This is a representation of the module Host_Within_Population:

class Host_Population(object):

    def host_cycle(self, time):
        for host in self.hosts:
            host.lifecycle(time)
            host.mort()


class Host(object):

    def lifecycle(self, time):
        # do stuff
        pass

    def mort(self):
        # do stuff
        pass

This is a representation of the module Vector_Within_Population:

class Vector_Population(object):

    def vector_cycle(self, time):
        for vect in self.vects:
            vect.lifecycle(time)
            vect.mort()


class Vector(object):

    def lifecycle(self, time):
        # do stuff
        pass

    def mort(self):
        # do stuff
        pass

I want to parallelize the for loops in host_cycle() and vector_cycle() after calling the methods from the top-level script. The attributes of each Host object will be permanently changed by the methods acting on them in host_cycle(), and likewise for each Vector object in vector_cycle(). The order in which objects are processed within a cycle doesn't matter (i.e. hosts are not affected by actions taken on other hosts), but host_cycle() must finish completely before vector_cycle() begins. Processes in vector_cycle() need to be able to access each Host in the Host_Population, and the outcome of those processes will depend on the attributes of the Host. I will also need to access methods in both modules at times other than host_cycle() and vector_cycle().

I have been trying to use multiprocessing.Pool and map in many different permutations, but with no luck, even in highly simplified forms. One example of something I've tried:

class Host_Population:

    def host_cycle(self):
        with Pool() as q:
            q.map(h.lifecycle, [h for h in self.hosts])

But of course, h is not defined.

I have been unable to adapt the response to similar questions, such as this one. Any help is appreciated.

1 Answer

So I got a tumbleweed badge for this incredibly unpopular question, but just in case anyone ever has the same issue, I found a solution.

Within the Host class, lifecycle() returns a Host:

def lifecycle(self, time):
    #do stuff
    return self
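
The return matters because multiprocessing pickles each Host and sends a copy to the worker process, so changes made there never reach the object held by the parent unless the modified copy is sent back. A minimal sketch of that effect, with copy.deepcopy standing in for the pickling round trip and a hypothetical age attribute that is not part of the original code:

```python
import copy


class Host:
    """Hypothetical stand-in for the real Host class."""

    def __init__(self):
        self.age = 0  # illustrative attribute, not from the original code

    def lifecycle(self, time):
        self.age += time
        return self  # hand the modified object back to the caller


original = Host()
# deepcopy stands in for the pickle round trip that sends the host to a worker.
worker_copy = copy.deepcopy(original)
updated = worker_copy.lifecycle(time=3)

print(original.age)  # the parent's object is unchanged
print(updated.age)   # only the returned copy carries the change
```
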

The returned Host objects are collected by the multiprocessing method in the Host_Population class, which uses them to replace the population's list of hosts:

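from functools import partial
from multiprocessing import Pool

def host_pop_cycle(self, time):
    # Each worker runs Host.lifecycle on a pickled copy of one host.
    p = Pool()
    results = p.map_async(partial(Host.lifecycle, time=time), self.hosts)
    p.close()
    p.join()
    # Replace the stale originals with the updated copies the workers returned.
    self.hosts = list(results.get())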
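
The same pattern could plausibly cover the ordering requirement from the question. Pool.map blocks until every task has finished, so calling it once for the hosts and then once for the vectors means the host cycle is complete before the vector cycle starts, and the updated hosts can be handed to each vector through partial. The following is only a sketch under assumptions: the load and seen attributes, the hosts parameter on Vector.lifecycle, and the run_cycles function are all hypothetical and not part of the original modules.

```python
from functools import partial
from multiprocessing import Pool


class Host:
    """Hypothetical stand-in for the real Host class."""

    def __init__(self):
        self.load = 0  # illustrative attribute

    def lifecycle(self, time):
        self.load += 1
        return self  # return the modified copy to the parent process


class Vector:
    """Hypothetical stand-in for the real Vector class."""

    def __init__(self):
        self.seen = 0  # illustrative attribute

    def lifecycle(self, time, hosts):
        # The outcome depends on the (already updated) host attributes.
        self.seen = sum(h.load for h in hosts)
        return self


def run_cycles(hosts, vectors, steps):
    """Alternate host and vector cycles, each one parallelized over a pool."""
    with Pool() as pool:
        for time in range(steps):
            # map blocks, so all hosts are done before any vector starts.
            hosts = pool.map(partial(Host.lifecycle, time=time), hosts)
            vectors = pool.map(
                partial(Vector.lifecycle, time=time, hosts=hosts), vectors
            )
    return hosts, vectors


if __name__ == "__main__":
    hosts, vectors = run_cycles([Host() for _ in range(3)], [Vector()], steps=2)
    print([h.load for h in hosts], [v.seen for v in vectors])
```

Note that the full host list is pickled and shipped along with the vector tasks, which may become costly for large populations.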