
I'm working with graphs and a big dataset of complex networks. I run the SIR algorithm on them with the ndlib library, but each iteration takes about 1 second, which makes the code take 10-12 hours to complete. I was wondering: is there any way to parallelise it? The code is below.

this line of the code is core :

sir = model.infected_SIR_MODEL(it, infectionList, False)

Is there any simple way to make it run multithreaded or parallelised?

count = 500
for i in numpy.arange(1, count, 1):
    for it in model.get_nodes():
        sir = model.infected_SIR_MODEL(it, infectionList, False)

Each iteration runs:

for u in self.graph.nodes():
    u_status = self.status[u]
    eventp = np.random.random_sample()
    neighbors = self.graph.neighbors(u)
    if isinstance(self.graph, nx.DiGraph):
        neighbors = self.graph.predecessors(u)

    if u_status == 0:
        infected_neighbors = len([v for v in neighbors if self.status[v] == 1])
        if eventp < self.BetaList[u] * infected_neighbors:
            actual_status[u] = 1
    elif u_status == 1:
        if eventp < self.params['model']['gamma']:
            actual_status[u] = 2
Nemo

1 Answer


So, if the iterations are independent, I don't see the point of iterating over count=500. Either way, the multiprocessing library might be of interest to you.

I've prepared two stub solutions (i.e. alter them to your exact needs). The first assumes that every input is static (as far as I understand the OP's question, the differences between runs arise from the random state generated inside each iteration). With the second, you can update the input data between iterations of i. I haven't tried the code since I don't have the model, so it might not work directly.

import multiprocessing as mp


# if everything is independent (eg. "infectionList" is static and does not change during the iterations)

def worker(model, infectionList):
    sirs = []
    for it in model.get_nodes():
        sir = model.infected_SIR_MODEL(it, infectionList, False)
        sirs.append(sir)
    return sirs

count = 500
infectionList = []
model = "YOUR MODEL INSTANCE"

data = [(model, infectionList) for _ in range(1, count+1)]
with mp.Pool() as pool:
    results = pool.starmap(worker, data)
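As a quick sanity check of the starmap pattern (with a toy worker standing in for the model, which I don't have), something like this shows the per-task fan-out:

```python
import multiprocessing as mp
import random

# Toy stand-in for the real worker; the actual code would call
# model.infected_SIR_MODEL here instead of drawing random numbers.
def toy_worker(run_id, beta):
    random.seed(run_id)  # seed per task so runs differ reproducibly
    hits = sum(random.random() < beta for _ in range(100))
    return run_id, hits

if __name__ == "__main__":
    data = [(i, 0.3) for i in range(1, 6)]
    with mp.Pool() as pool:
        # one (run_id, hits) pair per input tuple, in input order
        results = pool.starmap(toy_worker, data)
    print(results)
```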

The second proposed solution applies if "infectionList" or something else gets updated in each iteration of "i":

def worker2(model, it, infectionList):
    sir = model.infected_SIR_MODEL(it, infectionList, False)
    return sir

with mp.Pool() as pool:
    for i in range(1, count+1):
        data = [(model, it, infectionList) for it in model.get_nodes()]
        results = pool.starmap(worker2, data)

        # process results, update something, go to the next iteration...

Edit: Updated the answer to separate proposals more clearly.

Marek Schwarz
  • Thanks for responding. An explanation: here model is a complex graph and I run "infected_SIR_MODEL" on each node. So what does this line of code mean: "data = [(model, infectionList) for _ in range(1, count+1)]"? – Nemo Oct 30 '19 at 04:09
  • I set count to 10. I got 3 lines of results exactly alike, 3 others again alike, and 2 lines that are new results, so there are actually 3 bunches of results. Why are the answers like this? – Nemo Oct 30 '19 at 04:15
  • 1
    The `data = [(model, infectionList) for _ in range(1, count+1)]` does following: Make a list of tuples (which are input to `worker`) from 1 to `count` + 1. As far as I understand your `infected_SIR_MODEL` - you are generating some random state inside the function. That might cause the differences. The multiprocessing does nothing with your code. – Marek Schwarz Oct 30 '19 at 08:08
  • Nice work, thanks. The infected_SIR_MODEL() method uses probability internally. Could multiprocessing cause some of the results to come out exactly alike? – Nemo Oct 30 '19 at 11:05
  • 1
    You'll have to seed the rng generator inside the function. See this https://stackoverflow.com/questions/12915177/same-output-in-different-workers-in-multiprocessing and this https://stackoverflow.com/questions/9209078/using-python-multiprocessing-with-different-random-seed-for-each-process – Marek Schwarz Oct 30 '19 at 14:26
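Following up on the seeding point: a minimal sketch (assuming the model draws from numpy's global RNG, which I'm inferring from the `np.random.random_sample()` call in the question) is to reseed at the top of each task:

```python
import multiprocessing as mp
import numpy as np

def seeded_worker(task_id):
    # Reseed per task; otherwise forked workers can inherit the same
    # RNG state and return identical draws.
    np.random.seed(task_id)
    return np.random.random_sample()

if __name__ == "__main__":
    with mp.Pool() as pool:
        draws = pool.map(seeded_worker, range(4))
    print(draws)  # four distinct values
```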