I want to do some analysis on graphs, specifically finding all possible simple paths between all pairs of nodes in a graph. With the help of the NetworkX library, I can use DFS to find all simple paths between two nodes with this function:
nx.all_simple_paths(G, source, target)
The code below runs almost instantly since my toy example contains only 6 nodes. However, in my real task the graph contains 5,213 nodes and 11,377,786 edges, and finding all possible simple paths in that graph is infeasible with the solution below:
import networkx as nx

# Build the directed graph from a list of (source, target, weight) tuples
graph = nx.DiGraph()
graph.add_weighted_edges_from(final_edges_list)

list_of_nodes = list(graph.nodes())
paths = {}
for n1 in list_of_nodes:
    for n2 in list_of_nodes:
        if n1 != n2:
            all_simple_paths = list(nx.all_simple_paths(graph, n1, n2))
            paths[n1 + "-" + n2] = all_simple_paths
The "paths" dictionary holds the "n1-n2" (source node and target node respectively) as keys, and list of all simple paths as values.
The question is whether I can use multiprocessing in this scenario to run this code on my original problem. My knowledge of processors, threads, shared memory, and CPU cores is very limited, and I am not sure whether I can really use concurrency (running my nested loops in parallel) for this task. I use a Windows server with 128 GB of RAM and a 32-core CPU.
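For reference, this is roughly the kind of parallelization I had in mind: a minimal sketch using concurrent.futures.ProcessPoolExecutor that splits the work by source node. The worker function paths_from_source and the tiny stand-in final_edges_list are my own additions for illustration; each worker rebuilds the graph from the edge list, since separate processes do not share memory (especially on Windows). I am not sure this is the right approach or that it will scale.

import networkx as nx
from concurrent.futures import ProcessPoolExecutor

def paths_from_source(args):
    # Worker: enumerate all simple paths from one source node to every other node.
    # The graph is rebuilt inside the worker because processes do not share memory.
    edge_list, source = args
    g = nx.DiGraph()
    g.add_weighted_edges_from(edge_list)
    result = {}
    for target in g.nodes():
        if target != source:
            result[source + "-" + target] = list(nx.all_simple_paths(g, source, target))
    return result

if __name__ == "__main__":  # required on Windows when spawning worker processes
    # Toy edge list standing in for final_edges_list from my real task
    final_edges_list = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 2.0), ("c", "d", 1.0)]
    graph = nx.DiGraph()
    graph.add_weighted_edges_from(final_edges_list)

    paths = {}
    with ProcessPoolExecutor(max_workers=32) as pool:
        jobs = [(final_edges_list, n) for n in graph.nodes()]
        for partial in pool.map(paths_from_source, jobs):
            paths.update(partial)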
PS: Searching the net (mostly Stack Overflow), I've found some solutions that recommend threading and others that recommend multiprocessing. I'm not sure I understand the distinction between the two :|