I want to do some analysis on graphs, specifically finding all possible simple paths between all pairs of nodes in a graph. With the help of the NetworkX library, I can use DFS to find all simple paths between two nodes with this function:
nx.all_simple_paths(G, source, target)
The code below runs almost instantly since my toy example contains only 6 nodes. However, in my real task the graph contains 5,213 nodes and 11,377,786 edges, and finding all possible simple paths in that graph is infeasible with the solution below:
import networkx as nx

# Build the directed graph from a list of (source, target, weight) tuples
graph = nx.DiGraph()
graph.add_weighted_edges_from(final_edges_list)

list_of_nodes = list(graph.nodes())
paths = {}
for n1 in list_of_nodes:
    for n2 in list_of_nodes:
        if n1 != n2:
            all_simple_paths = list(nx.all_simple_paths(graph, n1, n2))
            paths[n1 + "-" + n2] = all_simple_paths
The "paths" dictionary holds the "n1-n2" (source node and target node respectively) as keys, and list of all simple paths as values.
The question is whether I can use multiprocessing in this scenario to run this code on my original problem. My knowledge of processors, threads, shared memory, and CPU cores is very limited, and I am not sure whether I can really use concurrency (running my nested loops in parallel) for this task. I use a Windows server with 128 GB of RAM and a 32-core CPU.
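For reference, this is roughly the kind of parallelization I had in mind: a minimal sketch using concurrent.futures.ProcessPoolExecutor that splits the work by source node. The worker function paths_from_source and the tiny stand-in final_edges_list are my own additions for illustration; each worker rebuilds the graph from the edge list, since separate processes do not share memory (especially on Windows). I am not sure this is the right approach or that it will scale.

import networkx as nx
from concurrent.futures import ProcessPoolExecutor

def paths_from_source(args):
    # Worker: enumerate all simple paths from one source node to every other node.
    # The graph is rebuilt inside the worker because processes do not share memory.
    edge_list, source = args
    g = nx.DiGraph()
    g.add_weighted_edges_from(edge_list)
    result = {}
    for target in g.nodes():
        if target != source:
            result[source + "-" + target] = list(nx.all_simple_paths(g, source, target))
    return result

if __name__ == "__main__":  # required on Windows when spawning worker processes
    # Toy edge list standing in for final_edges_list from my real task
    final_edges_list = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 2.0), ("c", "d", 1.0)]
    graph = nx.DiGraph()
    graph.add_weighted_edges_from(final_edges_list)

    paths = {}
    with ProcessPoolExecutor(max_workers=32) as pool:
        jobs = [(final_edges_list, n) for n in graph.nodes()]
        for partial in pool.map(paths_from_source, jobs):
            paths.update(partial)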
PS: Searching the net (mostly Stack Overflow), I've found some solutions that recommend threading and others that recommend multiprocessing. I'm not sure I understand the distinction between the two :|