Shortest path with 10,0000 nodes, 5 billion edges.
Suppose matrix $A$ is the symmetric weighted ajacency matrix (Undirected Graph), and element $A_{ij}$ represents the weight of edge $e_{ij}$, suppose the matrix size is $100000\times 100000$, so here are $n(n-1)/2=4999950000$ edges, essentially 5 billion edges, I wish to obtain two matrixes $L$ and $W$:
Let $P_{ij}=\left{v_1, v_2, \cdots, v_n\right}$ represents one of the shortest path with $n$ nodes from $i$ to $j$ (or $j$ to $i$ due to symmetric), hence we can calculate the following two quanties [L_{ij} = n-1, W_{ij} = \sum_{i=1}^{n-1}A_{v_i,v_{i+1}}]
(1) The Matrix $L$ of weighted shortest path length, whiere $L_{ij}$ represents the length of shortest from node $i$ to node $j$.
(2) The matrix $W$ of shortest path cumulative edge weights, where $W_{ij}$ represents the cumulative edge weight along the path.
Here I got that for all pair shortest path, there are algorithms like Floyd Warshall, Johnson algorithms, but it would require hundreds of days to calculate all pairs path length, what is the possible suitable solution for this?
I tried Python igraph with python code
import tqdm
import numpy as np
import pandas as pd
import igraph as ig
def all_pairs_shortest_length(ajacency_matrix):
m, n = ajacency_matrix.shape # ajacency_matrix is weighted
L = np.zeros((m,n), dtype=int) # L is symmetric
G = ig.Graph.Weighted_Adjacency(ajacency_matrix, mode = 'undirected') # 340G memory to construct graph with 65000 nodes
for v in tqdm.trange(m):
# v as start node, only calculate target nodes from v+1 to m
paths_rest = ig.GraphBase.get_all_shortest_paths(G, v=v, to = range(v+1, m), weights=G.es["weight"], mode="out")
target_with_length = pd.DataFrame([[path[-1],len(path)-1] for path in paths_rest], columns=['target','length']).drop_duplicates(ignore_index=True)
# for weighted shortest path, choose the path with minimum number of edges
L[v,v+1:] = target_with_length.groupby('target', group_keys=True).apply(min).length
L = L + L.T
M = np.array(ig.GraphBase.distances(G, weights=G.es["weight"]))
return L, M
if __name__=='__main__':
test_N = 100000
b = np.random.random_integers(1,100, size=(test_N, test_N))
ajacency_matrix = (b + b.T)/2
np.fill_diagonal(ajacency_matrix, 0)
L, M = all_pairs_shortest_length(ajacency_matrix)