I'm trying to convert a CPU-bound algorithm I have to a GPU one, and I'm having all sorts of trouble with cuGraph. Part of it is my own ignorance, part is the relative infancy of cuGraph, and part is me struggling to come up with elegant vectorized approaches.
Let's say I have m data observations, each with n features. I create a distance matrix by computing the pairwise distance between every observation and every other observation. (NOTE: this isn't the part I need help with, nor is it meant to be optimal; I'm just including it so the code is easy to understand and reproduce.)
import numpy as np

def l1_distance(arr):
    # 1-norm of the difference vector, i.e. the L1 (Manhattan) distance
    return np.linalg.norm(arr, 1)

X = np.random.randint(low=0, high=255, size=(700, 4096))
W = np.empty((700, 700))
for i in range(700):
    for j in range(700):
        W[i, j] = l1_distance(X[i, :] - X[j, :])
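(For what it's worth, I know this particular step can be vectorized even without the GPU; I believe something like the sketch below, using SciPy's cdist with the 'cityblock' metric, produces the same W without the nested Python loops. I'm only including it for context.)

import numpy as np
from scipy.spatial.distance import cdist

X = np.random.randint(low=0, high=255, size=(700, 4096))

# 'cityblock' is SciPy's name for the L1 / Manhattan metric, so this should
# match the nested-loop version above in a single call.
W = cdist(X, X, metric='cityblock')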
First challenge
import networkx as nx

def build_weighted_graph(W):
    n = W.shape[0]
    Graph = nx.DiGraph()
    for i in range(n):
        for j in range(n):
            # edge weight is the smaller of the two directed distances
            Graph.add_weighted_edges_from([(i, j, min(W[i, j], W[j, i]))])
    return Graph
where the input W is the square distance matrix from above (each node originally being an observation with n features). E.g. row 1, col 9 is the distance between node 1 and node 9; row 20, col 30 is the distance between node 20 and node 30; and so on. The graph draws an edge between every pair of nodes, and the weight of the edge is that distance.
I spent about 8 hours trying to figure out how to move this to the GPU, but even NVIDIA's own documentation says:
The code block is perfectly fine for NetworkX. However, the process of iterating over the dataframe and adding one node at a time is problematic for GPUs and something that we try and avoid. cuGraph stores data in columns (i.e. arrays). Resizing an array requires allocating a new array one element larger, copying the data, and adding the new value. That is not very efficient.
If your code follows the above model of inserting one element at a time, then we suggest either rewriting that code or using it as is within NetworkX and just accelerating the algorithms with cuGraph.
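My best guess at what they mean is to build the entire edge list as columns up front and hand it to cuGraph in one call, something like the sketch below. I haven't verified it, and it assumes cudf and cugraph are installed and that Graph.from_cudf_edgelist accepts a weighted edge list the way the docs suggest:

import numpy as np
import cudf
import cugraph

n = W.shape[0]

# Symmetrize the weights the same way the NetworkX loop does:
# the weight of edge (i, j) is min(W[i, j], W[j, i]).
W_sym = np.minimum(W, W.T)

# Build the whole edge list as flat arrays instead of adding one edge at a time.
src, dst = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
edges = cudf.DataFrame({
    'src': src.ravel(),
    'dst': dst.ravel(),
    'weight': W_sym.ravel(),
})

# Depending on the cuGraph version this may need cugraph.DiGraph() instead.
G = cugraph.Graph(directed=True)
G.from_cudf_edgelist(edges, source='src', destination='dst', edge_attr='weight')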
I couldn't verify whether something like that actually works, so I gave up and left that portion as it was. The next part of the algorithm uses Dijkstra's algorithm to calculate the shortest path from every node to every other node:
res = dict(nx.all_pairs_dijkstra_path_length(Graph))
In cuGraph's implementation, there is only single-source Dijkstra, which takes the graph and a source node as arguments. This is in contrast to NetworkX, which ships the method above and runs Dijkstra from every node at once. It means I would have to call SSSP (cuGraph's Dijkstra implementation) iteratively for every single node (more for loops).
After I have the shortest-path distances, I create another square distance matrix, this time based on distances through connected nodes instead of the original pairwise distances.
D = np.zeros([n, n])
for i in range(n):
    for j in range(n):
        D[i, j] = res[i][j]
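The closest I've gotten to a cuGraph version of these last two steps is just looping over every source node and calling SSSP once per node, roughly like the sketch below (untested; it assumes the cuGraph graph G from the edge-list sketch above, and that cugraph.sssp returns a dataframe with 'vertex' and 'distance' columns). That still leaves me with a Python-level for loop of 700 SSSP calls, which is exactly the pattern I'm trying to avoid:

import numpy as np
import cugraph

n = 700
D = np.zeros((n, n))

# One SSSP call per source node; as far as I can tell, cuGraph only exposes
# single-source shortest paths, not an all-pairs version.
for src in range(n):
    df = cugraph.sssp(G, source=src)
    # 'vertex' and 'distance' give the shortest-path distance from src to
    # every reachable node; scatter them into row src of D.
    D[src, df['vertex'].to_numpy()] = df['distance'].to_numpy()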
I can't for the life of me figure out how to vectorize any of this for the GPU. Any pointer in the right direction would be hugely appreciated. Currently, for the dataset my algorithm runs on, the CPU-bound version takes roughly 5 minutes for 698 nodes, which is why I'm trying to speed things up on the GPU.