3

I'm trying to convert a cpu bound algorithm I have to a GPU one, and I'm having all sorts of trouble with cugraph. Some of it is my ignorance, the other part is just the infancy and underdevelopment of cugraph, and the final part is me just sucking at figuring out elegant vectorized approaches.

Let's say I have m data observations consisting of n features. I create a distance matrix by computing the euclidean distance for all observations from all observations (NOTE: This isn't the part I need help with nor is it meant to be optimal. Just adding this part for easily understood, reproducible code)

import numpy as np

def l1_distance(arr):
    return np.linalg.norm(arr, 1)

X = np.random.randint(low=0, high=255, size=(700,4096))

W = np.empty((700,700))

for i in range(700):
    for j in range(700):
        W[i,j] = l1_distance(X[i,:] - X[j,:])

First challenge

import networkx as nx

def build_weighted_graph(W):
    n = W.shape[0]
    Graph = nx.DiGraph()
        for i in range(n):
            for j in range(n):
                Graph.add_weighted_edges_from([(i,j,min(W[i,j], W[j,i]))])
    return Graph

where the input W matrix is a square distance matrix in Euclidean space (the nodes originally consisted of x-features). E.g. row 1 col 9 is the distance between node 1 and node 9, row 20 col 30 is the distance between node 20 and node 30, etc etc. The graph now draws edges between connected nodes and the weight of the edge is the euclidean distance measurement.

I spent about 8 hours trying figure out how to move this to GPU but even in NVIDIA's own documentation, they claim

The code block is perfectly fine for NetworkX. However, the process of iterating over the dataframe and adding one node at a time is problematic for GPUs and something that we try and avoid. cuGraph stores data in columns (i.e. arrays). Resizing an array requires allocating a new array one element larger, copying the data, and adding the new value. That is not very efficient.

If your code follows the above model of inserting one element at a time, the we suggest either rewriting that code or using it as is within NetworkX and just accelerating the algorithms with cuGraph.

So I gave up and left that portion how it was. The next part of the algorithm uses dijkstra's algorithm and calculates the shortest path for all nodes to all other nodes

res = dict(nx.all_pairs_dijkstra_path_length(Graph))

In cugraphs implementation, they only have single source dijkstra which takes in the graph and the source node as an argument. This is in contrast to networkx's library which comes with the above method and applies dijkstra's ubiquitously across all nodes. This means I would have to iteratively call SSSP (cugraph's dijkstra's implementation) for every single node (more for loops).

After I have the distances via connected nodes, I then create another square distance matrix, that is now based off of distances through connected nodes instead of originally taking euclidean distances.

D = np.zeros([n,n])
    for i in range(n):
        for j in range(n):
            D[i,j] = res[i][j]

I for the life of me can't figure out how to vectorize any of this for GPU. any help in the right direction would be hugely appreciated. Currently for the dataset my algorithm runs on, the CPU bound algorithm takes approx 5 mins to run for 698 nodes. Hence why I'm trying to speed things up GPU wise

Moose Sims
  • 141
  • 2
  • 11
  • Please add [reprex], including input data – Sergey Bushmanov Oct 28 '20 at 07:35
  • Before you go to GPU, have you tried moving from `python` based implementation like `networkx` to `C++` based `python` libraries, such as `graph-tool`? – Sparky05 Oct 28 '20 at 10:19
  • I haven't tried that becuase the overall results of my effort are intended to be scaled up to a DGX GPU Cluster and operate on massive dataset (too large to fit on a single machine). So that is why my efforts are getting it GPU ready. – Moose Sims Oct 28 '20 at 15:15
  • @SergeyBushmanov Added reproducible example – Moose Sims Oct 28 '20 at 15:23
  • What you are doing is something that cuGraph currently does not support efficiently, but addressing that shortfall is on our roadmap. The cuGraph Team are building a sample notebook to answer your question and will post it here. I'll add an answer with a link when its done. – TaureanDyerNV Oct 30 '20 at 21:03

1 Answers1

3

It looks like the answer to your first question -- initialization of the Euclidian distance matrix -- was answered in Using cupy to create a distance matrix from another matrix on GPU, but you can certainly optimize the creation of the graph and computation of Dijkstra's matrix utilizing cuDF and cuGraph.

To create the graph efficiently you can construct a cuDF dataframe listing the edges and their weights. This is straightforward due to structure of the Euclidian distance matrix. cuGraph takes in this dataframe as an edge list and returns the graph. Then you can loop over the nodes to compute shortest to applicable vertices. This could later be parallelized or distributed with Dask if the problem size increases.

The code below is about 40x faster than utilizing nx.all_pairs_dijkstra_path_length for this problem size, it also includes the initial distance computation.

import cupy as cp
import cudf as cd
import cugraph as cg

def build_weighted_graph_gpu(X, n):
    X_d = cp.array(X)
    G_d = cp.zeros([n, n])
    for i in range(n):
        G_d[i,:] = cp.abs(cp.broadcast_to(X_d[i,:], X_d.shape) - X_d).sum(axis=1)
    return G_d

def dijkstras_matrix_gpu(W_d):
    n = np.shape(W_d)[0]
    
    # Create a columnar dataframe describing edges
    df_g = cd.DataFrame({
        'source':      cp.array([x // n for x in range(n*n)]),
        'destination': cp.array([x % n for x in range(n*n)]),
        'weight':      cp.minimum(W_d, cp.transpose(W_d)).ravel()})
    
    graph_d = cg.Graph()
    
    graph_d.from_cudf_edgelist(df_g, source='source', destination='destination', edge_attr='weight', renumber=False)
    
    dist_d = cp.empty_like(W_d)

    for i in range(n):
        dist_d[i,:] = cp.asarray(cg.traversal.shortest_path(graph_d, i)['distance'])
    
    return dist_d

distance_matrix = dijkstras_matrix_gpu(build_weighted_graph_gpu(X))