
I am given a list of edges. It can be generated easily from a Barabási–Albert graph like so:

import igraph as ig
import numpy as np
size = 15
g = ig.Graph.Barabasi(n = size, m = 3)
edges = np.array(g.get_edgelist())
ig.plot(g, bbox = (200, 200), vertex_label=range(g.vcount()), vertex_size = 12, vertex_label_size = 8)

This is an example of my output:

(plot of the generated 15-node graph with vertex labels)

I found that getting the incident nodes (the adjacency list) in igraph is simple:

def get_incidences_ig(roads, n):
    g = ig.Graph(roads, n=n)
    return g.get_adjlist()
>>> get_incidences_ig(edges, size)
[[1, 2, 3, 4, 5, 10, 13], [0, 2, 3, 4, 6, 7, 8, 9, 10], [0, 1, 3, 5, 7, 9, 11, 14], [0, 1, 2, 4, 5, 6, 7, 12], [0, 1, 3], [0, 2, 3, 6, 8, 9, 11, 13], [1, 3, 5, 8, 10, 12], [1, 2, 3, 12], [1, 5, 6, 14], [1, 2, 5], [0, 1, 6, 11, 13, 14], [2, 5, 10], [3, 6, 7], [0, 5, 10], [2, 8, 10]]

Now I try to implement it in numpy:

def get_incidences_np(roads, n):
    # duplicate each edge in both directions so every endpoint appears as a source
    roads = np.vstack([roads, roads[:,::-1]])
    # the rest is a grouping problem: split a 2-column array into groups by its 1st column
    roads_sorted = roads[np.argsort(roads[:,0])]
    marker_idx = np.flatnonzero(np.diff(roads_sorted[:,0])) + 1
    stream = roads_sorted[:, 1]
    #sources = roads_sorted[np.r_[0, marker_idx], 0]
    targets = np.split(stream, marker_idx)
    return targets
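The grouping logic can be sanity-checked on a tiny hand-built graph (the edge list below is illustrative, not from the post; neighbor order within a group depends on the sort, so groups are compared as sorted lists):

```python
import numpy as np

def get_incidences_np(roads, n):
    # duplicate each edge in both directions, then group by source node
    roads = np.vstack([roads, roads[:, ::-1]])
    roads_sorted = roads[np.argsort(roads[:, 0])]
    marker_idx = np.flatnonzero(np.diff(roads_sorted[:, 0])) + 1
    return np.split(roads_sorted[:, 1], marker_idx)

# tiny graph: a triangle 0-1-2 plus a pendant edge 2-3
edges = np.array([[0, 1], [1, 2], [0, 2], [2, 3]])
adj = get_incidences_np(edges, 4)
print([sorted(a.tolist()) for a in adj])
# [[1, 2], [0, 2], [0, 1, 3], [2]]
```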

It works. However, when I run it on larger graphs (edges of a Barabási–Albert graph with size = 10000 and average degree = 3), the numpy approach turns out to be remarkably slower:

>>> %timeit get_incidences_np(edges, size)
64.3 ms ± 22.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit get_incidences_ig(edges, size)
10.6 ms ± 233 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

It seems that my numpy approach (roads = np.vstack([roads, roads[:,::-1]]) + splitting of groups) is quite naive. Is there a more efficient way to implement this algorithm?
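One direction worth sketching (my own addition, not from the post; the function name `get_incidences_np_fast` is hypothetical): `np.bincount` can compute the group sizes directly, so the split points come from a cumulative sum rather than from `np.diff` on the sorted keys. As a side benefit, using `minlength=n` yields exactly `n` groups, so isolated vertices get an empty neighbor list instead of being silently dropped:

```python
import numpy as np

def get_incidences_np_fast(roads, n):
    # undirected graph: list each edge in both directions
    src = np.concatenate([roads[:, 0], roads[:, 1]])
    dst = np.concatenate([roads[:, 1], roads[:, 0]])
    # degree of every node, including isolated ones (empty groups)
    counts = np.bincount(src, minlength=n)
    # stable sort keeps a deterministic order within each group
    order = np.argsort(src, kind='stable')
    # cumulative degrees give the boundaries between consecutive groups
    splits = np.cumsum(counts)[:-1]
    return np.split(dst[order], splits)

# same tiny graph as a smoke test: triangle 0-1-2 plus pendant edge 2-3
edges = np.array([[0, 1], [1, 2], [0, 2], [2, 3]])
adj = get_incidences_np_fast(edges, 4)
print([sorted(a.tolist()) for a in adj])
# [[1, 2], [0, 2], [0, 1, 3], [2]]
```

The sort is still O(E log E), so whether this beats igraph in practice would need measuring; the main structural change is that group boundaries no longer require scanning the sorted key column.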

mathfux
  • Just my two cents, but it looks like you are trying to come up with a home-brewed solution to a standard graph theory problem. If igraph does not perform as well as you want, maybe there is another package that does. [This](https://stackoverflow.com/questions/606516/python-graph-library) thread mentions `graph-tool` that claims to be fast (I have not used it myself). Unless you are trying to challenge yourself with a numpy problem, I would spend more time looking at other tools out there. – hilberts_drinking_problem Oct 22 '20 at 06:23
  • @hilberts_drinking_problem I've tested `graph-tool` on Windows. It's very complicated, I needed to download docker (~1GB) in order to use within some weird kind of interactive console. Additionally, I hope to find some time to test [`NetworKit`](https://networkit.github.io) one day. But I really want to know if there is an algorithm easy enough to implement in `numpy` that catches up with these famous graph libraries – mathfux Oct 22 '20 at 15:32
  • I would guess that an efficient `numpy` solution would be hard to come by, since you are potentially dealing with sparsity and arrays of varying size. In other words, the problem you describe may be hard to "vectorize". You could try using `numba`, `cython` or interface with some lower level language. I had a lot of fun working with [nim](https://nim-lang.org), which has good support for Python interaction via [nimporter](https://github.com/Pebaz/nimporter). As much as I love python and numpy, I think that your problem might just have better solutions with other tools. – hilberts_drinking_problem Oct 23 '20 at 08:47

0 Answers