I am given a list of edges. It can be generated easily from a Barabási–Albert graph like so:
import igraph as ig
import numpy as np
size = 15
g = ig.Graph.Barabasi(n = size, m = 3)
edges = np.array(g.get_edgelist())
ig.plot(g, bbox = (200, 200), vertex_label=range(g.vcount()), vertex_size = 12, vertex_label_size = 8)
Here is an example of the plotted output:
I found that getting the incident nodes for every vertex in igraph is simple:
def get_incidences_ig(roads, n):
    g = ig.Graph(roads, n=n)
    return g.get_adjlist()
>>> get_incidences_ig(edges, size)
[[1, 2, 3, 4, 5, 10, 13], [0, 2, 3, 4, 6, 7, 8, 9, 10], [0, 1, 3, 5, 7, 9, 11, 14], [0, 1, 2, 4, 5, 6, 7, 12], [0, 1, 3], [0, 2, 3, 6, 8, 9, 11, 13], [1, 3, 5, 8, 10, 12], [1, 2, 3, 12], [1, 5, 6, 14], [1, 2, 5], [0, 1, 6, 11, 13, 14], [2, 5, 10], [3, 6, 7], [0, 5, 10], [2, 8, 10]]
Now I try to implement the same thing in NumPy:
def get_incidences_np(roads, n):
    # duplicate each edge in both directions (the graph is undirected)
    roads = np.vstack([roads, roads[:, ::-1]])
    # the rest is a grouping problem on a 2-column array:
    # split the second column into groups keyed by the first column
    roads_sorted = roads[np.argsort(roads[:, 0])]
    marker_idx = np.flatnonzero(np.diff(roads_sorted[:, 0])) + 1
    stream = roads_sorted[:, 1]
    # sources = roads_sorted[np.r_[0, marker_idx], 0]
    targets = np.split(stream, marker_idx)
    return targets
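As a quick sanity check (a minimal sketch with a hard-coded toy edge list, not the generated graph above), the sort-and-split grouping produces the expected neighbour lists; I sort each group only so the comparison does not depend on the internal sort order:

```python
import numpy as np

def get_incidences_np(roads, n):
    # n is unused here: vertices with no incident edges produce no group
    roads = np.vstack([roads, roads[:, ::-1]])
    roads_sorted = roads[np.argsort(roads[:, 0])]
    marker_idx = np.flatnonzero(np.diff(roads_sorted[:, 0])) + 1
    return np.split(roads_sorted[:, 1], marker_idx)

# toy undirected graph: the square 0-1-2-3-0 plus the diagonal 0-2
toy_edges = np.array([[0, 1], [1, 2], [2, 3], [3, 0], [0, 2]])
adj = [sorted(group.tolist()) for group in get_incidences_np(toy_edges, 4)]
print(adj)  # [[1, 2, 3], [0, 2], [0, 1, 3], [0, 2]]
```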
It works. However, when I run it on a larger graph (the edge list of a Barabási–Albert graph with size = 10000 and m = 3), the NumPy approach turns out to be remarkably slower:
>>> %timeit get_incidences_np(edges, size)
64.3 ms ± 22.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit get_incidences_ig(edges, size)
10.6 ms ± 233 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
It seems that my NumPy approach (np.vstack([roads, roads[:, ::-1]]) followed by sorting and splitting into groups) is quite naive. Is there a more efficient way to implement this algorithm?
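For completeness, the plain-Python baseline I am implicitly competing with (a sketch, not timed here) is just two list appends per edge:

```python
import numpy as np

def get_incidences_py(roads, n):
    # one neighbour list per vertex; append both endpoints of every edge
    targets = [[] for _ in range(n)]
    for a, b in roads:
        targets[a].append(b)
        targets[b].append(a)
    return targets

# tiny triangle graph for illustration
toy_edges = np.array([[0, 1], [1, 2], [2, 0]])
adj = get_incidences_py(toy_edges.tolist(), 3)
print(adj)  # [[1, 2], [0, 2], [1, 0]]
```

Neighbours appear in edge-insertion order rather than sorted, which is fine for my purposes.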