0

So I'm creating my graph in a basic way:

import igraph
import numpy as np
graph = igraph.Graph()
graph.add_vertices(np.array([0,1,2,3,4,5]))
graph.add_edges(np.array([[0,1],[1,2],[3,4],[4,5],[3,5]]))

I wonder if it is possible to speed up converting edges of my graph to numpy array? I'm doing it like this right now:

print(np.array([n.tuple for n in graph.es])) # prints array [[0,1],[1,2],[3,4],[4,5],[3,5]]
mathfux
  • 5,759
  • 1
  • 14
  • 34
  • which version of python and numpy? – Anthony Kong Feb 10 '20 at 10:14
  • One of the latest, Python 3.7 and `numpy 1.17` – mathfux Feb 10 '20 at 10:18
  • Why do you want to convert your list to a `numpy` array? `igraph` doesn't work with `numpy`, so converting it before passing to `add_edges` should have no performance benefit. – Vincent Traag Feb 10 '20 at 10:43
  • I wonder, why is 3-5 there but not 1-4 (just as one example)? – kabanus Feb 10 '20 at 10:43
  • @kabanus weird question. I just wanted to make my graph not connected – mathfux Feb 10 '20 at 10:48
  • I think I misunderstood your goal. Seems you start out with an edge array - why don't you just save that in a variable? – kabanus Feb 10 '20 at 10:51
  • @kabanus Well, there are some situations where access of edges is not that straight. For example, if I try to generate a bunch of other graphs using `graph.decompose()`, I won't be able to them because it's not known what to store. – mathfux Feb 10 '20 at 10:56
  • @Vincent Traag I don't see a problem creating my edges and vertices from `numpy` arrays, it works for me. – mathfux Feb 10 '20 at 11:04
  • @Vincent Traag I need to use `numpy` arrays because it I need to perform some not `igraph` related actions efficiently, like finding components of 2D image in this topic: https://stackoverflow.com/a/59700254/3044825 – mathfux Feb 10 '20 at 11:09
  • @mathfux, sorry, I misunderstood your question. See below for an answer. – Vincent Traag Feb 10 '20 at 12:24
  • It appears that `np.array()` is actually one of the slowest in contrast with `np.fromiter()`. Posted an answer. – mathfux Sep 01 '20 at 23:39

2 Answers2

1

The easiest and also fastest way to convert all edges to a numpy array is as follows:

edges = np.array(graph.get_edgelist())

For a random graph with n=1000 nodes and m=5000 edges, this runs in

2.74 ms ± 561 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

on my machine. The alternative of

edges = np.array([n.tuple for n in graph.es])

runs almost 30% slower, and takes

3.53 ms ± 542 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

on my machine.

Vincent Traag
  • 660
  • 4
  • 6
  • This is exactly what I've been looking for. Thank you! – mathfux Feb 10 '20 at 12:46
  • I don't know exactly what you need it for, but note that `igraph` also has functionality for decomposing the graph in connected components. There is also a recent update of `python-igraph`, version 0.8.0, which will be announced shortly, and is already available for download from [PyPi](https://pypi.org/project/python-igraph/), including Windows binary wheels. – Vincent Traag Feb 10 '20 at 12:52
  • I see. I'm very happy because I have just tested it for today and it worked for me, no wheels from unofficial binaries needed. This is the first time I've tested `decompose` too. – mathfux Feb 10 '20 at 12:55
0

After long time and with better knowledge I decided to compare several methods and a winner is:

np.fromiter(chain(*g.get_edgelist()), np.dtype('i'), count=g.ecount()).reshape(-1, 2)

It is more than 4x times faster than np.array(g.get_edgelist())! So importance of structuring output array can't be ignored here.

def edges_asarray1(g):
    return np.array(g.get_edgelist())

def edges_asarray2(g):
    return np.array([n.tuple for n in g.es])

def edges_fromiter1A(g):
    dt = np.dtype([('', np.intp)]*2)
    indices = np.fromiter(g.get_edgelist(), dt)
    indices = indices.view(np.intp).reshape(-1, 2)
    return indices

def edges_fromiter1B(g):
    index = np.fromiter(chain(*g.get_edgelist()), np.dtype('i'), count=2*g.ecount())
    return index.reshape(-1, 2)
fig = plt.figure(figsize=(10, 10))

def edges_fromiter2A(g):
    dt = np.dtype([('', np.intp)]*2)
    indices = np.fromiter(map(lambda x: x.tuple, g.es), dt)
    indices = indices.view(np.intp).reshape(-1, 2)
    return indices

def edges_fromiter2B(g):
    index = np.fromiter(chain(*map(lambda x: x.tuple, g.es)), np.dtype('i'), count=2*g.ecount())
    return index.reshape(-1, 2)

plt.grid(True, which="both")
out = perfplot.bench(
        setup = lambda x: ig.Graph.Erdos_Renyi(n=x, m=5*x),
        kernels = [edges_asarray1, edges_asarray2, edges_fromiter1A, edges_fromiter1B, edges_fromiter2A, edges_fromiter2B],
        n_range = [2 ** k for k in range(4, 21)],
        xlabel = 'n',
        title = 'testing graph with n nodes and 5*n edges',
        show_progress = True)
out.show()

enter image description here

mathfux
  • 5,759
  • 1
  • 14
  • 34