15

After looking at the impressive performance comparison, I decided that I would give a try to graph-tool. So for comparison, I wrote codes to generate a random tree using both packages.

The graph-tool code:

import numpy as np
import graph_tool.all as gt

# construct an initial graph with two nodes and one link
n = 5000
G = gt.Graph(directed = False)
G.add_edge(0, 1)

for t in range(2, n):
    # connect the new vertex to one of the old vertices randomly
    G.add_edge(np.random.choice(range(t)), t)

The Networkx code:

import networkx as nx
import numpy as np

n = 5000
# initial graph
G = nx.Graph()
G.add_edge(0, 1)

for t in range(2, n):
    G.add_edge(t, np.random.choice(range(t)))

The graph-tool takes around 14 seconds on my 4-core machine whereas networkx takes less than 2 seconds on the same machine! Am I missing something obvious?

Thanks in advance.

Peaceful
  • 4,920
  • 15
  • 54
  • 79
  • I would guess that adding edges one by one is slower than some other use cases (e.g. doing lots of searches of a pre-existing graph). – Blckknght Mar 24 '16 at 05:43
  • Actually I run more complicated code with this and still networkx is giving better performance to me. (This one: http://link.springer.com/article/10.1140/epjb/e2015-60501-y#page-1) – Peaceful Mar 24 '16 at 05:49
  • Could it be that something is slow about initializing the network? – Joel Mar 24 '16 at 07:31
  • No. I am printing those times `t` and in graph-tool, I can see that those are printed more and more slowly as `t` becomes larger. – Peaceful Mar 24 '16 at 08:17

1 Answers1

30

There is nothing surprising here. graph-tool achieves greater performance by off-loading main loops to C++. If all your main loops are in Python, it offers no advantage. The same is true for other libraries like numpy.

The proper way to achieve fast addition of edges is to let graph-tool perform the main loop. The network you are generating is a simple growth model, and can be achieved in graph-tool by calling:

G = price_network(n, gamma=0, directed=False)

which takes about 15 ms in my computer for n=5000.

Note also that your python code is unnecessarily slow, since you create new lists with all vertices at each iteration. A much faster version would be:

from numpy.random import randint
n = 5000
G = Graph(directed=False)
G.add_vertex(n)
G.add_edge(0, 1)
for i in range(2, n):
    G.add_edge(i, randint(i))

For even larger values of n, it will be even faster to add all edges at once instead of one by one, i.e.

from graph_tool.all import *
from numpy.random import randint
n = 5000
G = Graph(directed=False)
edges = [(0, 1)]
for i in range(2, n):
    edges.append((i, randint(i)))
G.add_edge_list(edges)
Tiago Peixoto
  • 5,149
  • 2
  • 28
  • 28
  • 1
    Yes. Now I understand the issue. But then how do I construct some generalized graphs? For example, for preferential attachment models, I obviously can't add all the edges at once. Also, I actually want to simulate the network in which number of new nodes is some fraction of the current network size and so I can't add all nodes simultaneously too. Is that right? Thanks – Peaceful Mar 24 '16 at 15:55
  • Can you please guide me about achieving the task of many nodes simultaneously and efficiently? Thanks – Peaceful Mar 25 '16 at 16:23
  • @Peaceful, did you get around this? – lucid_dreamer Sep 01 '17 at 22:36
  • @user1712447 Tiago's answer helps. – Peaceful Sep 06 '17 at 07:47