0

I mean to create a directed graph, whose edges among nodes are created if a certain criterion is met, i.e.

if (crit(i,j)) :
    G.add_edge(i,j)

For this, I wrote a loop

nnd = len( ndlist )
for ind in range( nnd ) :
    ndi = ndlist[ ind ] 
    for jnd in range( nnd ) :
        if ( jnd != ind ) :  # no self interaction
            ndj = ndlist[ jnd ]
            interaction = inrange( ndi, ndj, threshold )
            if ( interaction ) :
                G.add_edge( ind, jnd )

ndlist is a list that contains features of the nodes, one element per node, and this info is used for assessing interaction among node pairs. The nodes are labeled with the same index number as in ndlist, and this is taken advantage of. This is also the reason for looping over the integers, since edges connect (ind,jnd), not (ndi,ndj).

This turns out to be very slow. Is there any way of optimizing this?

Of course, execution time will depend on inrange, but there might be a way of accelerating the code regardless of inrange (perhaps plugging the contents of ndlist as node attributes and iterating differently). Even transforming my "very slow" code into a "just slow" code would help.

2 Answers2

1

If I'm understanding the relationship between your variables, the following will at least make the code easier to read, and reduce the number of for loops, which as a general rule makes Python more efficient. Basically as I understand it (which may be a faulty understanding) you push more of the looping into C, rather than pure Python:

from intertools import permutations
for ind, jnd in permutations(ndlist, 2): #all pairs without replacemnt
    if inrange(ndi, ndj, threshold):
        G.add_edge(ndi, ndj)

You can make this more concise and avoid a for loop (which probably makes it more efficient) by using a list comprehension

from itertools import permutations
new_edges = [ (ind,jnd) for ind, jnd in permutations(ndlist, 2) if inrange(ndi, ndj, threshold)]
G.add_edges_from(new_edges)

Making new_edges a generator instead of a list would save memory.


As an aside: using more descriptive variable names would probably help make the code easier to read.

Community
  • 1
  • 1
Joel
  • 22,598
  • 6
  • 69
  • 93
  • Thanks. I will try both options and come back. – sancho.s ReinstateMonicaCellio Sep 09 '16 at 10:33
  • I tried the second option. Perhaps adding all edges at once (instead of one at a time) worked faster. But it didn't. The bottleneck is the evaluation of each candidate edge, and I guess there is not much to do about it. I would probably have to go for a compiled evaluation function. This is worth some work, and perhaps further SO questions. – sancho.s ReinstateMonicaCellio Sep 10 '16 at 12:35
0

The answer by Joel showed one way of making the code more readable, but still adding one edge at a time. And another way of adding all edges at a once, which perhaps made code faster. But it turns out that adding all edges at once does not make code faster (at least in my case).

PS: The first version of my code added one node at a time (this part is not shown above, to reduce clutter). Later I changed to adding all nodes at once with

G.add_nodes_from( range( nnd ) )
nx.set_node_attributes( G, 'pos', posdict )

and it made code much faster.

Thus, I had some expectation about something similar happening with edges. But the bottleneck is the evaluation of each candidate edge, and I guess there is not much to do about it. I would probably have to go for a compiled evaluation function. This is worth some work, and perhaps further SO questions.

Community
  • 1
  • 1