0

I am trying to optimize a node data structure of a graph/network by avoiding duplicate edges. The graph is undirected and has no loops (in the sense that no node has edges to itself). Each node data structure contains a list of edges. To this end, I have the following function for adding an edge between two nodes of the graph:

"""
Creates an edge between `node1` and `node2`.
If i < j and there is an edge between node i and node j, then 
the edge between node i and node j will be stored in node i.
"""
def create_edge(self, node1, node2):
    if node1.id < node2.id:
        node1.edges.append(node2.id)
    else:
        node2.edges.append(node1.id)

And then, as part of the instantiation of the graph/network object, the following code is executed to create edges between nodes, based on some probability:

    for node1 in self.green_network:
        for node2 in range(node1 + 1, self.green_network):
            if random.random() < self.probability_of_an_edge:
                node1.create_edge(node2)

Now, I realise this second segment of code is incorrect, but I think it roughly illustrates what I am trying to do. Given my stated goal, it seems to me that, for every node in the first loop, for the second loop, I want to loop from (every node + 1) to the last node. So, for instance, we first loop between node.id = 1 to node.id = 2, node.id = 3, node.id = 4, ..., and then between node.id = 2 to node.id = 3, node.id = 4, node.id = 5, ..., and then between node.id = 3 to node.id = 4, node.id = 5, node.id = 6, ..., and so on. That way, we avoid duplicates, given how the create_edge function behaves.

This is related, but it doesn't seem to answer the question of iterating over objects, which is what I'm trying to do here.

The Pointer
  • 2,226
  • 7
  • 22
  • 50
  • I think by definition, any edge on an undirected graph is a loop, no? – AirSquid Oct 12 '22 at 16:25
  • @AirSquid I think you're referring to *cyclicality* (cyclic graphs vs acyclic graphs), whereas I'm referring to *loops*, in the sense that a node can have an edge to itself. – The Pointer Oct 12 '22 at 16:26
  • ahh yes, you are correct.... – AirSquid Oct 12 '22 at 16:27
  • Would you not want to store the edge in *both* nodes? Or is that not important in your application? How would Node 7 know that it had a connection to Node 5 if Node 5 had an edge to Node 7? – AirSquid Oct 12 '22 at 16:33
  • I understand you to mean that given a collection of nodes you want to consider each pair of nodes only once in a loop. If so, you want `for node1, node2 in itertoools.combinations(all_nodes, 2): ...`. – Steven Rumbalski Oct 12 '22 at 16:36
  • @AirSquid Hmm, my idea (which I read some where) is to establish a convention where, if i < j, then, in order to see if there is an edge between node i and node j, we check the edges of node i only. So, since 5 < 7, to check whether there is an edge from node 5 to node 7, we know to check the edges of node 5, and that we do not need to check the edges of node 7. My understanding is that, if we establish this convention for the program, then there shouldn't be any issues. Or am I overlooking something? – The Pointer Oct 12 '22 at 16:40
  • @ThePointer Well, it depends a bit. I think *in general* you want to be able to "arrive" at Node 7 and know what your options are. In an *undirected* graph, I would put the edges as part of each node in the pair. – AirSquid Oct 12 '22 at 16:44
  • @AirSquid Yes, I think that is definitely correct from a maintainability/readability perspective in software engineering. But, in this case, we're trying to optimize (by avoiding duplicates), which means maintainability/readability is sacrificed. I understand this part, and am ok with the trade-off. What I *am* concerned about here is correctness, and whether I've done something wrong – or whether this idea even makes sense in the first place! I'm kind of just tinkering here to see if I can optimize some aspects of my code. – The Pointer Oct 12 '22 at 16:51
  • @ThePointer OK. You should be able to slightly tweak my post below to only add the edge to the lower indexed node. Realize that the highest number node (N) will have 0 edges and if you are doing work on a graph, you will have to poll *all* the other nodes to see what Node N connects to... – AirSquid Oct 12 '22 at 16:55
  • @AirSquid Ahh, yes, you make a good point. – The Pointer Oct 12 '22 at 17:34

1 Answers1

1

As suggested in the comments, you could/should generate all the combos and consider them with equal probability.

from itertools import combinations
from random import random

class Node:
    count = 0

    def __init__(self):
        self.edges = []
        self.idx = Node.count
        Node.count += 1

    def add_edge(self, other: "Node"):
        # this adds an edge to BOTH nodes
        # if you *really* want, you could compare indices here and only add one edge... Buyer be ware! :)
        self.edges.append(other.idx)
        other.edges.append(self.idx)

# make 5 nodes
nodes = [Node() for _ in range(5)]

# this will produce a GENERATOR for all 10 possible combinations (5 choose 2 = 10)
all_pairs = combinations(nodes,2)

for n1, n2 in all_pairs:
    if random() > 0.25:  # 75% chance of addition...
        n1.add_edge(n2)

# check
for node in nodes:
    print(f'Node {node.idx} connects to: {node.edges}')

Output:

Node 0 connects to: [3]
Node 1 connects to: [2, 4]
Node 2 connects to: [1]
Node 3 connects to: [0, 4]
Node 4 connects to: [1, 3]
AirSquid
  • 10,214
  • 2
  • 7
  • 31
  • Hmm, yes, it seems that the kth node will be considered k - 1 times. – The Pointer Oct 12 '22 at 17:07
  • @ThePointer Only in the nested loop approach... using combination generator, "all nodes are considered equal" ;) – AirSquid Oct 12 '22 at 17:08
  • I'm just going through some examples. If we write out the combinations, since the edges are undirected, it seems that it should actually be ok. For an instance of 4 nodes, we would get (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4). And note that there are no duplicates here (for instance, (1, 2) would be the same as (2, 1)). So, again, since this is undirected, we have 3 chances of getting an edge to node 1 ((1, 2), (1, 3), (1, 4)), 3 chances of getting an edge to node 2 ((1, 2), (2, 3), (2, 4)), and, by the same reasoning, 3 chances of getting an edge to node 3 and 4. Does that seem correct? – The Pointer Oct 12 '22 at 17:27
  • Correction... Your nested loop is actually also correct in making pairs. It will produce all pairs only once. Spaced out on that. I'll edit my answer to remove that mistake. – AirSquid Oct 12 '22 at 19:44