
I'm working on optimizing some code that involves frequently accessing values associated with the centrality of each node in a graph. Since nodes with the same neighbors have the same value in my calculations, I thought it might be beneficial to group nodes with identical neighbor sets together. This way, I can reduce the number of calculations by only performing them once per group instead of for every individual node. However, I'm not sure how to efficiently implement this grouping method. Any suggestions or advice on the best way to approach this problem would be greatly appreciated.
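
For a concrete example of the redundancy: in a star graph every leaf has the hub as its only neighbour, so all leaves share one neighbour set and hence one centrality value. A quick illustration (using networkx's star_graph and closeness_centrality here, but any centrality measure would behave the same for such nodes):

import networkx as nx

G = nx.star_graph(4)  # hub 0, leaves 1-4

# all leaves have the identical neighbour set {0} ...
print(set(G.neighbors(1)) == set(G.neighbors(2)))  # True

# ... so their centrality only needs to be computed once
print(nx.closeness_centrality(G, u=1) == nx.closeness_centrality(G, u=2))  # True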

This is the implementation I came up with, but it's pretty basic. Because of the two nested loops the code is O(n^2), which is quite a lot when used on a big graph. As @ravenspoint and @PaulBrodersen correctly pointed out, my original loop wasn't working correctly, so I changed the implementation.

def group_similar_nodes(G):
    # assumes the nodes are labelled 0 .. n-1, as they are in most
    # networkx graph generators
    node_groups = []
    lonely_node = []   # nodes not yet assigned to a group
    nbNodes = 0
    neighbors = {}
    for node in G.nodes():
        neighbors[node] = set(G.neighbors(node))
        lonely_node.append(node)
        nbNodes += 1
    for node in range(nbNodes):
        if node in lonely_node:
            group = [node]
            lonely_node.remove(node)
            # only look at higher-indexed nodes, so each pair is compared once
            for node2 in range(node + 1, nbNodes):
                if node2 in lonely_node:
                    # neighbour sets count as identical if they only differ
                    # by the two nodes themselves, so that adjacent nodes
                    # can still end up in the same group
                    diff = neighbors[node] ^ neighbors[node2]
                    diff.discard(node)
                    diff.discard(node2)
                    if not diff:
                        group.append(node2)
                        lonely_node.remove(node2)
            node_groups.append(group)
    return node_groups

Here is an MRE if you want to test the code:

import networkx as nx

# group_similar_nodes(G) as defined above

i = 4 # number of communities
n = 5 # number of nodes per community
p_in = 1 # probability of an edge within a community
p_out = 0.1 # probability of an edge between communities

G = nx.planted_partition_graph(i, n, p_in, p_out, seed=42) 
print(group_similar_nodes(G))
#expected result
# [[0, 1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13, 14], [15], [16], [17], [18], [19]]
Osamodas
  • Could you provide a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example)? – dankal444 May 10 '23 at 15:09
  • I think this question could benefit from more detail. a) What centrality measure are you using? There are multiple. b) How are you calculating this centrality measure? Unless the graph is a tree, it seems unlikely that nodes could have the same centrality by mere virtue of sharing a neighbour: e.g. if node A and B share a neighbour and B and C share another neighbour not connected to A, what is the centrality of B? The same as A or the same as C? – Paul Brodersen May 11 '23 at 09:39
  • c) If the graph is a tree, then you can get your groups in `n * log(n)` time as you just have to traverse the tree using breadth-first-search. The descendants of each node form a group. [BFS traversal](https://networkx.org/documentation/stable/reference/algorithms/traversal.html) and [descendants](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.dag.descendants.html) are both implemented in networkx. – Paul Brodersen May 11 '23 at 09:42
  • @PaulBrodersen Thank you for your answer. The centralities I calculate are betweenness centrality and closeness centrality. Since the Python code is just a test to see if what I had in mind works, I just used networkx functions to calculate the centrality measures. There are several topologies I would like to run the program on: basic ones like star graphs, path graphs, complete graphs, bipartite graphs etc., as well as more complex graphs like small-world graphs or random graphs. – Osamodas May 11 '23 at 10:10
  • I should also add that the program I am working on is for a network construction game where each node is a player adding or removing edges to improve its own cost, which is the sum of the centralities times a predetermined factor. So a lot of centrality computations are done, and I was trying to reduce that amount without changing the implementation of the centrality functions at first. – Osamodas May 11 '23 at 10:16

2 Answers

   for node2 in lonely_node

This loop causes each pair of nodes to be checked twice (once as X,Y and again as Y,X).

I don't know how to fix that in Python. In C++ it would be something like this:

for( node_index = 0;
     node_index < lonely_node.size();
     node_index++ )
   for( node_index2 = node_index+1;
        node_index2 < lonely_node.size();
        node_index2++ )
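
For reference, the same half-matrix iteration can be written in Python as well; a minimal sketch, either with explicit indices or with itertools.combinations (lonely_node is just a placeholder list here):

from itertools import combinations

lonely_node = ['a', 'b', 'c', 'd']

# index-based, mirroring the C++ loops above
for i in range(len(lonely_node)):
    for j in range(i + 1, len(lonely_node)):
        print(lonely_node[i], lonely_node[j])

# or with itertools.combinations: each unordered pair is produced
# exactly once, so no pair is checked as both (X, Y) and (Y, X)
for node, node2 in combinations(lonely_node, 2):
    print(node, node2)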

Since you care about performance, you should consider moving from Python to a compiled language such as C++. This will give you a performance boost, sometimes running the same algorithm 50 times faster.

ravenspoint
  • Thank you for your suggestion. However, I believe that the loop in my current implementation does not iterate over each pair twice, as nodes are removed from the lonely_node list once they have been processed. That being said, I appreciate your recommendation to consider a compiled language like C++. I will definitely look into it as a potential solution for improving performance. Do you happen to know of any good graph libraries for C++ that you would recommend? – Osamodas May 10 '23 at 13:33
  • Python lets you remove members of a collection whilst you are iterating over the collection?!?! It really is a weird language. Recommendation for a C++ graph theory library - well, mine is called Pathfinder https://github.com/JamesBremner/PathFinder – ravenspoint May 10 '23 at 13:39
  • @ravenspoint It does, but the pointer to the next item is not updated. Osamodas's implementation hence has a bug, as only every second item is removed (see the short demo after these comments). – Paul Brodersen May 11 '23 at 09:16
  • ... and, in fact, iterated over. – Paul Brodersen May 11 '23 at 09:43
  • @PaulBrodersen The code was indeed buggy. Thank you for pointing it out. I changed the implementation to correct it. – Osamodas May 11 '23 at 10:01
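
To illustrate the pitfall discussed in the comments above, here is a minimal demo: removing items from a Python list while iterating over it silently skips elements, because the iterator's position is not adjusted for the removals.

items = [0, 1, 2, 3]
for x in items:
    items.remove(x)
print(items)
# [1, 3], i.e. every second item was skipped and survives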

After finally understanding the question correctly, the answer is pretty straightforward and can be accomplished in linear time once the nx.Graph object is created. The nx.Graph object at its core represents the graph as its adjacency list, i.e. a mapping of nodes to their (unordered list of) neighbours. We want the inverse: a mapping of neighbours to nodes. All we hence need to do is normalize the representation of the neighbour sets (here a sorted tuple, which unlike a set can be used as a dictionary key) and then invert the mapping:

import networkx as nx

def invert_dict(d):
    # https://stackoverflow.com/a/485368/2912349
    inverted = dict()
    for k, v in d.items():
        inverted[v] = inverted.get(v, []) + [k]
    return inverted

if __name__ == '__main__':

    G = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0)]) # square graph

    # normalize each neighbour set to a hashable, order-independent key
    d = dict()
    for node in G:
        d[node] = tuple(sorted(G.neighbors(node)))

    inverted = invert_dict(d)
    print(list(inverted.values()))
    # [[0, 2], [1, 3]]
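
With the groups in hand, the original goal (computing each centrality only once per group) could look like the following sketch, continuing the example above. It assumes a centrality function that can be evaluated for a single node, such as nx.closeness_centrality with its optional u parameter; nodes with identical neighbour sets are interchangeable, so one representative's value serves the whole group:

    centrality = {}
    for group in inverted.values():
        # compute the value once per group, then copy it to all members
        value = nx.closeness_centrality(G, u=group[0])
        for node in group:
            centrality[node] = value
    print(centrality)
    # all four nodes of the square graph share the value 0.75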
 
Paul Brodersen