6

I have a directed graph in NetworkX. The edges are weighted from 0 to 1, representing probabilities that they occurred. The network connectivity is quite high, so I want to prune the edges such for every node, only the highest probability node remains.

I'm not sure how to iterate over every node and keep only the highest weighted in_edges in the graph. Is there a networkx function that allows us to do this?

Here is an example of what I'd like to be able to do.

Nodes:
A, B, C, D

Edges:
A->B, weight=1.0
A->C, weight=1.0
A->D, weight=0.5
B->C, weight=0.9
B->D, weight=0.8
C->D, weight=0.9

Final Result Wanted:
A->B, weight=1.0
A->C, weight=1.0
C->D, weight=0.9

If there are two edges into a node, and they are both of the highest weight, I'd like to keep them both.

Lev Levitsky
  • 63,701
  • 20
  • 147
  • 175
ericmjl
  • 13,541
  • 12
  • 51
  • 80

3 Answers3

8

Here are some ideas:

import networkx as nx

G = nx.DiGraph()
G.add_edge('A','B', weight=1.0)
G.add_edge('A','C', weight=1.0)
G.add_edge('A','D', weight=0.5)
G.add_edge('B','C', weight=0.9)
G.add_edge('B','D', weight=0.8)
G.add_edge('C','D', weight=0.9)

print "all edges"
print G.edges(data=True)

print "edges >= 0.9"
print [(u,v,d) for (u,v,d) in G.edges(data=True) if d['weight'] >= 0.9]

print "sorted by weight"
print sorted(G.edges(data=True), key=lambda (source,target,data): data['weight'])
Aric
  • 24,511
  • 5
  • 78
  • 77
  • Aric, your answer is quite close to what I'm looking for, but I'm having trouble with the threshold that you set. I'm not looking for something that's above a certain threshold (`weight >=0.9`). You'll see that B->C is not included, as there is an edge A->C that is weighted higher. Rather, I'm looking for the top-valued edge(s) going into a given node. Perhaps I'll have to edit the question to make this clearer? – ericmjl Aug 14 '13 at 14:27
  • On second look, your answer provided me with the inspiration for what I needed to do. I will post my solution in a moment, but in the meantime, I will up-vote your answer. – ericmjl Aug 14 '13 at 18:36
7

The solution I had was inspired by Aric. I used the following code:

for node in G.nodes():
    edges = G.in_edges(node, data=True)
    if len(edges) > 0: #some nodes have zero edges going into it
        min_weight = min([edge[2]['weight'] for edge in edges])
        for edge in edges:
            if edge[2]['weight'] > min_weight:
                G.remove_edge(edge[0], edge[1])
ericmjl
  • 13,541
  • 12
  • 51
  • 80
0

The solution of ericmjl provided does not work entirely in my program due to the following Runtime Error. Moreover, it keeps the edges with the lowest probability, not the highest, as asked in the question (because: remove all edges with weight > min, instead remove all with weight < max). Furthermore, it sufficient to do the inner loop for len(edges) > 1, because we want to remove all edges from nodes with more than one edge.

The complete solution:

for node in G.nodes():
    edges = G.edges(node, data=True)
    if len(edges) > 1:  # some nodes have zero edges going into it
        max_weight = max([edge[2]['weight'] for edge in edges])

        for edge in list(edges):
            if edge[2]['weight'] < max_weight:
                G.remove_edge(edge[0], edge[1])
MaGarb
  • 43
  • 6