1

I have two dictionary objects, connections and network. This can be visualized as a graph, where each node is a computer and connections depict an edge between the computers and node. Network is a dictionary objects of unique networks where the computers can be a part, for ex

1,2  
2,3  
4,5  
5,1 

are four connection information for nodes 1 through 1
connections would thus be {1->1,2->1,3->1,4->2,5->1}
and network {1->0,2->1}
which means
computer 1,2,3,5 are part of n/w 1
computer 4 is part of n/w 2
again n/w 2 is interconnected to n/w 1
I have to read a file with thousand of such connection information
while doing so, for each connection info read I have to perform a non-sequential loop as follows

while network.has_key(connections[node1]):
    connections[node1]=network[connections[node1]] 

Is there a better way to optimize the above loop? If required I can share my entire code for the purpose

Gabe
  • 84,912
  • 12
  • 139
  • 238
Abhijit
  • 62,056
  • 18
  • 131
  • 204
  • What do you mean by "Non-sequential loop"? Presumably, you're reading a single file sequentially, and I'm assuming that is your bottleneck. – Austin Marshall Oct 03 '11 at 16:43
  • Have you thought of profiling your code? CProfile is great http://docs.python.org/library/profile.html - after that you will have a clear idea of what is your bottleneck – Savino Sguera Oct 03 '11 at 16:49
  • @oxtopus, Please check the code snippet. Non-Sequential means neither I can use List Comprehension nor any list functions like reduce ... – Abhijit Oct 06 '11 at 05:19
  • @savinos, Yes I tried and this is how I identified the above code snippet – Abhijit Oct 06 '11 at 05:20

2 Answers2

2

I'm didn't really understand your question, but I feel the stock answer to graph-related questions applies here:

Use NetworkX rather than trying to come up with your own data representations and algorithms. It may be harder to learn, but it'll save you time and headaches in the long run.

Petr Viktorin
  • 65,510
  • 9
  • 81
  • 81
  • I actually used NetworkX to create my initial prototype of the program. Worked perfectly well. Am now trying to code the same without it. Obvious reason performance because of which I am here to optimize every bit I feel have bottlenecks. – Abhijit Oct 04 '11 at 02:28
1
while network.has_key(connections[node1]):
    connections[node1]=network[connections[node1]] 

That code does the lookup for network[connections[node1]] twice once for has_key and once for [].

It also repeatedly accesses connections[node1]

Instead you can do:

current = connections[node1]
while True:
    try:
         current = network[current]
    except KeyError:
         break
connections[node1] = current

But chances are better improvements can be had be reworking the rest of your code. But for that you may find http://codereview.stackexchange.com to be a better site.

Winston Ewert
  • 44,070
  • 10
  • 68
  • 83
  • Stupid of me. I was completely blind on this. I was somehow trying to look into this from a completely different perspective. – Abhijit Oct 04 '11 at 02:26
  • From cProfiler I found this line of code was the bottleneck of my whole system. Reducing 50% of time here could eventually reduce 20% of the processing time. – Abhijit Oct 04 '11 at 02:29
  • @user977038, perhaps but usually you solve that sort of thing by avoiding coming here, not making this faster. – Winston Ewert Oct 04 '11 at 03:04
  • @user977038, by avoiding coming here, I mean to the piece of code during execution. Don't make this code faster. Call it less. – Winston Ewert Oct 04 '11 at 04:35