0

I would like to understand in the following WORKING AND FINISHED code, why when updating pq_update, it is written as pq_update[neighbour][1].

Instead of writing pq_update[neighbour] (which is how I did it), it does not seem to change anything so why is it included ?

Thank you

import heapq

def dijkstra(graph, start):

  distances = {vertex:float('inf') for vertex in graph}
  pq = []
  pq_update = {}
  distances[start] = 0

  for vertex, value in distances.items():
    entry = [vertex, value]
    heapq.heappush(pq, entry)
    pq_update[vertex] = entry

  while pq:

    getmin = heapq.heappop(pq)[0]

    for neighbour, distance_neigh in graph[getmin].items():
      dist = distances[getmin] + distance_neigh
      if dist < distances[neighbour]:
        distances[neighbour] = dist
        pq_update[neighbour][1] = dist # THIS LINE !!!

  print(distances)
  return distances 


if __name__ == '__main__':

  example_graph = {
      'U': {'V': 2, 'W': 5, 'X': 1},
      'V': {'U': 2, 'X': 2, 'W': 3},
      'W': {'V': 3, 'U': 5, 'X': 3, 'Y': 1, 'Z': 5},
      'X': {'U': 1, 'V': 2, 'W': 3, 'Y': 1},
      'Y': {'X': 1, 'W': 1, 'Z': 1},
      'Z': {'W': 5, 'Y': 1},
  }
  dijkstra(example_graph, 'X')
  • That code seems buggy to me. The heap is sorted by the name of the node first, and only by distance second. But if you changed it to sort properly, when you modify an existing node's entry in the heap, you might break the heap property so there should really be a call to `heapify` or similar to fix it up after making any changes. – Blckknght Jul 27 '19 at 17:44

1 Answers1

2

Note: the implementation you have is broken and doesn't correctly implement Dijkstra. More on that below.

The pq_update dictionary contains lists, each with two entries:

for vertex, value in distances.items():
    entry = [vertex, value]
    heapq.heappush(pq, entry)
    pq_update[vertex] = entry

So pq_update[neighbour] is a list with both the vertex and the distance. You want to update the distance, not replace the [vertex, value] list, so pq_update[neighbour][1] is used.

Note that the entry list is also shared wit the heapq. The pq heap has a reference to the same list object, so changes to pq_update[neightbor][1] will also be visible in entries still to be processed on heap!

When you assign directly to pq_update[neighbour], you remove that connection.

The reason you don't see any difference is because the implementation of the algorithm is actually broken, as the heap is not used correctly. The heap is sorted by first by the first value in the list items you pushed in. In your code that's the node name, not the distance, and the heapq order of items is never updated when the distances in the list items are altered. Because the heapq is not used correctly, you always traverse the nodes in alphabetical order.

To use the heapq correctly, you need to put the edge length first, and you don't alter the values on the heap; if you use tuples you can't accidentally do this. You only need to push nodes onto the heap that you reached, really; you'll end up with multiple entries for some of the nodes (reached by multiple paths), but the heapq will still present the shortest path to that node first. Just keep a set of visited nodes so you know to skip any longer paths. The point is that you visit the shorter path to a given node before the longer path, and you don't need to alter the heapq items in-place to achieve that.

You could re-write your function (with better variable names) to:

def dijkstra(graph, start):
    """Visit all nodes and calculate the shortest paths to each from start"""
    queue = [(0, start)]
    distances = {start: 0}
    visited = set()

    while queue:
        _, node = heapq.heappop(queue)  # (distance, node), ignore distance
        if node in visited:
            continue
        visited.add(node)
        dist = distances[node]

        for neighbour, neighbour_dist in graph[node].items():
            if neighbour in visited:
                continue
            neighbour_dist += dist
            if neighbour_dist < distances.get(neighbour, float('inf')):
                heapq.heappush(queue, (neighbour_dist, neighbour))
                distances[neighbour] = neighbour_dist

    return distances
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • Hi there, Thanks for this the code above is majorly broken, I have fixed it and ended up completely removing pq_update. – justanothertechdude Jul 27 '19 at 18:01
  • @justanothertechdude: yes, because you are not actually finding a shortest path to a destination node. You are conducting a breath-first full traverse over the whole graph. – Martijn Pieters Jul 27 '19 at 18:03
  • I could be missing something but what if we run this algorithm on the graph from [this answer](https://stackoverflow.com/a/28122240/2092025)? It seems if we start from node 0 and don't maintain priorities in the queue, we'll get incorrect distance to node 3. – kazarey Mar 03 '21 at 03:34
  • @kazarey: yup, my implementation here needs to use a priority queue, it is indeed wrong, and that post makes an excellent example as to why. – Martijn Pieters Mar 03 '21 at 09:27
  • @kazarey: thanks for pointing that out, I've updated the code to use a heapq correctly. – Martijn Pieters Mar 03 '21 at 09:51