
Consider the following two implementations of Dijkstra's algorithm:

Using set:

// V - total number of vertices
// S - source node
void dijkstra(int V, vector<vector<int>> edge[], int S)
{
    vector<int> dist(V, 1e9);
    dist[S] = 0;
    
    set<pair<int, int>> s;
    for (int i=0; i<V; i++)
        s.insert({dist[i], i});    // log(V) time
    
    while (!s.empty())    // exactly V times
    {
        auto top = *(s.begin());    // constant time
        int dis = top.first;
        int node = top.second;
        s.erase(top);               // log(V) time (erase by value requires a lookup)

        for (auto it: edge[node])    // For all the iterations of outer while loop this will sum up to E, where E is the total number of edges in the graph
        {
            int nb = it[0];
            int edge_weight = it[1];
            
            if (dist[nb] > dis + edge_weight)
            {
                s.erase({dist[nb], nb});    // log(V) time
                dist[nb] = dis + edge_weight;
                s.insert({dist[nb], nb});    // log(V) time
            }
        }
    }
}

Using priority queue:

// V - total number of vertices
// S - source node
vector<int> dijkstra(int V, vector<vector<int>> edge[], int S)
{
    vector<int> dist(V, 1e9);
    dist[S] = 0;
    
    priority_queue<pair<int, int>, vector<pair<int, int>>, greater<pair<int, int>>> pq;
    pq.push({dist[S], S});
    
    while (!pq.empty())    // Can be more than V times, let's call this heap_size times
    {
        int node = pq.top().second;    // O(1) time
        pq.pop();                      // log(heap_size) time
        
        for (int i=0; i<edge[node].size(); i++)    // Can be approximated to (E/V), where E is the total number of edges in the graph
        {
            int nb = edge[node][i][0];
            int edge_weight = edge[node][i][1];

            if (dist[nb] > dist[node] + edge_weight) {
                dist[nb] = dist[node] + edge_weight;
                pq.push({dist[nb], nb});    // log(heap_size) time
            }
        }
    }
    return dist;
}

Finding the time complexity of the set-based approach is easy, as the number of elements in the set is exactly V (the number of vertices) and the inner for loop runs once per edge, so its time complexity is O(V*log(V) + V + E*log(V)), which is equivalent to O(E*log(V)) (reason: What's the time complexity of Dijkstra's Algorithm)

But I have trouble understanding the time complexity of the priority_queue approach. Here the same node can be present multiple times in the priority_queue with different distances. How do I calculate the upper bound on the number of nodes that are added to the heap?

I also want to decide which implementation to use based on the nature of the graph (sparse vs dense). Are both these implementations equivalent for any type of graph?

JavaLearner
  • Shouldn't it be `decreaseKey` instead of `push` in the inner cycle? Then, priority queue has the same node only once. However, it seems, `decreaseKey` is not implemented in STL priority_queue: https://stackoverflow.com/questions/14412793/implement-decreasekey-in-stl-priority-queue-c#14412830 – gltronred Dec 26 '22 at 09:05
  • You're asking multiple questions. Please focus on one question. – trincot Dec 26 '22 at 09:11
  • @trincot Aren't both these questions related to each other, as in the time complexity deciding which implementation to use based on graph structure. – JavaLearner Dec 26 '22 at 09:14
  • They are, but they are different and require a dedicated answer. – trincot Dec 26 '22 at 09:17
  • @gltronred The algorithm typically found in books like CLRS is the set one. Here all the elements are inserted in the set beforehand. It is implemented using set to simulate the behavior of `decreaseKey` in min_heap. The second approach(priority_queue one) is a bit different - it inserts elements in the heap as and when they are explored. – JavaLearner Dec 26 '22 at 09:24

1 Answer


Your priority_queue version isn't quite right.

Your while loop should start like this:

    auto nextPair = pq.top();
    pq.pop();
    if (dist[nextPair.second] != nextPair.first) {
        continue;
    }
    int node = nextPair.second;

This ensures that each vertex is processed only once, when its current record is popped from the priority queue.

The complexity analysis then becomes easy: each edge is processed at most once, so there are at most |E| inserts into the priority queue.

Total complexity is then O(E log E), and since E < V², log E < 2 log V, so that's the same as O(E log V).

The major disadvantage of the priority queue method is that it can consume O(E) space. This is usually OK, since it's on par with the space consumed by the graph itself. Since the priority_queue is a lot faster than set, the priority_queue version is the way that it is commonly done in practice.

Matt Timmermans
  • Even with this added check, it is still possible that there are multiple entries in the priority_queue with the same node and distance, which will be processed each. – JavaLearner Dec 26 '22 at 15:17
  • No, you only add an entry in the priority queue when you *decrease* a node's distance, so the distance is unique for every entry for a given node. – Matt Timmermans Dec 26 '22 at 16:39
  • My bad. Got it. Just one last question, I didn't get why "priority_queue is a lot faster than set". Both seem to have the similar time complexities for the operations involved in the above implementations. – JavaLearner Dec 27 '22 at 06:19
  • The time complexities are the same, but that still leaves a lot of room for one to be faster than the other. Heap operations are simpler than red-black tree operations, heaps require far fewer memory allocations, and heap memory accesses have better locality and therefore take better advantage of the cache. – Matt Timmermans Dec 27 '22 at 13:32