8

The Kruskal's algorithm is the following:

MST-KRUSKAL(G,w)
1. A={}
2. for each vertex v∈ G.V
3.      MAKE-SET(v)
4. sort the edges of G.E into nondecreasing order by weight w
5. for each edge (u,v) ∈ G.E, taken in nondecreasing order by weight w
6.      if FIND-SET(u)!=FIND-SET(v)   
7.         A=A U {(u,v)}  
8.         Union(u,v)
9. return A

According to my textbook:

Initializing the set A in line 1 takes O(1) time, and the time to sort the edges in line 4 is O(E lgE). The for loop of lines 5-8 performs O(E) FIND-SET and UNION operations on the disjoint-set forest. Along with the |V| MAKE-SET operations, these take a total of O((V+E)α(V)) time, where α is a very slowly growing function. Because we assume that G is connected, we have |E| <= |V|-1, and so the disjoint-set operations take O(E α(V)) time. Moreover, since α(V)=O(lgV)=O(lgE), the total running time of Kruskal's algorithm is O(E lgE). Observing that |E|<|V|^2, we have lg |E|=O(lgV), and so we can restate the running time of Kruskal's algorithm as O(E lgV).

Could you explain me why we deduce that the time to sort the edges in line 4 is O(E lgE)? Also how do we get that the total time complexity is O((V+E)α(V)) ?

In addition, suppose that all edge weights in a graph are integers from 1 to |V|. How fast can you make Kruskal's algorithm run? What if the edges weights are integers in the range from 1 to W for some constant W?

How does the time complexity depend on the weight of the edges?

EDIT:

In addition, suppose that all edge weights in a graph are integers from 1 to |V|. How fast can you make Kruskal's algorithm run?

I have thought the following:

In order the Kruskal's algorithm to run faster, we can sort the edges applying Counting Sort.

  • The line 1 requires O(1) time.
  • The lines 2-3 require O(v) time.
  • The line 4 requires O(|V|+|E|) time.
  • The lines 5-8 require O(|E|α(|V|)) time.
  • The line 9 requires O(1) time.

So if we use Counting Sort in order to solve the edges, the time complexity of Kruskal will be

enter image description here

Could you tell me if my idea is right?

Also:

What if the edges weights are integers in the range from 1 to W for some constant W?

We will again use Counting Sort. The algorithm will be the same. We find the time complexity as follows:

  • The line 1 requires O(1) time.
  • The lines 2-3 require O(|V|) time.
  • The line 4 requires O(W+|E|)=O(W)+O(|E|)=O(1)+O(|E|)=O(|E|) time.
  • The lines 5-8 require O(|E|α(|V|)) time.
  • The line 9 requires O(1) time.

So the time complexity will be:

enter image description here

Mooncrater
  • 4,146
  • 4
  • 33
  • 62
Mary Star
  • 375
  • 7
  • 27

1 Answers1

9

Could you explain me why we deduce that the time to sort the edges in line 4 is O(E*lgE)?

To sort a set of N items we use O(Nlg(N)) algorithm, which is quick sort, merge sort or heap sort. To sort E edges we therefore need O(Elg(E)) time. This however is not necessary in some cases, as we could use sorting algorithm with better complexity (read further).

Also how do we get that the total time complexity is O((V+E)α(V))?

I don't think total complexity is O((V+E)α(V)). That would be complexity of the 5-8 loop. O((V+E)α(V)) complexity comes from V MAKE-SET operations and E Union operations. To find out why we multiply that with α(V) you will need to read in depth analysis of disjoint set data structure in some algorithmic book.

How fast can you make Kruskal's algorithm run?

For first part, line 4, we have O(E*lg(E)) complexity and for second part, line 5-8, we have O((E+V)α(V)) complexity. This two summed up yield O(Elg(E)) complexity. If we use O(N*lg(N)) sort this can't be improved.

What if the edges weights are integers in the range from 1 to W for some constant W?

If that is the case, than we could use counting sort for first part. Giving line 4 complexity of O(E+W) = O(E). In that case algorithm would have O((E+V)*α(V)) total complexity. Note that however O(E + W) in reality includes a constant that could be rather large and might be impractical for large W.

How does the time complexity depend on the weight of the edges?

As said, if weight of the edges is small enough we can use counting sort and speed up the algorithm.

EDIT:

In addition, suppose that all edge weights in a graph are integers from 1 to |V|. How fast can you make Kruskal's algorithm run? I have thought the following:

In order the Kruskal's algorithm to run faster, we can sort the edges applying Counting Sort.

The line 1 requires O(1) time. The lines 2-3 require O(vα(|V|)) time. The line 4 requires O(|V|+|E|) time. The lines 5-8 require O(|E|α(|V|)) time. The line 9 requires O(1) time.

Your idea is correct, however you can make bounds smaller.

The lines 2-3 requires O(|V|) rather than O(|V|α(|V|)). We however simplified it to O(|V|α(|V|)) in previous calculations to make calculations easier.

With this you get the time of: O(1) + O(|V|) + O(|V| + |E|) + O(|E|α(|V|)) + O(1) = O(|V| + |E|) + O(|E|α(|V|))

You can simplify this to either O((|V| + |E|) * α(|V|) or to O(|V| + |E|*α(|V|).

So while you were correct, since O((|V| + |E|) * α(|V|) < O((|V| + |E|) * lg(|E|)

Calculations for the |W| are analogous.

Neithrik
  • 2,002
  • 2
  • 20
  • 33
  • 1
    Riko You said that "O((V+E)α(V)) would be the complexity of the 5-8 loop. O((V+E)α(V)) complexity comes from V MAKE-SET operations and E Union operations." But at this for-loop we don't have MAKE-SET operations. So do we also take into consideration the other for-loop? If so, why could we do so although the number of times that the two for-loops are executed isn't the same? Also you said that for the line 4 we have O(E*lgE) complexity and for line 5-8 we have O((E+V)α(V)) complexity. So don't we take into consideration the first for-loop? And how if we sum these two up, do we get O(E*lgE) ? – Mary Star Apr 10 '15 at 13:21
  • Also, what does N represent at the time complexity O(N*lg(N)) ? Furthermore, can we only use counting sort at the case when the edge weights are integers in the range from 1 to W but not when they are in the range from 1 to |V|? If so, why is it like that? Also, why would the complexity of line 4 be O(E+W) ? – Mary Star Apr 10 '15 at 13:24
  • 1
    Yes, my bad, you also need to count first loop. Lines 2-3 and 5-8 together would than yield O((V+E)α(V)) complexity. If we sum O(ElgE) and O((E+V)α(V)) we get O(ElgE) because O((E+V)α(V)) < O(ElgE). – Neithrik Apr 10 '15 at 13:56
  • 1
    N represents number of elements that we sort. It doesn't matter in what range they are, but that range should be reasonable small. To understand counting sort better read about it here: http://en.wikipedia.org/wiki/Counting_sort – Neithrik Apr 10 '15 at 13:58
  • 1
    Riko So is the time complexity of the lines 2-3 equal to O(Vα(V)) ? Also is it like that? O(ElgE)+O((E+V)α(V))=O(ElgE)+O((V+E)*lgE), since α(V)=O(lgE) and O(ElgE)+O((V+E)*lgE)=O(E*lgE+V*lgE+E*lgE)=O((V+E)*lgE). Then we know that |E|<=|V|-1, because we assume that G is connected. So isn't it as follows? (V+E)*lgE<=(2V-1)*lgE=O(V*lgE) or am I wrong? How do we deduce that O(ElgE)+O((E+V)α(V))=O(E*lgE) ? – Mary Star Apr 10 '15 at 14:05
  • 1
    Yes and yes, your calculations are correct to the point where you assume |E| <= |V|-1. If the graph is connected we know that number of edges is at least |V| - 1. Making that equation |E| >= |V|-1. If you change that you will get O(Elg(E)) – Neithrik Apr 10 '15 at 15:10
  • Oh, yes, Riko, you are right... Since |E|>=V-1 we have (V+E)*lgE<=(2E+1)*lgE=O(E*lgE). I found this: "Counting sort assumes that each of the n imput elements is an integer in the range 0 to k, for some integer k. When k=O(n), the sort runs in Θ(n) time." Doesn't this mean that the line 4 requires Θ(W) time? Or have I understood it wrong? – Mary Star Apr 10 '15 at 15:59
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/74966/discussion-between-evinda-and-riko). – Mary Star Apr 10 '15 at 19:39
  • 1
    Riko, I edited my post. Could you take a look at it? – Mary Star Apr 14 '15 at 20:51
  • Riko, could you explain me further why the lines 2-3 require O(V) rather than O(Vα(|V|))? – Mary Star Apr 14 '15 at 21:08
  • You have to create V sets. To create one set you spend O(1), therefore you need V * O(1) = O(V) to create V of them. – Neithrik Apr 14 '15 at 21:09
  • Riko Ok, I will.. Is it known that MAKE-SET(v) requires constant time or do we have to prove it? If it takes constant time, is it because of the fact that it is like we put an element in a list? – Mary Star Apr 14 '15 at 21:18
  • 1
    No need to prove it. It is similar to putting element to list. It just creates empty set so it surely has constant time, as it always takes the same time. – Neithrik Apr 14 '15 at 21:20
  • Correct, but there is no need to replace alpha(V) with lg(V), with that you are just killing precision of result. – Neithrik Apr 14 '15 at 21:35
  • Riko Nice.. Thank you very much!!! :-) Could you maybe also take a look at this question? http://stackoverflow.com/questions/29594568/running-time-of-prims-algorithm Do you have an idea how fast we can make Prim's algorithm run? What can we do? – Mary Star Apr 14 '15 at 21:43
  • 1
    No need to thank me :) just upvote and accept :) as far as the other question is concerned, amalsom gave good answer, that I upvoted, not much to add. – Neithrik Apr 14 '15 at 21:46
  • 1
    No, I found this: http://www.eng-hs.net/files/Algorithms-Solutions-CH-23.pdf and at page 17, they found that the time complexity is O(E α(V)) . How did they get rid of V at O(V+E+Eα(V)) ? @Riko – Mary Star Apr 14 '15 at 22:04
  • 1
    V < E, therefore O(V) < O(E). We also know that O(E) = O(Eα(V)). – Neithrik Apr 14 '15 at 22:07
  • 1
    V<=E-1 holds only if the graph G is connected, right? So do we suppose it? – Mary Star Apr 14 '15 at 22:16
  • Yes, we suppose it is connected, since if it isn't you can't find spanning tree anyways. – Neithrik Apr 15 '15 at 13:44
  • Riko, could you take a look at this: http://stackoverflow.com/questions/29871890/time-complexity-of-modified-dfs-algorithm ? – Mary Star Apr 27 '15 at 00:00