
Using the code below as an example:

public void bigO(int N, int M) {
    PriorityQueue<Integer> minHeap = new PriorityQueue<>();

    // M insertions into the heap
    for (int i = 0; i < M; i++) {
        minHeap.add(i);
    }

    // N more insertions into the same heap
    for (int i = 0; i < N; i++) {
        minHeap.add(i);
    }
}

The first loop would have a time complexity of O(M log(L)), where L is the current size of the heap. Similarly, the second loop would have a complexity of O(N log(L)). Since both M and N are linear terms, how would you determine the overall complexity? Would it be something like Max(M log(L), N log(L))?

templatetypedef

3 Answers


You could think of it like this: since your code runs both loops sequentially, you make M add calls and then another N add calls, for a total of M + N add calls.

Each add call has a time complexity of O(log(L)), where L is the current size of the heap. Since you add M items in the first loop and N more in the second, L grows up to M + N.

Putting that together, you get (M + N) * log(M + N), or L * log(L), where L is the total number of items you have to deal with.

So your code has O(L log L) time complexity with L = M + N, i.e. it is linearithmic in the total number of items.
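If you want to see that log factor show up in practice, here is a minimal sketch (the class name and sizes are my own choices) that counts comparator calls using the Comparator-accepting PriorityQueue constructor from Java 8. Feeding in decreasing values forces every add into its worst case, since each new element is a new minimum and sifts all the way to the root:

import java.util.Comparator;
import java.util.PriorityQueue;

public class WorstCaseAdds {
    public static void main(String[] args) {
        int m = 100_000, n = 300_000;
        long[] comparisons = {0};  // one-element array so the lambda can mutate it

        // Comparator that counts how often the heap compares two elements.
        Comparator<Integer> counting = (a, b) -> {
            comparisons[0]++;
            return Integer.compare(a, b);
        };
        PriorityQueue<Integer> minHeap = new PriorityQueue<>(counting);

        // Decreasing values: each add is a new minimum and sifts to the root,
        // the worst case for a binary-heap insertion.
        for (int i = m + n; i > 0; i--) {
            minHeap.add(i);
        }

        double linearithmic = (m + n) * (Math.log(m + n) / Math.log(2));
        System.out.printf("adds = %,d, comparisons = %,d, (M+N)*lg(M+N) ~ %,.0f%n",
                m + n, comparisons[0], linearithmic);
    }
}

The comparison count should track (M + N) * lg(M + N) fairly closely for this adversarial input; for random or ascending input it will be much smaller, as the last answer below explains.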

Ma3x

minHeap.add is O(log(n)), where n is the size of the heap. Therefore, building a heap by inserting n elements one at a time is O(n log(n)).

Now, you're doing this operation twice for two unrelated variables, n and m, so you'd normally add them together, O(n log(n) + m log(m)).

However, as pointed out in the comments, this is slightly inaccurate because the second operation adds to the existing heap of size n, giving O(n log(n) + m log(m + n)).

There's not enough information to drop one of the variables, since either one might dominate; this is captured in your Max() call, but I would avoid it because it's already implied by the O notation. I'd avoid the variable L as well: we're starting from an empty queue, so we can use n and m precisely.
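As a sanity check that this agrees with a single combined bound, both terms fold into one linearithmic expression:

n \log n + m \log(m + n) \;\le\; n \log(m + n) + m \log(m + n) \;=\; (m + n) \log(m + n),

so O(n log(n) + m log(m + n)) is contained in O((m + n) log(m + n)), which matches treating all m + n inserts as one batch.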

ggorlen
    Small correction: in the second loop you are inserting N elements into a heap that initially holds M items, so the complexity is actually N*log(M+N), while in the first loop the heap is initially empty, and therefore the complexity is actually M*log(M) – Orr Benyamini Jan 29 '22 at 01:14

The TL;DR:

  • Surprisingly, the runtime of this function is Θ(m + n), with no logarithms at all!
  • However, that’s largely an artifact of the specific sequence of items you’re inserting. If you didn’t know what those items were, the best bound is O((m + n) log (m + n)).

Now, the details:

A Θ(m + n) Bound

To see why this is, we need a few observations. First, let’s take a look at this loop:

PriorityQueue<Integer> minHeap = new PriorityQueue<>();

for (int i = 0; i < M; i++) {
    minHeap.add(i);
}

Initially, this looks like it takes time O(m log m): there are m items being added to an initially empty binary heap, so each heap insertion takes at most time O(log m).

However, this bound isn’t tight. In particular, the bound of O(log m) for inserting an item into an m-item binary heap assumes that the item may be swapped all the way to the root of the heap. But in this case, we are inserting the values 0, 1, 2, …, and m - 1, in that order. Therefore, after each item is inserted into the heap, it’s already bigger than its parent, and so no swap occurs. This means that each of these insertions actually takes time O(1), and so the actual cost of inserting those m elements is Θ(m).
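This is easy to check empirically. Here is a minimal sketch (the class name is mine) that counts comparator calls; it assumes the stock java.util.PriorityQueue sift-up, which does one parent comparison per level it considers:

import java.util.Comparator;
import java.util.PriorityQueue;

public class AscendingInserts {
    public static void main(String[] args) {
        int m = 1_000_000;
        long[] comparisons = {0};  // one-element array so the lambda can mutate it

        Comparator<Integer> counting = (a, b) -> {
            comparisons[0]++;
            return Integer.compare(a, b);
        };
        PriorityQueue<Integer> minHeap = new PriorityQueue<>(counting);

        for (int i = 0; i < m; i++) {
            minHeap.add(i);  // already bigger than its parent, so it never sifts up
        }

        // Every add after the first compares with its parent once and stops:
        // roughly m comparisons in total, not m * lg(m).
        System.out.printf("m = %,d, comparisons = %,d%n", m, comparisons[0]);
    }
}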

We now come to this part of the code:

for (int i = 0; i < N; i++) {
    minHeap.add(i);
}

How much time does this take? Well, initially, we might say O(n log (m + n)) time: we are doing n insertions into a binary heap, and that heap has at most m + n items in it, so each insertion takes time at most O(log (m+n)).

But once again, we're overestimating the amount of work done. Specifically, we are adding 0, 1, 2, …, and n - 1 to a binary heap that already has 0, 1, 2, …, and m - 1 in it. To understand the effect of doing this, let's split the elements we're inserting into two groups: the first m numbers (0, 1, 2, …, m - 1) and the remaining n - m items (m, m + 1, …, n - 1). It's possible that n < m, in which case that second group doesn't exist and the first group doesn't actually have m items in it, but let's ignore that case for now and assume that n ≥ m.

The first observation we can make is that inserting that second group of items (m, m + 1, …, n - 1) will take time Θ(n - m). The reason is basically the same as the reason above: all of those numbers are bigger than all the elements already in the heap, because at that point the heap holds only values between 0 and m - 1, inclusive. As a result, each of those insertions takes time O(1) to complete.

Now, “all” we have to do is analyze the cost of inserting the first group (0, 1, 2, …, m-1) into the heap. Surprisingly, this will always take time Θ(m). To see why, let’s make a series of observations.

Right before we insert the series 0, 1, 2, …, m-1 a second time into the heap, the heap already holds an existing copy of each of the values 0, 1, 2, …, m-1. Let’s color those copies blue, and imagine that the copies we are about to insert a second time are red. In other words, we start with a blue heap, then insert a bunch of red items into it.

Now, focus on what happens when we insert a red value into the heap. When we do, we place the red value as a leaf of the heap, then start swapping it with its parent until it’s at the root or bigger than its parent.

A hugely important observation is the following: the cost of inserting a red element is proportional to the number of blue nodes it swaps with when it’s inserted. To see why this is, notice that because we’re inserting the red elements in ascending order, once a red node gets a red parent, it must stop swapping upwards, because it will be bigger than its red parent. Therefore, any swaps that are performed must be between blue nodes and red nodes, so the cost of inserting a red node is the number of blue nodes it swaps with.

This is significant because it will let us bound the total work done inserting red nodes into the tree by counting the maximum possible number of times a blue node can be swapped downward with a red node. In particular, every time we swap a blue node with a red node, the blue node gets lower in the heap. So let’s think about how many times the blue nodes can move down in the heap.

Right before we insert any red nodes into the tree, we have a heap of m blue nodes. The height of this heap is (roughly) lg m, where lg m = log₂ m. When we're done inserting all our red nodes, the heap will have 2m elements in it, so its height will be (roughly) lg(2m) = lg m + 1, one level higher than what we started with.

Now, look at those blue nodes. Of them, at most half the nodes will be at the bottom layer of the initial heap and thus can be swapped downward at most once. At most a quarter of them will be one level above that, and thus can be swapped downward at most twice. At most an eighth of them will be two levels from the bottom of the original heap and thus can be swapped downward at most three times. Summing this up, we see that the maximum number of swaps is given by

\sum_{i=1}^{\lg m} i \cdot \frac{m}{2^i} \;=\; m \sum_{i=1}^{\lg m} \frac{i}{2^i}.

This sum is basically the same sum that comes up in the analysis of the heapify algorithm, and it works out to Θ(m), so the cost of inserting 0, 1, 2, …, m-1 a second time into the heap is Θ(m).
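For completeness, here is why that sum is Θ(m). Extending the upper limit to infinity only increases the sum, and the resulting series has a standard closed form:

\sum_{i=1}^{\lg m} \frac{i}{2^i} \;\le\; \sum_{i=1}^{\infty} \frac{i}{2^i} \;=\; 2, \qquad\text{so}\qquad m \sum_{i=1}^{\lg m} \frac{i}{2^i} \;\le\; 2m \;=\; O(m).

And since the i = 1 term alone contributes m/2, the total is also Ω(m), giving Θ(m).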

So, putting this all together, if n ≥ m, then the runtime of this function is the sum of the following:

  • Θ(m) work for inserting 0, 1, 2, …, m-1 into the heap for the first time.
  • Θ(m) work for inserting 0, 1, 2, …, m-1 into the heap for the second time.
  • Θ(n - m) work for inserting m, m + 1, …, n - 1 into the heap.

This sums to Θ(m + n) work, with not a logarithm in sight!
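If you want to see the Θ(m + n) bound in practice, here is a sketch of the question's two loops run through the same comparison-counting comparator as in the example above (again, the class name and sizes are mine):

import java.util.Comparator;
import java.util.PriorityQueue;

public class LinearTotal {
    public static void main(String[] args) {
        int m = 500_000, n = 2_000_000;
        long[] comparisons = {0};  // one-element array so the lambda can mutate it

        Comparator<Integer> counting = (a, b) -> {
            comparisons[0]++;
            return Integer.compare(a, b);
        };
        PriorityQueue<Integer> minHeap = new PriorityQueue<>(counting);

        for (int i = 0; i < m; i++) minHeap.add(i);  // first loop: 0 .. m-1
        for (int i = 0; i < n; i++) minHeap.add(i);  // second loop: 0 .. n-1

        // The count grows like m + n, not (m + n) * lg(m + n).
        System.out.printf("m + n = %,d, comparisons = %,d%n", m + n, comparisons[0]);
    }
}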

But what if n < m? In that case, we can still get the same Θ bound. Here's how. First, notice that we still have to do Θ(m) work in the first loop to populate the heap with 0, 1, 2, …, m - 1. From there, we certainly do Ω(n) work to add n items into the heap. This gives us a lower bound of Ω(m + n) on the total work done.

We can also upper-bound the work done in the second loop at O(m + n). Since n < m, adding 0, 1, 2, …, n - 1 requires no more work than adding 0, 1, 2, …, m - 1 would, which is O(m), and we can upper-bound O(m) with O(m + n). Combined with the work from the first phase, this gives us an upper bound of O(m + n) on the total work done.

Since we have a matching O and Ω bound of m + n in this case, the total work done when n < m similarly works out to Θ(m + n).

A More General O((m + n) log (m + n)) Bound

The analysis in the preceding section relies heavily on the fact that we are inserting sequences of values in increasing order into the binary heap. So let's suppose that's not the case, and that instead you insert a sequence of m arbitrary items into an empty binary heap, then a sequence of n arbitrary items into that heap. How much time would that take?

In that case, you’re inserting a total of m+n arbitrary items into a binary heap. We can bound the amount of work done here at O((m + n) log (m + n)) because each insertion adds to a heap with at most m + n items in it.

But can we tighten this bound at all? Not really. If we assume that every insertion has to walk all the way up to the top of the heap, then the total work done will be

lg 1 + lg 2 + … + lg (m+n).

The sum rule of logarithms then says that this is lg((m + n)!). And thanks to Stirling's approximation, we know that

lg((m + n)!) = Θ((m + n) log(m + n)),

so a more precise accounting of the sizes of the heaps as we do the insertions won’t asymptotically change the runtime.
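Spelling out that last step with Stirling's approximation, writing k = m + n for brevity:

\lg(k!) \;=\; k \lg k \;-\; k \lg e \;+\; O(\log k) \;=\; \Theta(k \log k).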

templatetypedef