
What are the time and space complexity of the algorithm implemented by the __merge_without_buffer function in libstdc++, which std::inplace_merge calls? More specifically, I'm interested in these:

  • Upper limit for the recursion depth. Is it <= 2 * ceil(log2(n)) ?

  • Upper limit for the number of comparisons. Is it <= 2 * ceil(log2(n)) * n ?

  • Upper limit for the number of swaps. Is it <= 2 * ceil(log2(n)) * n ?

Ideally the algorithm is a well-known one described in a published paper, and the paper proves these upper limits. However, the libstdc++ source code doesn't cite any source.

For the is it <= ... questions above, I arrived at these conjectured formulas by intuition, by running the C program below instrumented with counters on many inputs, and by checking the conjectured upper limits against those counts. However, I wasn't able to prove any of my conjectures in the general case. For the recursion depth formula, changing the constant from 2 to 1.5 makes it false: I've found counterexamples.

Please note that I'm not interested in O(...) classes, but in an actual upper limit with a specific constant (hopefully 2 or less). I also need proofs for the upper limit formulas. A citation is enough as proof.

Based on comments and documentation in the implementations, I think that both the number of comparisons and the number of swaps is O(log2(n) * n). However, there is no proof, that is not an upper limit formula with an explicit constant, and a formula for the recursion depth is missing.

It's quite easy to prove that in a regular (non-inplace) merge algorithm there is no recursion, the number of comparisons is <= n - 1, there are no swaps, and the number of copies is n. However, I'm specifically interested in the inplace merge algorithm.

Above n is the total number of elements in the input/output array (== n1 + n2 == len1 + len2 == __len1 + __len2).

When calculating the number of swaps used by the rotate function, use 1 less than the total number of elements passed to the rotate function as an upper bound.

Here are some implementations of this algorithm:

  • C++: __merge_without_buffer function in libstdc++, called by std::inplace_merge.
  • Java: merge, reimplementation of the C++ __merge_without_buffer function in Java.
  • C: ip_merge_, reimplementation of the C++ __merge_without_buffer function in C.

Please note that the implementations above are equivalent for recursion depth, number of comparisons and number of swaps (as defined above).

This blog post explains a different inplace mergesort. That algorithm only swaps adjacent memory regions of equal size.

The article Practical In-Place Merging (also explained in a C++ code comment here) explains a different inplace merge algorithm.

For completeness, I copy the C implementation here:

#define SWAP(type, a, b) \
    do { type t=(a);(a)=(b);(b)=t; } while (0)

static void reverse_(int* a, int* b)
{
    for ( --b; a < b; a++, b-- )
       SWAP(int, *a, *b);
}
static int* rotate_(int* a, int* b, int* c)
/* swap the sequence [a,b) with [b,c). */
{
    if (a != b && b != c)
     {
       reverse_(a, b);
       reverse_(b, c);
       reverse_(a, c);
     }
    return a + (c - b);
}

static int* lower_bound_(int* a, int* b, const int key)
/* find first element not less than @p key in sorted sequence or end of
 * sequence (@p b) if not found. */
{
    int i;
    for ( i = b-a; i != 0; i /= 2 )
     {
       int* mid = a + i/2;
       if (*mid < key)
          a = mid + 1, i--;
     }
    return a;
}
static int* upper_bound_(int* a, int* b, const int key)
/* find first element greater than @p key in sorted sequence or end of
 * sequence (@p b) if not found. */
{
    int i;
    for ( i = b-a; i != 0; i /= 2 )
     {
       int* mid = a + i/2;
       if (*mid <= key)
          a = mid + 1, i--;
     }
    return a;
}

static void ip_merge_(int* a, int* b, int* c)
/* inplace merge. */
{
    int n1 = b - a;
    int n2 = c - b;

    if (n1 == 0 || n2 == 0)
       return;
    if (n1 == 1 && n2 == 1)
     {
       if (*b < *a)
          SWAP(int, *a, *b);
     }
    else
     {
       int* p, * q;

       if (n1 <= n2)
          p = upper_bound_(a, b, *(q = b+n2/2));
       else
          q = lower_bound_(b, c, *(p = a+n1/2));
       b = rotate_(p, b, q);

       ip_merge_(a, p, b);
       ip_merge_(b, q, c);
     }
}
  • Why is this tagged Java? And what specific question do you have about the time and space complexity? This looks like a homework dump. – tgdavies Jul 18 '23 at 00:14
  • @tgdavies: This is not a homework dump. Our algorithm analysis homeworks at university were much easier. My very specific questions are listed in the 3 bullet points near the beginning of the question (all 3 are formulas for the upper limit). I was trying to be very specific and clear about them. Do you have any advice how I could make them any clearer? – pts Jul 18 '23 at 00:24
  • You're asking "Is it Is it <= 2 * ceil(log2(n))". Could you say: "I believe it is <= 2 * ceil(log2(n)), but I'm not certain because of [specific doubt you have about your proof]"? I think that would be an improvement. – tgdavies Jul 18 '23 at 00:31
  • @tgdavies: For these formulas all I have is some counting output for specific test inputs. I don't have a proof for the general case. I've added some more details to the question. – pts Jul 18 '23 at 00:42
  • @tgdavies: I've added an answer containing my analysis so far. The problem with it is that the upper limit for the number of comparisons is *O(n**2)*, and I need *O(n * log(n))*, just like in [C++ std::inplace_merge](https://en.cppreference.com/w/cpp/algorithm/inplace_merge). The algorithm is fast enough, but the analysis is not precise enough. – pts Jul 21 '23 at 09:45

1 Answer


Upper limit for the recursion depth.

Here is a proof that the recursion depth is <= 2 * ceil(log2(m)), where m is the size of the longer input list (i.e. m = max(b - a, c - b) = max(n1, n2) for the ip_merge_ C function in the question).

If m <= 1, then there are no recursive calls, thus the recursion depth is 0.

If we can prove that the size of the longer list of each depth 2 recursive call is <= ceil(m/2), then by induction the size of the longer list becomes <= 1 after 2 * ceil(log2(m)) levels of recursion, thus there is no more recursion, and we are done.

So it remains to prove that the size of the longer list of each depth 2 recursive call is <= ceil(m/2). Let's look at the original call, the depth 1 recursive calls (0, 1 or 2) and the depth 2 recursive calls (0, 1, 2, 3 or 4):

  • original call: k (shorter list), m (longer list). m >= k.
  • each depth 1 recursive call: a (determined by the return value of upper_bound_ or lower_bound_ on the shorter list) and b = ceil(m/2) (because the longer list is split at half). a <= k (because the shorter list is split at a).
  • each depth 2 recursive call, if a <= ceil(m/2): c (determined by the return value of upper_bound_ or lower_bound_ on the shorter list) and d (the longer list split at half). a <= ceil(m/2) = b, thus a <= b, thus the longer list has length b. c <= a (because the shorter list is split at c). d = ceil(b/2) (because the longer list is split at half). We need to prove max(c, d) <= ceil(m/2). This follows from c <= a <= b = ceil(m/2) and d = ceil(b/2) = ceil(ceil(m/2)/2) <= ceil(m/2).
  • each depth 2 recursive call, if a > ceil(m/2): c (determined by the return value of upper_bound_ or lower_bound_ on the shorter list) and d (the longer list split at half). a > ceil(m/2) = b, thus a > b, thus the longer list has length a. c <= b (because the shorter list is split at c). d = ceil(a/2) (because the longer list is split at half). We need to prove max(c, d) <= ceil(m/2). This follows from c <= b = ceil(m/2) and d = ceil(a/2) <= ceil(k/2) <= ceil(m/2).

Upper limit for the number of comparisons.

This is a partial answer.

Let's define bsl(r) as the number of comparisons done by a binary search (upper_bound_ or lower_bound_) in a list of size r. bsl(0) = 0, and for any r >= 1, bsl(r) = floor(log2(r)) + 1. Please note that: bsl(1) = 1, bsl(2) = 2, bsl(3) = 2, bsl(4) = 3.

Let's look at the original call and the depth 1 recursive calls (0, 1 or 2):

  • original call: k (shorter list), m (longer list). m >= k. If k = 0, then there are 0 comparisons. Otherwise, if k = m = 1, there is 1 comparison. Otherwise the binary search is done on the shorter list, thus the number of comparisons (in any of the cases above) is <= bsl(k) <= bsl(m).
  • each depth 1 recursive call: a (determined by the return value of upper_bound_ or lower_bound_ on the shorter list) and b = ceil(m/2) (because the longer list is split at half). a <= k (because the shorter list is split at a). The binary search is done on the shorter list, thus the number of comparisons is <= bsl(min(a, b)) <= bsl(min(k, ceil(m/2))) <= bsl(ceil(m/2)) <= bsl(m). Also it's <= bsl(k). These upper limits are also true if the binary search is not done (because of a = 0 or b = 0 or a = b = 1).

Thus the total number of comparisons covering the original call (1) and the depth 1 recursive calls (0, 1 or 2) is <= 3 * bsl(m). For the depth 2 recursive calls (0, 1, 2, 3 or 4), we can use the previous result that the size of the longer list is <= ceil(m/2).

Let tnc(m) be the total number of comparisons for a call where the size of the longer list is m. We have tnc(m) <= 3 * bsl(m) + 4 * tnc(ceil(m/2)). Also, for m <= 1, we have tnc(m) <= m. Also, with more detailed analysis of m and k, we get tnc(2) <= 8. (The tnc(2) limit from the generic formula is 10.)

The closed formula for this recursion is tnc(2**q) <= 1/6 * (17 * 4 ** q - 6 * q - 14) for any q >= 1. (Sanity check: q = 1 gives tnc(2) <= 8, and q = 2 gives tnc(4) <= 3 * bsl(4) + 4 * tnc(2) = 41.) From this we get tnc(m) < 34/3 * m ** 2 for any m >= 2. This upper bound is too weak: we need something like O(m * log2(m)). The algorithm is fast enough; it's the bounds in this analysis that are not precise enough.

Upper limit for the number of swaps.

No formula yet.
