5

From C++ documentation on inplace_merge, the complexity of the algorithm is "Linear in comparisons (N-1) if an internal buffer was used, NlogN otherwise (where N is the number elements in the range [first,last))". What do they mean by an internal buffer and what causes a complexity of O(N-1) vs. O(NlogN)?

Aziz Shaikh
  • 16,245
  • 11
  • 62
  • 79
rolloff
  • 543
  • 1
  • 4
  • 5

2 Answers2

2

To expand on the other answer(s):

  • At least in libstdc++ and libc++, the "internal buffer" is provided by calling std::get_temporary_buffer, an obscure yet standard routine in the STL. This routine has been deprecated in C++17, basically because it's confusing and kind of dumb. See this question for details, or read Stephan T. Lavavej's original deprecation proposal.

  • In libstdc++, get_temporary_buffer is implemented as a call to operator new. (Source)

  • In libc++, the get_temporary_buffer is implemented as a call to operator new. (Source)

  • I don't know whether inplace_merge uses get_temporary_buffer in MSVC's standard library, but I would bet money that it does.

  • In MSVC, it has been reported that get_temporary_buffer is implemented as a call to operator new.

You can write a program to observe this call to operator new firsthand by overriding operator new in the global namespace:

#include <algorithm>
#include <cstdio>
#include <cstdlib>

void *operator new(size_t nbytes, const std::nothrow_t&) noexcept
{
    puts("hello from operator new!");
    return malloc(nbytes);
}

int main()
{
    int a[32] = {
        1,1,1,2,3,4,4,5,5,5,5,6,6,8,9,9,
        1,1,2,3,3,3,4,4,5,5,7,8,9,9,9,9,
    };
    std::inplace_merge(&a[0], &a[16], &a[32]);  // "hello from operator new!"
    (void)a;
}

TL;DR: The "internal buffer" is allocated on the heap by calling operator new. Implementations are not required to call operator new, but in practice they all do. Stay far, far away from inplace_merge if you value your heap.

Community
  • 1
  • 1
Quuxplusone
  • 23,928
  • 8
  • 94
  • 159
  • Any idea about how it degrades to O(N log N) when there's not enough space for O(N), i.e., what algorithm might be used for that? – superb rain Nov 25 '20 at 13:31
  • @superbrain: Well, unless I'm missing something, `std::inplace_merge(first, mid, last, lessthan)` can be implemented conformingly as `std::stable_sort(first, last, lessthan)`. I don't know if there's something "cleverer" that could be done there, but `stable_sort` is O(N log N), right? – Quuxplusone Nov 25 '20 at 20:59
  • No, under those circumstances (not enough space), it's apparently only guaranteed to be [O(N log(N)^2)](https://en.cppreference.com/w/cpp/algorithm/stable_sort#Complexity). – superb rain Nov 25 '20 at 21:30
  • Ah, found it. [`inplace_merge`](https://en.cppreference.com/w/cpp/algorithm/inplace_merge#Possible_implementation) has two links to implementations where we can see the algorithms (I think both are pretty much the same, but the libstdc++ one is easier to read). – superb rain Nov 25 '20 at 22:18
  • Different question, though: You say *"Stay far, far away from `inplace_merge` if you value your heap"*. What should I do instead? – superb rain Nov 25 '20 at 22:27
  • @superbrain: FWIW, a lot of the STL's point _is_ heap allocation. If you're `inplace_merge`'ing on a `std::vector`, you're already doing heap allocation, and a little more might not hurt — `std::inplace_merge(v.begin(), mid, v.end())` is a better idea than `std::merge(v.begin(), mid, mid, v.end(), std::back_inserter(v2)); v = std::move(v2);`. But ***if*** you're on embedded, consciously avoiding heap-allocation, using static arrays for everything, and thinking "STL algos are safe because they don't allocate," ***then*** `inplace_merge` is one of the three you're wrong about. – Quuxplusone Nov 25 '20 at 23:55
  • Ah, ok, so I guess as a typical PC user I just don't "value my heap" then :-) – superb rain Nov 26 '20 at 00:01
  • Hmm... but if you did do that embedded/static and wanted an inplace merge, how would you do it? With `merge`, I think you'd have to spend extra temporary space as large as the whole array, right? While `inplace_merge` would presumably allocate at most half the size (to move the smaller of the two merge-parts there). Or would you write your own merge code instead? – superb rain Nov 26 '20 at 00:14
  • Quuxplusone: inplace_merge cannot be implemented by stable_sort. inplace_merge is supposed to be N-1 compares with enough memory and stable_sort does more compares that that, inplace_merge is supposed to resemble merge and assumes it operates on 2 sorted sequences like merge. – SJHowe Jan 04 '22 at 10:45
  • superb rain: inplace_merge, at least for bidirectional iterators, if you are inplace merging 1,000,000 elements and 2 elements, then the maximum buffer size required is 2. It is always the minimum of the 2 sizes, because merging can be down forwards or back, preserving elements. So the buffer size required is min(M,N) * sizeof(T) where T is the element and M, N the size of the blocks – SJHowe Jan 04 '22 at 10:50
  • superb rain: The same algorithm is used but operating on a smaller number of elements. In fact, stable_sort(), stable_partition, inplace_merge() all do some pre processing, call themselves twice, with half the number of number elements and then do some post processing so it is as if you had done the whole number of elements – SJHowe Jan 04 '22 at 14:25
1

An internal buffer is simply a buffer allocated by the library of sufficient size to hold the output of the merge while the merge is happening (it's copied back to the original range after the merge is complete). If this extra space is used, the merge can be done in linear time. If it can't or doesn't use a separate buffer to store the output then the operation degrades to a general purpose sort with runtime O(n log n).

Mark B
  • 95,107
  • 10
  • 109
  • 188
  • What general purpose sort might that be? I can't think of one that's stable, O(n log n) time, and o(n) space. – superb rain Nov 25 '20 at 13:26
  • It does not degrade to a general purpose sort. Instead if there is not enough memory, the bigger block if knifed in two. The other block is then either lower_bound or upper_bound searched using the middle element of the bigger block depending if the bigger block is 1st or 2nd. Having split the 2 blocks into 4 blocks, the 2nd and 3rd of the 4 blocks are rotated (possibly using whatever temporary). Now the problem is reduced to inplace_merging the 1st & 2nd block and 3rd & 4th block which has to be smaller and possibly can make use of any smaller buffer. – SJHowe Dec 28 '21 at 21:01