Is the libc++ implementation of the STL deque push_front function standards-compliant?

Question

The C++ Standard (N4901) says this with reference to the deque push_front function:

An implementation shall provide these operations for all container types shown in the “container” column, and shall implement them so as to take amortized constant time.

If I am not mistaken, this means that a standards-compliant implementation should provide a deque push_front function that takes amortized O(1) time.

However, after looking at the source code and after running some test programs under gdb, I have come to believe that the libc++ implementation of the deque push_front function in fact has O(log n) amortized time.

For context: The STL deque is implemented in libc++ using a two-level array, the map is a dynamic array that contains pointers to fixed-sized arrays called blocks. There is only one map, which is unbounded in size, and an unbounded number of blocks, which are fixed in size.

Here is an accurate visualization (for proof, see below) of how the map looks after each new block (the blocks are named in order of allocation, so b01 was the first block that was allocated, b02 is the second block that was allocated, and so on) is added to the map by calls to push_front. Linear time operations such as copying or shifting the entire map array via memmove are labeled:

[]
[b01]
[b02, b01] copied
[b03, b02, b01, ___] copied
[b04, b03, b02, b01] shifted by 1
[b05, b04, b03, b02, b01, ___, ___, ___] copied
[___, b06, b05, b04, b03, b02, b01, ___] shifted by 2
[b07, b06, b05, b04, b03, b02, b01, ___] 
[b08, b07, b06, b05, b04, b03, b02, b01] shifted by 1
[b09, b08, b07, b06, b05, b04, b03, b02, b01, ___, ___, ___, ___, ___, ___, ___] copied
[___, ___, ___, b10, b09, b08, b07, b06, b05, b04, b03, b02, b01, ___, ___, ___] shifted by 4
[___, ___, b11, b10, b09, b08, b07, b06, b05, b04, b03, b02, b01, ___, ___, ___] 
[___, b12, b11, b10, b09, b08, b07, b06, b05, b04, b03, b02, b01, ___, ___, ___] 
[b13, b12, b11, b10, b09, b08, b07, b06, b05, b04, b03, b02, b01, ___, ___, ___] 
[___, b14, b13, b12, b11, b10, b09, b08, b07, b06, b05, b04, b03, b02, b01, ___] shifted by 2
[b15, b14, b13, b12, b11, b10, b09, b08, b07, b06, b05, b04, b03, b02, b01, ___] 
[b16, b15, b14, b13, b12, b11, b10, b09, b08, b07, b06, b05, b04, b03, b02, b01] shifted by 1

Proof: I used this program to look at when blocks are added to the map in libc++ and to examine the contents of the map in memory:

#include <deque>

long mylong = 13370000;

void my_pf(std::deque<long>& d){
    d.push_front(mylong);
    ++mylong;
}

void pushn(std::deque<long>& d, int n){
    for (int i = 0; i < n; i++){
        my_pf(d);
    }
}

int main()
{
    std::deque<long> d = {};
    // zero blocks
    pushn(d, 256); 
    // one block
    my_pf(d);
    // two blocks
    pushn(d, 511);
    // still two blocks
    my_pf(d);
    // three blocks
    pushn(d, 511);
    // still three blocks
    my_pf(d); // 4th block is added
    pushn(d, 511);
    my_pf(d); // 5th block is added
    pushn(d, 511);
    my_pf(d); // 6th block is added
    pushn(d, 511);
    my_pf(d); // 7th block is added
    pushn(d, 511); 
    my_pf(d); // 8th block is added
    pushn(d, 511);
    my_pf(d); // 9th block is added
    pushn(d, 511);
    my_pf(d); // 10th block is added
    pushn(d, 511);
    my_pf(d); // 11th block is added
    pushn(d, 511);
    my_pf(d); // 12th block is added
    pushn(d, 511);
    my_pf(d); // 13th block is added
    pushn(d, 511);
    my_pf(d); // 14th block is added
    pushn(d, 511);
    my_pf(d); // 15th block is added
    pushn(d, 511);
    my_pf(d); // 16th block is added
    pushn(d, 511);
    my_pf(d); // 17th block is added
}

Here I dump the contents of the map in gdb after every new block is added, starting at the first block:

(gdb) print __map_.capacity()
$3 = 1
(gdb) print *__map_.__first_@1
$4 = {0x4092c0}
(gdb) print __map_.capacity()
$1 = 2
(gdb) print *__map_.__first_@2
$2 = {0x40a2f0, 0x4092c0}
(gdb) print __map_.capacity()
$3 = 4
(gdb) print *__map_.__first_@4
$4 = {0x40b330, 0x40a2f0, 0x4092c0, 0x0}
(gdb) print __map_.capacity()
$5 = 4
(gdb) print __map_
$1 = {<std::__1::__split_buffer_common<true>> = {<No data fields>}, __first_ = 0x40b300, __begin_ = 0x40b300,
  __end_ = 0x40b320, __end_cap_ = {<std::__1::__compressed_pair_elem<long**, 0, false>> = {
      __value_ = 0x40b320}, <std::__1::__compressed_pair_elem<std::__1::allocator<long*>, 1, true>> = {<std::__1::allocator<
long*>> = {<No data fields>}, <No data fields>}, <No data fields>}}
(gdb) print *__map_.__first_@4
$6 = {0x40c340, 0x40b330, 0x40a2f0, 0x4092c0}

(gdb) print __map_
$2 = {<std::__1::__split_buffer_common<true>> = {<No data fields>}, __first_ = 0x40d350, __begin_ = 0x40d350,
  __end_ = 0x40d378, __end_cap_ = {<std::__1::__compressed_pair_elem<long**, 0, false>> = {
      __value_ = 0x40d390}, <std::__1::__compressed_pair_elem<std::__1::allocator<long*>, 1, true>> = {<std::__1::allocator<
long*>> = {<No data fields>}, <No data fields>}, <No data fields>}}
(gdb) print *__map_.__first_@8
$8 = {0x40d3a0, 0x40c340, 0x40b330, 0x40a2f0, 0x4092c0, 0x0, 0x0, 0x0}

(gdb) print __map_
$11 = {<std::__1::__split_buffer_common<true>> = {<No data fields>}, __first_ = 0x40d350, __begin_ = 0x40d358,
  __end_ = 0x40d388, __end_cap_ = {<std::__1::__compressed_pair_elem<long**, 0, false>> = {
      __value_ = 0x40d390}, <std::__1::__compressed_pair_elem<std::__1::allocator<long*>, 1, true>> = {<std::__1::allocator<
long*>> = {<No data fields>}, <No data fields>}, <No data fields>}}
(gdb) print *__map_.__first_@8
$10 = {0x40d3a0, 0x40e3b0, 0x40d3a0, 0x40c340, 0x40b330, 0x40a2f0, 0x4092c0, 0x0}

(gdb) print __map_
$3 = {<std::__1::__split_buffer_common<true>> = {<No data fields>}, __first_ = 0x40d350, __begin_ = 0x40d350,
  __end_ = 0x40d388, __end_cap_ = {<std::__1::__compressed_pair_elem<long**, 0, false>> = {
      __value_ = 0x40d390}, <std::__1::__compressed_pair_elem<std::__1::allocator<long*>, 1, true>> = {<std::__1::allocator<
long*>> = {<No data fields>}, <No data fields>}, <No data fields>}}
(gdb) print *__map_.__first_@8
$4 = {0x40f3c0, 0x40e3b0, 0x40d3a0, 0x40c340, 0x40b330, 0x40a2f0, 0x4092c0, 0x0}

(gdb) print __map_
$5 = {<std::__1::__split_buffer_common<true>> = {<No data fields>}, __first_ = 0x40d350, __begin_ = 0x40d350,
  __end_ = 0x40d390, __end_cap_ = {<std::__1::__compressed_pair_elem<long**, 0, false>> = {
      __value_ = 0x40d390}, <std::__1::__compressed_pair_elem<std::__1::allocator<long*>, 1, true>> = {<std::__1::allocator<
long*>> = {<No data fields>}, <No data fields>}, <No data fields>}}
(gdb) print *__map_.__first_@8
$6 = {0x4103d0, 0x40f3c0, 0x40e3b0, 0x40d3a0, 0x40c340, 0x40b330, 0x40a2f0, 0x4092c0}

(gdb) print __map_
$7 = {<std::__1::__split_buffer_common<true>> = {<No data fields>}, __first_ = 0x4113e0, __begin_ = 0x4113e0,
  __end_ = 0x411428, __end_cap_ = {<std::__1::__compressed_pair_elem<long**, 0, false>> = {
      __value_ = 0x411460}, <std::__1::__compressed_pair_elem<std::__1::allocator<long*>, 1, true>> = {<std::__1::allocator<
long*>> = {<No data fields>}, <No data fields>}, <No data fields>}}
(gdb) print __map_.capacity()
$8 = 16
(gdb) print *__map_.__first_@16
$9 = {0x411470, 0x4103d0, 0x40f3c0, 0x40e3b0, 0x40d3a0, 0x40c340, 0x40b330, 0x40a2f0, 0x4092c0, 0x0, 0x0, 0x0, 0x0, 0x0,
  0x0, 0x0}

(gdb) print __map_
$10 = {<std::__1::__split_buffer_common<true>> = {<No data fields>}, __first_ = 0x4113e0, __begin_ = 0x4113f8,
  __end_ = 0x411448, __end_cap_ = {<std::__1::__compressed_pair_elem<long**, 0, false>> = {
      __value_ = 0x411460}, <std::__1::__compressed_pair_elem<std::__1::allocator<long*>, 1, true>> = {<std::__1::allocator<
long*>> = {<No data fields>}, <No data fields>}, <No data fields>}}
(gdb) print *__map_.__first_@16
$11 = {0x411470, 0x4103d0, 0x40f3c0, 0x412480, 0x411470, 0x4103d0, 0x40f3c0, 0x40e3b0, 0x40d3a0, 0x40c340, 0x40b330,
  0x40a2f0, 0x4092c0, 0x0, 0x0, 0x0}

(gdb) print __map_
$12 = {<std::__1::__split_buffer_common<true>> = {<No data fields>}, __first_ = 0x4113e0, __begin_ = 0x4113f0,
  __end_ = 0x411448, __end_cap_ = {<std::__1::__compressed_pair_elem<long**, 0, false>> = {
      __value_ = 0x411460}, <std::__1::__compressed_pair_elem<std::__1::allocator<long*>, 1, true>> = {<std::__1::allocator<
long*>> = {<No data fields>}, <No data fields>}, <No data fields>}}
(gdb) print *__map_.__first_@16
$13 = {0x411470, 0x4103d0, 0x413490, 0x412480, 0x411470, 0x4103d0, 0x40f3c0, 0x40e3b0, 0x40d3a0, 0x40c340, 0x40b330,
  0x40a2f0, 0x4092c0, 0x0, 0x0, 0x0}

(gdb) print __map_
$14 = {<std::__1::__split_buffer_common<true>> = {<No data fields>}, __first_ = 0x4113e0, __begin_ = 0x4113e8,
  __end_ = 0x411448, __end_cap_ = {<std::__1::__compressed_pair_elem<long**, 0, false>> = {
      __value_ = 0x411460}, <std::__1::__compressed_pair_elem<std::__1::allocator<long*>, 1, true>> = {<std::__1::allocator<
long*>> = {<No data fields>}, <No data fields>}, <No data fields>}}
(gdb) print *__map_.__first_@16
$15 = {0x411470, 0x4144a0, 0x413490, 0x412480, 0x411470, 0x4103d0, 0x40f3c0, 0x40e3b0, 0x40d3a0, 0x40c340, 0x40b330,
  0x40a2f0, 0x4092c0, 0x0, 0x0, 0x0}

(gdb) print __map_
$16 = {<std::__1::__split_buffer_common<true>> = {<No data fields>}, __first_ = 0x4113e0, __begin_ = 0x4113e0,
  __end_ = 0x411448, __end_cap_ = {<std::__1::__compressed_pair_elem<long**, 0, false>> = {
      __value_ = 0x411460}, <std::__1::__compressed_pair_elem<std::__1::allocator<long*>, 1, true>> = {<std::__1::allocator<
long*>> = {<No data fields>}, <No data fields>}, <No data fields>}}
(gdb) print *__map_.__first_@16
$17 = {0x4154b0, 0x4144a0, 0x413490, 0x412480, 0x411470, 0x4103d0, 0x40f3c0, 0x40e3b0, 0x40d3a0, 0x40c340, 0x40b330,
  0x40a2f0, 0x4092c0, 0x0, 0x0, 0x0}

(gdb) print __map_
$18 = {<std::__1::__split_buffer_common<true>> = {<No data fields>}, __first_ = 0x4113e0, __begin_ = 0x4113e8,
  __end_ = 0x411458, __end_cap_ = {<std::__1::__compressed_pair_elem<long**, 0, false>> = {
      __value_ = 0x411460}, <std::__1::__compressed_pair_elem<std::__1::allocator<long*>, 1, true>> = {<std::__1::allocator<
long*>> = {<No data fields>}, <No data fields>}, <No data fields>}}
(gdb) print *__map_.__first_@16
$19 = {0x4154b0, 0x4164c0, 0x4154b0, 0x4144a0, 0x413490, 0x412480, 0x411470, 0x4103d0, 0x40f3c0, 0x40e3b0, 0x40d3a0,
  0x40c340, 0x40b330, 0x40a2f0, 0x4092c0, 0x0}

(gdb) print __map_
$20 = {<std::__1::__split_buffer_common<true>> = {<No data fields>}, __first_ = 0x4113e0, __begin_ = 0x4113e0,
  __end_ = 0x411458, __end_cap_ = {<std::__1::__compressed_pair_elem<long**, 0, false>> = {
      __value_ = 0x411460}, <std::__1::__compressed_pair_elem<std::__1::allocator<long*>, 1, true>> = {<std::__1::allocator<
long*>> = {<No data fields>}, <No data fields>}, <No data fields>}}
(gdb) print *__map_.__first_@16
$21 = {0x4174d0, 0x4164c0, 0x4154b0, 0x4144a0, 0x413490, 0x412480, 0x411470, 0x4103d0, 0x40f3c0, 0x40e3b0, 0x40d3a0,
  0x40c340, 0x40b330, 0x40a2f0, 0x4092c0, 0x0}

(gdb) print __map_
$22 = {<std::__1::__split_buffer_common<true>> = {<No data fields>}, __first_ = 0x4113e0, __begin_ = 0x4113e0,
  __end_ = 0x411460, __end_cap_ = {<std::__1::__compressed_pair_elem<long**, 0, false>> = {
      __value_ = 0x411460}, <std::__1::__compressed_pair_elem<std::__1::allocator<long*>, 1, true>> = {<std::__1::allocator<
long*>> = {<No data fields>}, <No data fields>}, <No data fields>}}
(gdb) print *__map_.__first_@16
$23 = {0x4184e0, 0x4174d0, 0x4164c0, 0x4154b0, 0x4144a0, 0x413490, 0x412480, 0x411470, 0x4103d0, 0x40f3c0, 0x40e3b0,
  0x40d3a0, 0x40c340, 0x40b330, 0x40a2f0, 0x4092c0}

For comparison, here is the same visualization (for proof, see Appendix C) for the libstdc++ deque implementation. Linear time operations such as copying or shifting the entire map array are labeled:

[___, ___, ___, b01, ___, ___, ___, ___]
[___, ___, b02, b01, ___, ___, ___, ___]
[___, b03, b02, b01, ___, ___, ___, ___]
[b04, b03, b02, b01, ___, ___, ___, ___]
[___, ___, ___, ___, ___, ___, b05, b04, b03, b02, b01, ___, ___, ___, ___, ___, ___, ___] copied
[___, ___, ___, ___, ___, b06, b05, b04, b03, b02, b01, ___, ___, ___, ___, ___, ___, ___]
[___, ___, ___, ___, b07, b06, b05, b04, b03, b02, b01, ___, ___, ___, ___, ___, ___, ___]
[___, ___, ___, b08, b07, b06, b05, b04, b03, b02, b01, ___, ___, ___, ___, ___, ___, ___]
[___, ___, b09, b08, b07, b06, b05, b04, b03, b02, b01, ___, ___, ___, ___, ___, ___, ___]
[___, b10, b09, b08, b07, b06, b05, b04, b03, b02, b01, ___, ___, ___, ___, ___, ___, ___]
[b11, b10, b09, b08, b07, b06, b05, b04, b03, b02, b01, ___, ___, ___, ___, ___, ___, ___]
[___, ___, ___, ___, ___, ___, ___, ___, ___, ___, ___, ___, ___, b12, b11, ..., b01, ___x13] copied
[___, ___, ___, ___, ___, ___, ___, ___, ___, ___, ___, ___, b13, b12, b11, ..., b01, ___x13]
[___, ___, ___, ___, ___, ___, ___, ___, ___, ___, ___, b14, b13, b12, b11, ..., b01, ___x13]
[___, ___, ___, ___, ___, ___, ___, ___, ___, ___, b15, b14, b13, b12, b11, ..., b01, ___x13]
[___, ___, ___, ___, ___, ___, ___, ___, ___, b16, b15, b14, b13, b12, b11, ..., b01, ___x13]
[___, ___, ___, ___, ___, ___, ___, ___, b17, b16, b15, b14, b13, b12, b11, ..., b01, ___x13]
[___, ___, ___, ___, ___, ___, ___, b18, b17, b16, b15, b14, b13, b12, b11, ..., b01, ___x13]
[___, ___, ___, ___, ___, ___, b19, b18, b17, b16, b15, b14, b13, b12, b11, ..., b01, ___x13]
[___, ___, ___, ___, ___, b20, b19, b18, b17, b16, b15, b14, b13, b12, b11, ..., b01, ___x13]
[___, ___, ___, ___, b21, b20, b19, b18, b17, b16, b15, b14, b13, b12, b11, ..., b01, ___x13]
[___, ___, ___, b22, b21, b20, b19, b18, b17, b16, b15, b14, b13, b12, b11, ..., b01, ___x13]
[___, ___, b23, b22, b21, b20, b19, b18, b17, b16, b15, b14, b13, b12, b11, ..., b01, ___x13]
[___, b24, b23, b22, b21, b20, b19, b18, b17, b16, b15, b14, b13, b12, b11, ..., b01, ___x13]
[b25, b24, b23, b22, b21, b20, b19, b18, b17, b16, b15, b14, b13, b12, b11, ..., b01, ___x13]

Time complexity analysis

For the libc++ deque:

Each copy and each shift is a linear time operation, and since the number of shifts clearly dominates the number of copies, we will only look at the cost of the shifts.

To derive a lower bound we will only look at the shifts that happen after the last doubling of the map array.

Shifts are done using the memmove instruction which is a linear time operation. The cost of a shift (asymptotically) is directly proportional to the number of elements that are shifted.

Just after the map is resized, its capacity is n and the number of elements inside it is n/2 + 1. When another block is added to it, the elements inside it are shifted by half of the remaining distance from the last element to the end of the map array, that is to say, it is shifted by half of n/2 - 1, rounded up. This leaves exactly n/4 - 1 free slots at both ends of the map. When the free space at the front is filled up, the elements are shifted again, by the same formula as before.

That is to say, each time the shift happens, the amount of free space between the last element and the end of the array is reduced by half (rounded up). The total number of shifts, therefore, is on the order of log2(n/2) which is equal to log2(n) - 1.

So now that we have the number of shifts, we just need to calculate the cost of each shift.

The first shift moves n/2 + 1 elements, the second shift moves n/2 + n/4 + 1, the third shift moves n/2 + n/4 + n/8 + 1 elements and so on. To calculate a lower bound, we'll simply assume that each shift moves exactly n/2 elements.

So we have: the number of shifts is log2(n) - 1 and each shift has a cost of n/2.

The total cost is:

(log2(n) - 1) * n/2 
= 0.5 * n * (log2(n) - 1) 
= 0.5 * n * log2(n) - 0.5 * n

To compute the amortized time complexity of the push_front operation, we divide the total cost by the number of times we called push_front, which is n * block_size times (we called it n * block_size times to insert n blocks). Thus we have that the amortized cost is:

0.5 * n * (log2(n) - 1) / (n * block_size)
= 0.5 * (log2(n) - 1) / block_size
= (0.5 * log2(n) - 0.5) / block_size

Now, to get the Big Omicron, we get rid of the constants and the constant factors, and we replace log2 with log (since log base doesn't matter in Big O analysis), in order to arrive at the final result:

O(log n)

And that is the amortized time complexity of the push_front operation in the libc++ deque implementation. Do note that this is a lower bound.

I have posted a more detailed explanation of how the libc++ implementation of the STL deque push_front here, if more context is required: https://1f604.blogspot.com/2021/11/the-libc-implementation-of-stl-deque.html

Is my analysis correct? If the amortized time complexity of the libc++ implementation of the STL deque is indeed O(log n), does that mean that it's not standards-compliant?

Have you checked / tested any of the other Standard Libraries ? — Richard Critten, Nov 28 '21 at 13:26
deque.modifiers goes even further with "Inserting a single element at either the beginning or end of a deque always takes constant time and causes a single call to a constructor of T." — , Nov 28 '21 at 13:27
@RichardCritten I only looked at the libc++ and libstdc++ implementations of the push_front function. I analyzed both of them and found that the libc++ implementation had amortized O(log n) time complexity while the libstdc++ implementation had amortized O(1) time complexity. — 1f604, Nov 28 '21 at 13:34
A shorter summary could be "if you remove the amortization from *amortized constant* you get something that's not constant". Fewer words. — Stephen M. Webb, Nov 28 '21 at 13:41
@1f604 Ohh, libc++. Then I'm not surprised if they're doing something funky. Don't get your hopes up though. I [reported](https://bugs.llvm.org/show_bug.cgi?id=20837) that their `std::sort` is **quadratic** and provided a patch *seven* years ago, and it only got fixed this week. — orlp, Nov 28 '21 at 13:46
libc++ has a [history](https://stackoverflow.com/questions/24475056/is-libcs-implementation-of-stdmake-heap-nonconformant) of this. — n. m. could be an AI, Nov 28 '21 at 14:04

Is the libc++ implementation of the STL deque push_front function standards-compliant?

Time complexity analysis

0 Answers0