
I started doing comparisons between:

  • inserting at the front of a list
  • inserting at the back of a vector
  • inserting at the front of a deque

But then I noticed that even on push_back() the deque seemed to be faster. I must be doing something wrong; I can't believe a more general container would outperform a more specialized one.

My code, using Google Benchmark:

#include "benchmark/benchmark.h"
#include <deque>
#include <vector>

#define NUM_INS 1000

static void BM_InsertVector(benchmark::State& state) {
    std::vector<int> v;
    v.reserve(NUM_INS);
    while (state.KeepRunning()) {
        state.PauseTiming();
        v.clear();
        state.ResumeTiming();
        for (size_t i = 0; i < NUM_INS; i++)
            v.push_back(i);
    }
}
BENCHMARK(BM_InsertVector);

static void BM_InsertDeque(benchmark::State& state) {
    std::deque<int> v;
    while (state.KeepRunning()) {
        state.PauseTiming();
        v.clear();
        state.ResumeTiming();
        for (size_t i = 0; i < NUM_INS; i++)
            v.push_back(i);
    }
}
BENCHMARK(BM_InsertDeque);

BENCHMARK_MAIN();

Results:

Run on (1 X 2592 MHz CPU )
2016-02-18 14:03:47
Benchmark         Time(ns)    CPU(ns) Iterations
------------------------------------------------
BM_InsertVector       2820       2470     312500                                 
BM_InsertDeque        1872       1563     406977

I notice some differences when playing with the number of elements, but deque always outperforms vector.

EDIT: compiler: gcc version 5.2.1, compiling with: `g++ -O3 -std=c++11 push_front.cpp -lbenchmark -lpthread`

I think the -O3 is actually instrumental; when I turn it off, deque performance gets slightly worse.

Iosif Spulber
  • If you've reserved enough capacity for the vector, the vector is faster. – Viktor Sehr Feb 18 '16 at 14:14
  • @RHertel If you use plain `std::list`, I'd expect insertion to be *slower*, because each insertion will allocate a new node. – dyp Feb 18 '16 at 14:15
  • @RHertel list seems to be slower; not shocking, that's why I didn't mention it. – Iosif Spulber Feb 18 '16 at 14:16
  • @ViktorSehr So what am I doing wrong then? – Iosif Spulber Feb 18 '16 at 14:17
  • @dyp As described in "Accelerated C++" by Koenig and Moo "In lists, push_back and erase operations do not invalidate iterators to other elements.(...) Using push_back to append an element to a vector invalidates all iterators referring to that vector". The authors also provide spectacular differences in the benchmarks. – RHertel Feb 18 '16 at 14:18
  • @RHertel What does iterator invalidation have to do with insert speed? (Genuine question!) – underscore_d Feb 18 '16 at 14:19
  • @RHertel: Anyway, that quote is simply wrong. As long as no reallocation is required (new `size` <= `capacity`), only the `end` iterator is invalided (for obvious reasons) by `vector.*_back` insertions; iterators to prior existing data will remain valid since those elements still exist at the same addresses. – underscore_d Feb 18 '16 at 14:26
  • Alright. I admit that I don't know about the nitty-gritty details of the implementation of the containers. What I do know is that the authors are highly respected in the community. The quote I was referring to is in section 5.5.1 of that book. – RHertel Feb 18 '16 at 14:29
  • What compiler do you use? Compilation settings? – Slava Feb 18 '16 at 14:30
  • @RHertel In this case, it's nothing to do with the implementation and everything to do with what the Standard mandates. Someone can be highly respected and still be wrong. http://stackoverflow.com/a/14820933/2757035 – underscore_d Feb 18 '16 at 14:31
  • @MarcoGiordano OP calls reserve, there is no reallocation involved – Slava Feb 18 '16 at 14:32
  • @MarcoGiordano Even if reallocation were an issue here, "each time you re-allocate, it doubles it size" - false. That's just one of a few commonly used _conventions_ for how a `vector` expands. It's in no way required/guaranteed. OP, I hope you get some useful comments soon... – underscore_d Feb 18 '16 at 14:35
  • @underscore_d In fact MSVS uses 1.5 – NathanOliver Feb 18 '16 at 14:39
  • @underscore_d agreed, it's not in the standard, but it is a common implementation, so rather than revelling in the details it's often a safe model to use to reason about it. – Marco Giordano Feb 18 '16 at 14:39
  • OP We need to know how you compiled it. If optimizations were not on, that could be part of the explanation. – NathanOliver Feb 18 '16 at 14:39
  • From http://en.cppreference.com/w/cpp/container/deque: "As opposed to std::vector, the elements of a deque are not stored contiguously: typical implementations use a sequence of individually allocated fixed-size arrays." This may be a hint: with smaller chunks of memory you can have a more efficient insertion. But it is obviously only speculation. – Alessandro Teruzzi Feb 18 '16 at 14:41
  • I've added the compiler version etc. as an EDIT. – Iosif Spulber Feb 18 '16 at 14:42
  • Regarding your edit, that's interesting. Can you output ASM of the relevant loops? You said `vector` is faster if you turn off `-O3`, but what does that mean? No optimisation at all? What about at lower (but non-zero) optimisation levels, or `-Os`? – underscore_d Feb 18 '16 at 15:04
  • I can't reproduce your results with gcc-4.9. For me, `vector` outperforms `deque` at every optimization level other than `O0`. Are you sure you're not doing something silly like compiling the wrong file? The command line you post above says you're compiling `push_front.cpp`, but you're testing `push_back`. – Praetorian Feb 18 '16 at 15:11
  • @Praetorian that's just the file name, it's the same file though. Are you using google benchmark as well? – Iosif Spulber Feb 18 '16 at 15:16
  • @IosifSpulber Yes. And I copy-pasted your code as is. At O3 I see vector=2814ns, deque=3145ns. – Praetorian Feb 18 '16 at 15:25
  • @Praetorian That... is really weird. – Iosif Spulber Feb 18 '16 at 17:37

2 Answers


There are basically 3 sources of cost involved in continuously appending elements to a dynamic container:

  1. Memory management.
  2. The internal bookkeeping of the container.
  3. Any operations that need to be performed on the elements themselves. Notably, any container that invalidates references on insertion is potentially moving/copying elements around.

Let's start with 1. vector keeps asking for double the memory, and deque allocates fixed-size chunks (deque is typically implemented as an array of arrays, with the lower-tier arrays being of fixed size). Asking for more memory may take longer than asking for less, but typically, unless your heap is very fragmented, asking for one big chunk all at once is the fastest way to get some memory: it's probably faster to allocate one megabyte once than to ask for a kilobyte 1000 times. So it seems clear that vector will eventually have the advantage here (until the container is so large that it's affected by fragmentation). However, that "eventually" hasn't arrived yet: you asked for only 1000 elements. I wrote the following code to measure this: http://coliru.stacked-crooked.com/a/418b18ff8a81c1c0 (a similar sketch is reproduced below). It's not very interesting, but it basically uses a trivial allocator that increments a global to see how many allocations are performed.
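In case the link rots, here is a minimal sketch of the same idea. It is my reconstruction, not the exact Coliru code; `CountingAlloc`, `alloc_count` and `dealloc_count` are names I made up. It plugs a trivial counting allocator into both containers:

#include <cstddef>
#include <deque>
#include <iostream>
#include <vector>

// Globals bumped by the allocator below; reset between runs.
static std::size_t alloc_count = 0;
static std::size_t dealloc_count = 0;

// Minimal C++11 allocator that counts allocations and deallocations.
template <typename T>
struct CountingAlloc {
    using value_type = T;

    CountingAlloc() = default;
    template <typename U>
    CountingAlloc(const CountingAlloc<U>&) {}

    T* allocate(std::size_t n) {
        ++alloc_count;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) {
        ++dealloc_count;
        ::operator delete(p);
    }
};

template <typename T, typename U>
bool operator==(const CountingAlloc<T>&, const CountingAlloc<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAlloc<T>&, const CountingAlloc<U>&) { return false; }

int main() {
    {
        std::vector<int, CountingAlloc<int>> v;
        for (int i = 0; i < 1000; ++i) v.push_back(i);  // no reserve: geometric growth
    }
    std::cout << "vector: " << alloc_count << " allocations, " << dealloc_count << " frees\n";

    alloc_count = dealloc_count = 0;
    {
        std::deque<int, CountingAlloc<int>> d;
        for (int i = 0; i < 1000; ++i) d.push_back(i);  // fixed-size chunks
    }
    std::cout << "deque:  " << alloc_count << " allocations, " << dealloc_count << " frees\n";
}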

In the course of your benchmark, vector asks for memory 11 times and deque only 10: deque keeps asking for the same amount, while vector asks for doubling amounts. On top of that, vector must call free 10 times, and deque zero times. This seems like a small win for deque.

For internal bookkeeping, vector has a simpler implementation than deque. After all, vector is just a dynamic array, while deque is an array of arrays and strictly more complex. So this is clearly a win for vector.

Finally, the operations on the elements themselves. In deque, nothing is ever moved. With vector, every new heap allocation also involves moving all the elements. It's probably optimized to use memcpy for trivial types, but even so, that's 10 calls to memcpy to copy 1, 2, 4, 8, ..., 512 integers (which sum to 1023 element moves in total, roughly one extra move per element). This is clearly a win for deque.

I can speculate that cranking up to -O3 allowed very aggressive inlining of a lot of the more complex codepaths in deque, reducing the weight of point 2. But obviously, unless you do a much more detailed (very careful!) benchmark, you'll never know for sure.

Mostly, this post is to show that it's more complex than simply a specialized container vs a more general one. I will make a prediction though (put my neck out to be cut off, as it were): if you increase the number of elements by even, say, a factor of 2 or 4, you will not see deque win anymore. That's because deque will make 2x or 4x as many heap allocations, but vector will only make 1-2 more.
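If you want to test that prediction yourself, recent versions of Google Benchmark let you parameterize the element count instead of hard-coding it. A sketch based on the question's code (the sizes 1000/2000/4000 are arbitrary choices of mine):

#include "benchmark/benchmark.h"
#include <deque>
#include <vector>

// Same benchmarks as in the question, but the element count comes
// from state.range(0), so one binary covers several sizes.
static void BM_InsertVector(benchmark::State& state) {
    const int n = static_cast<int>(state.range(0));
    std::vector<int> v;
    v.reserve(n);
    while (state.KeepRunning()) {
        state.PauseTiming();
        v.clear();
        state.ResumeTiming();
        for (int i = 0; i < n; i++)
            v.push_back(i);
    }
}
BENCHMARK(BM_InsertVector)->Arg(1000)->Arg(2000)->Arg(4000);

static void BM_InsertDeque(benchmark::State& state) {
    const int n = static_cast<int>(state.range(0));
    std::deque<int> d;
    while (state.KeepRunning()) {
        state.PauseTiming();
        d.clear();
        state.ResumeTiming();
        for (int i = 0; i < n; i++)
            d.push_back(i);
    }
}
BENCHMARK(BM_InsertDeque)->Arg(1000)->Arg(2000)->Arg(4000);

BENCHMARK_MAIN();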

I may as well note here that deque is actually kind of an odd data structure; it's theoretically an array of arrays, but in many implementations each inner array is either a fixed size in bytes, or just one element, whichever is larger. Also, some of its big-O guarantees are nonsense. push_back only counts as constant time because, in C++, only operations on the elements themselves count towards the big O. Otherwise it should be clear that, since it's an array of arrays, the top-level array will be proportional in size to the number of elements already stored. Eventually that top-level array runs out of room, and you have to reallocate it, moving O(N) pointers. So it's not really O(1) push_back.
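You can observe the chunked layout on your own implementation by checking where consecutive elements live: addresses advance by sizeof(int) within a chunk and then jump at a chunk boundary. A small sketch (the chunk size is implementation-specific; libstdc++, for instance, uses 512-byte chunks, i.e. 128 ints):

#include <cstdio>
#include <deque>

int main() {
    std::deque<int> d;
    for (int i = 0; i < 300; ++i)
        d.push_back(i);

    // Whenever the next element is not immediately after the previous
    // one in memory, we have crossed into a new fixed-size chunk.
    const int* prev = &d[0];
    for (int i = 1; i < 300; ++i) {
        const int* cur = &d[i];
        if (cur != prev + 1)
            std::printf("chunk boundary between d[%d] and d[%d]\n", i - 1, i);
        prev = cur;
    }
}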

Nir Friedman

I think the vector is slower because you're calling clear(), which, depending on your STL implementation, may be freeing the underlying array storage.

If that's the case, then your reserve() call isn't helping, and your vector is continuously resizing, which requires every element to be moved to the new, larger storage.
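One quick way to check that on a given implementation (a standalone sketch, not the benchmark itself) is to print capacity() before and after clear():

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(1000);
    for (int i = 0; i < 1000; ++i)
        v.push_back(i);

    std::cout << "before clear(): capacity = " << v.capacity() << '\n';
    v.clear();  // destroys the elements; mainstream implementations keep the capacity
    std::cout << "after clear():  capacity = " << v.capacity() << '\n';
}

If the capacity stays at 1000, clear() is not the culprit (see also the comment below).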

Dominic Dos Santos
  • [`vector.clear()` cannot modify `.capacity()`](https://stackoverflow.com/a/18467916/2757035), so it would make no sense for it to free the array, and I doubt any vaguely wise stdlib implementation would do that. – underscore_d Apr 22 '23 at 18:46