3

Recently I saw the following result regarding the insertion of elements into the front of std::list and std::vector:

  • for std::list the time is linear
  • for std::vectorthe time polynomial
  • but for a small number of elements (<800) std::vector beats list

https://github.com/CppCon/CppCon2022/blob/main/Presentations/Refresher-on-Containers-Algorithms-and-Performance.pdf

src: https://github.com/CppCon/CppCon2022/blob/main/Presentations/Refresher-on-Containers-Algorithms-and-Performance.pdf

I tried to reproduce these results with google benchmark.

#include <benchmark/benchmark.h>

static void bm_vec(benchmark::State &state) {
    for (auto _ : state)
    {
        state.PauseTiming();
        std::vector<int> v ;
        benchmark::DoNotOptimize(v.data());
        state.ResumeTiming();
        for (int i = 0; i < state.range(0); i++)
        {
           v.insert(v.begin(), 1);
        }
        benchmark::ClobberMemory();
    }
}

BENCHMARK(bm_vec)->DenseRange(10,2000,10);

static void bm_list(benchmark::State &state) {
    
    for (auto _ : state)
    {
        state.PauseTiming();
        std::list<int> v ;
        benchmark::DoNotOptimize(v.front());
        state.ResumeTiming();
        for (int i = 0; i < state.range(0); i++)
        {
            v.insert(v.begin(), 1);
        }
        
    }
}
BENCHMARK(bm_list)->DenseRange(10,2000,10);

BENCHMARK_MAIN();

But my results did not reproduce the desired effect: very bad results

Does anyone know what I need to improve to get the desired result?

orfvl
  • 111
  • 5
  • 2
    Which compile flags are you using? In the presentation, they use -o3. Also, which hardware are you using? They use 8 core CPU. – Mickaël C. Guimarães Dec 16 '22 at 14:21
  • Try put `benchmark::DoNotOptimize(v.data());` in inner `for` loop. – Marek R Dec 16 '22 at 14:26
  • My hardware is: Run on (8 X 4700 MHz CPU s) CPU Caches: L1 Data 48 KiB (x4) L1 Instruction 32 KiB (x4) L2 Unified 1280 KiB (x4) L3 Unified 12288 KiB (x1) – orfvl Dec 16 '22 at 18:16
  • I just saw now that your graph is using nanoseconds (ns) while the original source uses microseconds (us) as reference. If you account for the unit difference, your graph looks like the reference. – Mickaël C. Guimarães Dec 16 '22 at 19:34
  • putting `benchmark::DoNotOptimize(v.data());` in the inner for loop did not help. It made the results even less smooth – orfvl Dec 16 '22 at 19:35
  • Even if you account for the unit difference, the desired behavior (for n < 800 vector faster then list) is not observed – orfvl Dec 16 '22 at 19:38

1 Answers1

2

Setting O3( add_compile_options(-O3) ) solved the problem. enter image description here

orfvl
  • 111
  • 5