Reproducing performance results for vector insertion with google benchmark

Question

Recently I saw the following result regarding the insertion of elements into the front of std::list and std::vector:

for std::list the time is linear
for std::vectorthe time polynomial
but for a small number of elements (<800) std::vector beats list

src: https://github.com/CppCon/CppCon2022/blob/main/Presentations/Refresher-on-Containers-Algorithms-and-Performance.pdf

I tried to reproduce these results with google benchmark.

#include <benchmark/benchmark.h>

static void bm_vec(benchmark::State &state) {
    for (auto _ : state)
    {
        state.PauseTiming();
        std::vector<int> v ;
        benchmark::DoNotOptimize(v.data());
        state.ResumeTiming();
        for (int i = 0; i < state.range(0); i++)
        {
           v.insert(v.begin(), 1);
        }
        benchmark::ClobberMemory();
    }
}

BENCHMARK(bm_vec)->DenseRange(10,2000,10);

static void bm_list(benchmark::State &state) {
    
    for (auto _ : state)
    {
        state.PauseTiming();
        std::list<int> v ;
        benchmark::DoNotOptimize(v.front());
        state.ResumeTiming();
        for (int i = 0; i < state.range(0); i++)
        {
            v.insert(v.begin(), 1);
        }
        
    }
}
BENCHMARK(bm_list)->DenseRange(10,2000,10);

BENCHMARK_MAIN();

But my results did not reproduce the desired effect:

Does anyone know what I need to improve to get the desired result?

Which compile flags are you using? In the presentation, they use -o3. Also, which hardware are you using? They use 8 core CPU. — Mickaël C. Guimarães, Dec 16 '22 at 14:21
Try put `benchmark::DoNotOptimize(v.data());` in inner `for` loop. — Marek R, Dec 16 '22 at 14:26
My hardware is: Run on (8 X 4700 MHz CPU s) CPU Caches: L1 Data 48 KiB (x4) L1 Instruction 32 KiB (x4) L2 Unified 1280 KiB (x4) L3 Unified 12288 KiB (x1) — orfvl, Dec 16 '22 at 18:16
I just saw now that your graph is using nanoseconds (ns) while the original source uses microseconds (us) as reference. If you account for the unit difference, your graph looks like the reference. — Mickaël C. Guimarães, Dec 16 '22 at 19:34
putting `benchmark::DoNotOptimize(v.data());` in the inner for loop did not help. It made the results even less smooth — orfvl, Dec 16 '22 at 19:35
Even if you account for the unit difference, the desired behavior (for n < 800 vector faster then list) is not observed — orfvl, Dec 16 '22 at 19:38

orfvl · Answer 1 · 2022-12-16T21:18:41.610

2

Setting O3( add_compile_options(-O3) ) solved the problem.

edited Dec 16 '22 at 21:18

answered Dec 16 '22 at 19:51

orfvl

111
5

Reproducing performance results for vector insertion with google benchmark

1 Answers1