
I was comparing a BST to a heap at Heap vs Binary Search Tree (BST), but when I tried to benchmark both and compare the results, I could not interpret the data for the BST.

First, I confirmed that the standard library does use a red-black tree: What is the underlying data structure of a STL set in C++?

Then I ran this benchmark.

main.cpp

#include <chrono>
#include <iostream>
#include <random>
#include <set>

int main(int argc, char **argv) {
    size_t i, n;
    std::set<int> bst;
    std::random_device dev;
    unsigned int seed = dev();
    std::mt19937 prng(seed);
    std::uniform_int_distribution<> dist;

    // Number of insertions, default 10^6.
    if (argc > 1) {
        n = std::stoi(argv[1]);
    } else {
        n = 1000000;
    }
    for (i = 0; i < n; ++i) {
        auto random_value = dist(prng);
        // Time each insert individually.
        auto start = std::chrono::steady_clock::now();
        bst.insert(random_value);
        auto end = std::chrono::steady_clock::now();
        auto dt_bst = end - start;
        // Output: inserted value, then per-insert time in nanoseconds.
        std::cout << random_value << " "
            << std::chrono::duration_cast<std::chrono::nanoseconds>(dt_bst).count() << std::endl;
    }
}

plot.gnuplot:

#!/usr/bin/env gnuplot
set terminal png size 1024, 1024
set output "bst_vs_heap.png"
set title "BST insert time"
set xlabel "size"
set ylabel "nanoseconds"
plot "main.dat" using 2 notitle

Compile, run and plot:

g++ -ggdb3 -O3 -std=c++17 -Wall -Wextra -pedantic-errors -o main.out main.cpp
./main.out 10000000 > main.dat
./plot.gnuplot

Outcome:

(Plot: bst_vs_heap.png, per-insert time in nanoseconds, roughly constant with outliers)

Why don't I see a nice logarithmic curve, as the theory for the data structure predicts, but rather a roughly constant line with some outliers?

Ubuntu 18.04, GCC 7.3, Intel i7-7820HQ CPU, DDR4 2400 MHz RAM, Lenovo Thinkpad P51.

Ciro Santilli OurBigBook.com
  • One million iterations will only get you to 19 layers deep into a balanced binary search tree (floor(log2(10^6)) = 19). Hardly worth sneezing about. – Robert Harvey Aug 21 '18 at 15:56
  • It looks to me like you're looking at the inserted value (`std::cout << random_value`), not the size of the set. But I'm not a gnuplot expert. – molbdnilo Aug 21 '18 at 15:58
  • @molbdnilo I'm `cout`ing both, and taking the second column to plot. – Ciro Santilli OurBigBook.com Aug 21 '18 at 16:01
  • @CiroSantilli新疆改造中心六四事件法轮功 Ah, yes, "using 2". Of course. – molbdnilo Aug 21 '18 at 16:03
  • I don't speak gnuplot, where does the x axis data come from? I would expect to have `<< bst.size()` somewhere in your output. I don't think it's a good idea to skip the iterations where the candidate value was already present. – Caleth Aug 21 '18 at 16:06
  • @Caleth `using 2` with a single number makes the `x` count from 0 to n: https://stackoverflow.com/questions/9664472/having-automatic-x I have tested with minimal examples. – Ciro Santilli OurBigBook.com Aug 21 '18 at 16:08
  • @RobertHarvey that makes sense, but still, why don't I see my logarithm? Context switching noise? Or is the clock not precise enough? – Ciro Santilli OurBigBook.com Aug 21 '18 at 16:09
  • It is possible that randomization and the clocks themselves are taking more than the median time for insert. A possible solution is to 1) create a vector of randomized values before the test, and call `bst.insert` while iterating the vector; 2) Measure the overhead of `now()`. If you know how much time `now()` takes, you can subtract that. – Michael Veksler Aug 21 '18 at 16:10
  • You (probably) aren't counting nanoseconds. Have a look at your implementation's `std::chrono::steady_clock::duration`; you probably want `std::chrono::high_resolution_clock` instead. – Caleth Aug 21 '18 at 16:10
  • @MichaelVeksler I don't see how the randomization calculation can matter in this case, since I calculate the value before the first `now()`. But true, the `now()` itself could be the problem. – Ciro Santilli OurBigBook.com Aug 21 '18 at 16:12
  • Right, missed that. Still the timing itself has overhead. – Michael Veksler Aug 21 '18 at 16:13
  • It seems like you're showing that the time to insert an item isn't dependent on the value of the item inserted. I thought you were trying to determine how the insert-time depended on the size of the container (for that I'd expect you to be outputting the value of `i` and using that as the x-axis). – Tim Randall Aug 21 '18 at 16:33
  • @TimRandall I was kind of doing both, especially for the heap; `i` for `x` comes from gnuplot automatically, as mentioned in the previous reply. – Ciro Santilli OurBigBook.com Aug 21 '18 at 16:35
  • I have run your benchmarks, but without an actual `insert`. Most of the time I got `0`, but 160,000 samples took 293 nanoseconds, just for calling `now`. There are also odd random samples going from 586 ns to 51,000 ns. This may be the cause of some of the noisy dots at the upper side of the graph. (A sketch of this overhead measurement appears after these comments.) – Michael Veksler Aug 21 '18 at 16:41
  • You are measuring the noise of whatever system call is used by the clock. For an accurate measurement, do many insertions in a loop. Start the clock before the loop and stop it after the loop. – n. m. could be an AI Aug 21 '18 at 16:44
  • @n.m. He has already done it in the answer he wrote. – Michael Veksler Aug 21 '18 at 16:46
  • @MichaelVeksler that's what happens when you get distracted for 20 min when writing a comment: it becomes obsolete. – n. m. could be an AI Aug 21 '18 at 17:01
  • Just wanna say, dude. Stellar presentation and example. Really well done. I hope this gets upticks o'festival in the years to come. – WhozCraig Aug 22 '18 at 07:30
  • @WhozCraig thank you Craig! One day I will get back to: https://github.com/cirosantilli/write-free-science-books-to-get-famous-website and save the world. – Ciro Santilli OurBigBook.com Aug 22 '18 at 08:44
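
Following up on the clock-overhead comments above: a minimal sketch, not from the original post, that prints the tick period of `std::chrono::steady_clock` and estimates the average cost of a `now()` call (the file name and loop count are arbitrary assumptions). The per-insert numbers above can then be compared against the cost of the timer itself.

clock_overhead.cpp

#include <chrono>
#include <iostream>

int main() {
    using clock = std::chrono::steady_clock;

    // Tick period of the clock, in nanoseconds per tick.
    std::cout << "tick period: "
        << 1e9 * clock::period::num / clock::period::den
        << " ns" << std::endl;

    // Time n back-to-back now() calls to estimate the per-call overhead.
    size_t n = 1000000;
    auto start = clock::now();
    for (size_t i = 0; i < n; ++i) {
        auto t = clock::now();
        (void)t;
    }
    auto end = clock::now();
    std::cout << "average now() cost: "
        << std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() / double(n)
        << " ns" << std::endl;
}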

1 Answer


The clock is likely not accurate enough, as mentioned in the comments, so I grouped a bunch of inserts together and timed each group to improve the signal-to-noise ratio. It worked: now I can see the logarithm:

#include <chrono>
#include <iostream>
#include <random>
#include <set>
#include <vector>

int main(int argc, char **argv) {
    size_t i, j, n, granule;
    std::set<int> bst;
    std::random_device dev;
    unsigned int seed = dev();
    std::mt19937 prng(seed);
    std::uniform_int_distribution<> dist;

    // Total number of insertions, default 10^6.
    if (argc > 1) {
        n = std::stoi(argv[1]);
    } else {
        n = 1000000;
    }
    // Number of inserts timed together per measurement.
    if (argc > 2) {
        granule = std::stoi(argv[2]);
    } else {
        granule = 10;
    }
    std::vector<int> randoms(granule);
    for (i = 0; i < n / granule; ++i) {
        // Generate the random values outside the timed region.
        for (j = 0; j < granule; ++j) {
            randoms[j] = dist(prng);
        }
        // Time a whole group of inserts to amortize the clock overhead.
        auto start = std::chrono::high_resolution_clock::now();
        for (j = 0; j < granule; ++j) {
            bst.insert(randoms[j]);
        }
        auto end = std::chrono::high_resolution_clock::now();
        auto dt_bst = end - start;
        // Output: nanoseconds per group of granule inserts.
        std::cout << std::chrono::duration_cast<std::chrono::nanoseconds>(dt_bst).count() << std::endl;
    }
}

Command:

./main.out 100000000 10000
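
(An inference, not shown in the original: this version prints only the timing column, so reusing the earlier gnuplot script would require redirecting the output to main.dat and changing `using 2` to `using 1`.)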

Graph:

(Plot: grouped insert times, now showing a logarithmic growth curve)

Ciro Santilli OurBigBook.com
  • This is more or less what is expected by theory. It can be smoothed out even further by collecting the timings in a vector, rather than directly sending them to `std::cout`. Then repeat the experiment N times, starting each time with the same random seed. Sum up all the results of each measurement point. In the process, consider throwing away the min/max points for each insertion (to minimize noise). At the end, print the values in the vector divided by N (or by N-2, if outliers are eliminated). (A sketch of this scheme appears after these comments.) – Michael Veksler Aug 21 '18 at 16:56
  • @MichaelVeksler why should the same seed be used? – scry Aug 22 '18 at 05:19
  • @scry, so that it will measure exactly the same insertions into exactly the same bst multiple times. The idea is to get rid of noise that comes from other tasks, the operating system, and timer inaccuracies, for a given insertion in a given tree. – Michael Veksler Aug 22 '18 at 06:58