
In comparing bernoulli_distribution's default constructor (a 50/50 chance of true/false) and uniform_int_distribution{0, 1} (an equally likely chance of 0 or 1), I find that bernoulli_distribution is at least 2x, and upwards of 6x, slower than uniform_int_distribution, despite the two giving equivalent results.

I would expect bernoulli_distribution to perform better, since it is specifically designed for a probability over exactly two outcomes, true or false; yet it doesn't.

Given the above and the performance metrics below, are there practical uses for bernoulli_distribution over uniform_int_distribution?

Results over 5 runs (Release mode, x64): (see the edit below for Release runs without the debugger attached)

bernoulli: 58 ms
false: 500690
true: 499310

uniform: 9 ms
1: 499710
0: 500290
----------
bernoulli: 57 ms
false: 500921
true: 499079

uniform: 9 ms
0: 499614
1: 500386
----------
bernoulli: 61 ms
false: 500440
true: 499560

uniform: 9 ms
0: 499575
1: 500425
----------
bernoulli: 59 ms
true: 498798
false: 501202

uniform: 9 ms
1: 499485
0: 500515
----------
bernoulli: 58 ms
true: 500777
false: 499223

uniform: 9 ms
0: 500450
1: 499550
----------

Profiling code:

#include <chrono>
#include <random>
#include <iostream>
#include <unordered_map>

int main() {

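    // Time one million draws from a default-constructed bernoulli_distribution (p = 0.5).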
    auto gb = std::mt19937{std::random_device{}()};
    auto bd = std::bernoulli_distribution{};
    auto bhist = std::unordered_map<bool, int>{};

    auto start = std::chrono::steady_clock::now();
    for(int i = 0; i < 1'000'000; ++i) {
        bhist[bd(gb)]++;
    }
    auto end = std::chrono::steady_clock::now();
    auto dif = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);

    std::cout << "bernoulli: " << dif.count() << " ms\n";
    std::cout << std::boolalpha;
    for(auto& b : bhist) {
        std::cout << b.first << ": " << b.second << '\n';
    }
    std::cout << std::noboolalpha;
    std::cout << '\n';

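    // Repeat the measurement with uniform_int_distribution{0, 1} and a fresh engine.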
    auto gu = std::mt19937{std::random_device{}()};
    auto u = std::uniform_int_distribution<int>{0, 1};
    auto uhist = std::unordered_map<int, int>{};

    start = std::chrono::steady_clock::now();
    for(int i = 0; i < 1'000'000; ++i) {
        uhist[u(gu)]++;
    }
    end = std::chrono::steady_clock::now();
    dif = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);

    std::cout << "uniform: " << dif.count() << " ms\n";
    for(auto& b : uhist) {
        std::cout << b.first << ": " << b.second << '\n';
    }
    std::cout << '\n';
}

EDIT

I re-ran the test without the debugger attached, and bernoulli still ran a good 4x slower:

bernoulli: 37 ms
false: 500250
true: 499750

uniform: 9 ms
0: 500433
1: 499567
-----
bernoulli: 36 ms
false: 500595
true: 499405

uniform: 9 ms
0: 499061
1: 500939
-----
bernoulli: 36 ms
false: 500988
true: 499012

uniform: 8 ms
0: 499596
1: 500404
-----
bernoulli: 36 ms
true: 500425
false: 499575

uniform: 8 ms
0: 499974
1: 500026
-----
bernoulli: 36 ms
false: 500847
true: 499153

uniform: 8 ms
0: 500082
1: 499918
-----
Casey
  • Try using a `std::uniform_real_distribution` and checking if the result is < 0.5. `std::bernoulli_distribution` by default gives equal weight to both outcomes, but doesn't have to. That might be where the extra complexity is coming from. – Kevin Apr 14 '21 at 19:44
  • Cannot reproduce with clang-800.0.42.1; they are either the same, or uniform is slightly slower. Tested with both -O0 and -O3. – atru Apr 14 '21 at 19:48
  • @atru I cannot comment on clang or GCC. I did not use them. I used MSVC. I re-ran the tests without debugging. See edit. – Casey Apr 14 '21 at 20:31
  • Curiously enough, I tried it on a Linux system with a much newer compiler, gcc 9.3.0, and started to see Bernoulli being slower, by like 50%. – atru Apr 14 '21 at 22:39

3 Answers


A default-constructed std::bernoulli_distribution gives equal weight to both outcomes, but you can pass it a different probability parameter to change the weighting; that flexibility may be where the extra complexity comes from. A better comparison is against a std::uniform_real_distribution<double>, checking whether its result is less than 0.5 (by default it produces a random number in the range [0, 1)).

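As an example, here is a minimal sketch of the added "real" case, written in the style of the question's harness (the variable names gr, rd, and rhist are mine; the exact code behind the timings below may differ):

auto gr = std::mt19937{std::random_device{}()};
// uniform_real_distribution<double> defaults to the range [0, 1),
// so comparing each draw against 0.5 behaves like a fair coin.
auto rd = std::uniform_real_distribution<double>{};
auto rhist = std::unordered_map<int, int>{};

for(int i = 0; i < 1'000'000; ++i) {
    rhist[rd(gr) < 0.5 ? 1 : 0]++;
}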

gcc output:

bernoulli: 28 ms
false: 499818
true: 500182

uniform: 31 ms
1: 500686
0: 499314

real: 29 ms
1: 500191
0: 499809

clang output:

bernoulli: 106 ms
false: 500662
true: 499338

uniform: 23 ms
1: 501263
0: 498737

real: 101 ms
1: 499683
0: 500317

The results are about the same using gcc (across multiple runs, uniform int actually tends to take slightly longer, contrary to what you saw). With clang, bernoulli and real take about the same time, while uniform int takes much less. Both were built with -O3.

Kevin
  • Default constructor is [standard-guaranteed](https://en.cppreference.com/w/cpp/numeric/random/bernoulli_distribution/bernoulli_distribution) to give equal weight. Where do you suppose "...but doesn't have to"? – Casey Apr 14 '21 at 20:26
  • @Casey Sorry, I just meant that a default-constructed one gives equal probabilities but that you can customize it with a different parameter (i.e. any given `std::bernoulli_distribution` object *doesn't have to* give equal weight to both outcomes). I edited my answer to make that more clear. – Kevin Apr 14 '21 at 20:51
  • Just tested `uniform_real_distribution(0.0f, nextafter(1.0f, 20.f))` (to account for urd being a half-closed range) vs `bernoulli_distribution` and bernoulli is faster by about 20% regardless of the probability. So speed-wise: `bernoulli < urd` depending on the application: yes-no probability ("IsPercentChance(float probability)"), bernoulli is faster; pure "random bool", uniform_int is faster. – Casey Apr 16 '21 at 02:14

The class bernoulli_distribution generates booleans with possibly uneven ratios. To achieve that, it has to generate a floating-point value in the [0, 1) range and then compare it against the given probability, or do something equivalent.

It is rather obvious that such a routine is likely to be slower than taking a random integer modulo 2, which is pretty much all it takes to produce a uniform number in {0, 1} from a random integer.
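
As an illustrative sketch of the two strategies (not the actual library implementations):

#include <random>

// Roughly what bernoulli_distribution(p) must do: draw a real number
// in [0, 1) and compare it against the probability p.
bool bernoulli_like(std::mt19937& gen, double p) {
    return std::generate_canonical<double, 53>(gen) < p;
}

// Roughly what uniform_int_distribution{0, 1} boils down to: keep a
// single bit of the engine's output.
int uniform_like(std::mt19937& gen) {
    return static_cast<int>(gen() & 1u);
}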

How is that surprising? Only if the compiler managed to strip out the unnecessary operations, knowing at compile time that the probability is 50/50, could the two even match.

ALX23z
  • I think this may be the major reason given the hurdles of processing floats and the complexity compared to simple integer operations. – atru Apr 15 '21 at 13:34

Some comments and answers suggest using uniform_real_distribution instead.

I tested uniform_real_distribution(0.0f, nextafter(1.0f, 20.f)) (to account for urd producing a half-open range) against bernoulli_distribution, and bernoulli_distribution is faster by about 20%-25% regardless of the probability. It also gave more correct results: testing with a true probability of 1.0, my implementation using the urd values above actually produced false negatives (granted, only one or two across five one-million-iteration runs), while bernoulli correctly produced none.
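
For illustration, a sketch of that urd-based construction (the names are mine):

#include <cmath>
#include <random>

std::mt19937 gen{std::random_device{}()};

// nextafter(1.0f, 20.f) is the smallest float above 1.0f, so the
// half-open range [0.0f, nextafter(1.0f, 20.f)) covers the closed [0, 1].
std::uniform_real_distribution<float> urd{0.0f, std::nextafter(1.0f, 20.f)};

bool IsPercentChanceUrd(float probability) {
    // With the closed range, a draw of exactly 1.0f fails the comparison
    // (1.0f < 1.0f), which would explain the rare false negatives at
    // a probability of 1.0 described above.
    return urd(gen) < probability;
}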

So, speed-wise: bernoulli_distribution is faster than uniform_real_distribution but slower than uniform_int_distribution.

Long story short: use the right tool for the job, don't reinvent the wheel, the STL is well built, and depending on the use case one distribution or the other is the better choice.

For a weighted yes-no probability (IsPercentChance(float probability)), bernoulli_distribution is faster and better.

For a pure "give me a random bool value", uniform_int_distribution is faster and better.
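
A minimal sketch of both recommendations (the helper names are mine, for illustration only):

#include <random>

std::mt19937 gen{std::random_device{}()};

// Weighted yes-no check: bernoulli_distribution takes the probability directly.
bool IsPercentChance(float probability) {
    return std::bernoulli_distribution{probability}(gen);
}

// Pure random bool: uniform_int_distribution over {0, 1}.
bool RandomBool() {
    return std::uniform_int_distribution<int>{0, 1}(gen) == 1;
}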

Casey