7

I have encountered some odd behavior with std::binomial_distribution when compiling with clang++ (with libstdc++ standard library).

Consider the following simple program:

#include <ctime>
#include <random>
#include <iostream>

unsigned binom(unsigned n, double p) {
  std::mt19937 e(time(NULL));
  std::binomial_distribution<unsigned> b(n, p);
  return b(e);
}

int main() {

  std::cout << "sample1=" << binom(1073741823, 0.51174692866744709) << "\n";
  std::cout << "sample2=" << binom(1073741824, 0.51174692866744709) << "\n";

}

This program will output one line (sample1=511766586\n) and then hang indefinitely.

Have I somehow invoked undefined behavior? This appears to happen regardless of what the PRNG returns. No matter how I seed it my main hangs on this second line.

Floyd Everest
  • 268
  • 2
  • 11
  • 2
    BTW, you DON'T want to reinitialize the mersenne twister every time: https://godbolt.org/z/devvx3hPz – Bob__ Jan 20 '23 at 00:13
  • 1
    Unrelated: Avoid using time as the seed for anything important. Anyone who knows when the program started can predict the outputs. The standard recommendation is to use a [`std::random_device`](https://en.cppreference.com/w/cpp/numeric/random/random_device). Hmmm someone seems to have screwed up or vandalized the doc page and renamed it to std::promise. Content looks correct, though. – user4581301 Jan 20 '23 at 00:15
  • Thank you both for your concern, but yes I would never do either of these in practice! – Floyd Everest Jan 20 '23 at 00:17
  • 2
    Here a [mre]: https://godbolt.org/z/vvq7T6WxP – Bob__ Jan 20 '23 at 00:25
  • 1
    I can reproduce with GCC 12.2. Infinite rejection loop in `binomial_distribution::operator()`. Lacking a MSVC at the moment to complete the sweep of the big three. I'm going to have to leave this to someone who actually knows how this smurf works. – user4581301 Jan 20 '23 at 00:27
  • @Bob__ Thank you for posting the mre. Definitely seems to be an issue within libstdc++. – Floyd Everest Jan 20 '23 at 00:28
  • @user4581301 Unrelated, but what is an MSVC? – Floyd Everest Jan 20 '23 at 00:30
  • 1
    Probably related: https://stackoverflow.com/questions/13163305/c11-stls-binomial-distribution-extremely-slow . Also, MSVC stands for Microsoft Visual Studio. – Bob__ Jan 20 '23 at 00:40
  • 1
    Apologies for using an out of date acronym. **M**icro**S**oft **V**isual **C**. – user4581301 Jan 20 '23 at 00:40
  • Bob__ raises a good point. I didn't let the loop run long enough to prove it really was infinite. – user4581301 Jan 20 '23 at 00:41
  • @Bob__ I noticed that if you sample from a binomial distribution for `n+10, n+9, n+8, ..., n+1`, that the execution time does increase significantly at each step. However, I've been running this program on a node in our HPC facility for quite a while now. Well over 15 minutes at this point, which far exceeds any of the other execution times. – Floyd Everest Jan 20 '23 at 01:01

1 Answers1

6

I debugged the GCC implementation of binomial_distribution (_M_initialize, operator()), and this is what I found:

Because of an overflow of the unsigned parameter n (2^30) the variable _M_s2 of the __param object becomes inf and therefore the same happens to __s2s, _M_s of the same object and __u, __a12, __y in operator()

This leads to the following infinite loop in operator()

bool __reject;
do
{
    if (__u <= __a1)                        inf <= val --> false
    {
        [...]
    }
    else if (__u <= __a12)                  inf <= inf --> true
    {
        __reject = __y >= [...];            inf >= val --> true
    }
    __reject = __reject || [...];           true || bool --> true
    __reject |= [...];                      true | val --> truthy
}
while(__reject);

Here is a (partial) traceback of how those variables got to equal inf:

_M_t = n

_M_s2 = [...] * (1 + [double] / (4 * _M_t * [...]));
                                 ^^^^^^^^ overflow == 0
                     [double] / 0 == inf

__s2s = _M_s2 * _Ms2;

_M_s = [...] + __s2s * [...];


__u = __param._M_s * [...];

__a12 = [...] + __param._M_s2 * [...];

__y = __param._M_s2 * [...];

Is also worth noting that __d2x in the __param object is NaN and that the other variables that contribute to this process, of which I omitted the definition, have (at least in this context) valid values

A feasible solution (until the bug is fixed) would be using std::binomial_distribution<unsigned long> (or uint64_t) in place of unsigned

Pignotto
  • 555
  • 3
  • 11
  • Wow, nice work! How did you go about debugging this? – Floyd Everest Jan 20 '23 at 05:24
  • @FloydEverest thank you! It wasn't easy... The first helpful thing I found was actually `__d2x` being NaN, then I noticed a `32 * _M_t` on the line it was defined. From there I found the `_M_s2` and traced everything down – Pignotto Jan 20 '23 at 06:09
  • size_t instead of unsigned long? unsigned long is 32bit on windows – Severin Pappadeux Jan 26 '23 at 15:25
  • 1
    @SeverinPappadeux Yes, you're right, even better would be `uint64_t` – Pignotto Jan 27 '23 at 07:00
  • I had a similar experience - the GCC boys were very reactive on fixing overflows in the STL. https://stackoverflow.com/questions/73157920/undefined-behavior-according-to-clang-fsanitize-integer-on-libstdc-stdran Although there was a clear underflow in the code, they said the bug was in the C++ standard, not in libstdc++. – Something Something Jan 27 '23 at 07:12
  • "until the bug is fixed": did someone make a report on gcc's bugzilla? – Marc Glisse Jan 27 '23 at 07:17