What is performance-wise the best way to generate random bools?

Question

I need to generate random Boolean values on a performance-critical path.

The code which I wrote for this is

std::random_device   rd;
std::uniform_int_distribution<> randomizer(0, 1);
const int val randomizer(std::mt19937(rd()));
const bool isDirectionChanged = static_cast<bool>(val);

But do not think that this is the best way to do this as I do not like doing static_cast<bool>.

On the web I have found a few more solutions

1. std::bernoulli_distribution

2. bool randbool = rand() & 1; Remember to call srand() at the beginning.

`std::bernoulli_distribution` is slow from my experience. The best way is to generate `unsigned long long` (for x64) and use its bits as boolean values. — Vitalii, Feb 12 '16 at 09:04
How much "randomness" do you need? After all, you can just declare uninitialised int and return its first bit. The value will be "random", but with unknown distribution. — Jakub Zaverka, Feb 12 '16 at 09:08
@JakubZaverka: that's *undefined* behaviour - no guarantee your program will work. — Tony Delroy, Feb 12 '16 at 09:08
Are you creating `rd` and `randomizer` every single loop or once? Adding `static` to `rd` and `randomizer` could help a lot if you didn't already do that. — nwp, Feb 12 '16 at 09:20
What about pre generating a (long enough) (circular) buffer of 64bit random values, from which to take very quickly one bit at a time when in need of a boolean random value? — Antonio, Feb 12 '16 at 09:20
Did you measure any of the options you mentioned? Because people will give you speculative answers, so you need to be able to gauge their performance and whether the resulting distributions are acceptable. — juanchopanza, Feb 12 '16 at 09:26
If you want to optimize for performance, you have to be more specific about your requirements. Does the performance need to be cpnsistent? True randomness or pseudorandom? What distribution? Can you show a benchmark, against we can measure? How fast is good enough (what other things are you doing in your loop)? You don't need to answer all of them, but some more information would narrow down the possible solutions. — MikeMB, Feb 12 '16 at 09:35
You have not stated what properties you require of your random boolean, so @n.m. is correct. — OrangeDog, Feb 12 '16 at 09:58
@Vitaliy do you ever measure how much it is slow? slitly or noticable — T M, Feb 12 '16 at 10:12
@TM Do nothing? `const bool b = randomizer(std::mt19937(rd()));` — juanchopanza, Feb 12 '16 at 10:18
@nwp yes creating every loop this is inside my function. Can you please provide more details about adding `static` to `rd`? — T M, Feb 12 '16 at 10:19
@TM I implemented image noise generator. I had to adjust pixel color with specific probability. Image size was 800x600, and program had to add noise to images that came from external source (up to 50 images per second). `std::bernoulli_distribution` was logical choise. CPU usage was big. I profiled with `gprof`. When I switched to `unsigned long long` from `std::bernoulli_distribution`, I got near 10 times performance improvement with clang 3.4.2 `default_random_engine` on CentOS 6. — Vitalii, Feb 12 '16 at 10:22
Creating a `std::random_device`, a `std::uniform_int_distribution` and a `std::mt19937` cost some performance due to the constructor and destructor running every time. If you put `static` in front of them they will be constructed only once and reused every time, so you save construction and destruction of these objects. — nwp, Feb 12 '16 at 10:23
@juanchopanza I thought this is more professional approach then just assigning `int` to `bool` — T M, Feb 12 '16 at 10:25
@JakubZaverka: "I don't know what the distribution is" and "this is random" are *completely* different things! Particularly when what you *need* is a random distribution. — Eric Lippert, Feb 12 '16 at 17:35
If it's performance critical then you may want to avoid `bool` altogether. It's a very sparse way to store the information if you have a lot of them, and if they're random then they're likely to stress branch prediction. It depends on your code, but the PRNG is unlikely to be the bottleneck. — sh1, Feb 12 '16 at 17:49
Related: http://stackoverflow.com/questions/7901829/making-use-of-sandy-bridges-hardware-true-random-number-generator — Serge Rogatch, Jul 01 '16 at 21:38

Serge Rogatch · Accepted Answer · 2016-02-12T12:53:01.207

40

For the purpose of performance, at a price of less "randomness" than e.g. std::mt19937_64, you can use Xorshift+ to generate 64-bit numbers and then use the bits of those numbers as pseudo-random booleans.

Quoting the Wikipedia:

This generator is one of the fastest generators passing BigCrush

Details: http://xorshift.di.unimi.it/ . There is a comparison table in the middle of the page, showing that mt19937_64 is 2 times slower and is systematic.

Below is sample code (the real code should wrap it in a class):

#include <cstdint>
#include <random>
using namespace std;

random_device rd;
/* The state must be seeded so that it is not everywhere zero. */
uint64_t s[2] = { (uint64_t(rd()) << 32) ^ (rd()),
    (uint64_t(rd()) << 32) ^ (rd()) };
uint64_t curRand;
uint8_t bit = 63;

uint64_t xorshift128plus(void) {
    uint64_t x = s[0];
    uint64_t const y = s[1];
    s[0] = y;
    x ^= x << 23; // a
    s[1] = x ^ y ^ (x >> 17) ^ (y >> 26); // b, c
    return s[1] + y;
}

bool randBool()
{
    if(bit >= 63)
    {
        curRand = xorshift128plus();
        bit = 0;
        return curRand & 1;
    }
    else
    {
        bit++;
        return curRand & (1<<bit);
    }
}

edited Feb 12 '16 at 12:53

answered Feb 12 '16 at 09:10

Serge Rogatch

13,865
7
86
158

1

XorShift without the plus passes almost all tests and should be a little more than twice as fast. This is an additional possible choice. XorShift(+) is far underutilized. Should be the standard generator on all platforms basically. Not sure why the super heavy weight Mersenne Twister is favored so much. – usr Feb 12 '16 at 10:23
1

@Serge Rogatch by using bitmasks? – T M Feb 12 '16 at 10:37
2

@usr Because pseudorandom number generation is extremely poorly understood by most programmers. – Thomas Feb 12 '16 at 10:43
You should probably take Jamie D implementation and substitute in your random generator – Antonio Feb 12 '16 at 12:52
@Antonio, please, refresh the page. I've added sample code that uses bits. – Serge Rogatch Feb 12 '16 at 12:54
8

@usr: Mersenne Twister is hardly "super heavy weight". The main reason it's more popular is because Xorshift+ is much newer than MT. Xorshift+ was first published in 2014, MT has been around since 1997. – Jack Aidley Feb 12 '16 at 13:08
@JackAidley I would call 2.5KB state size very heavy weight! This generator accomplishes basically the same quality as the 8 word-sized XorShift+.; `sequentially extracting the bits` this is one of the basic tests being performed by the various test suites.; Also, this (https://en.wikipedia.org/wiki/Mersenne_Twister#Disadvantages) reads terrible. I did not even know that. – usr Feb 12 '16 at 13:11
@usr I'm not that worried about 2.5 KB of memory being spent on a random number generator if it produces superior randomness. How's MT doing in terms of CPU usage? – John Dvorak Feb 12 '16 at 13:40
@JanDvorak can't be better than about 10 instructions per word of random data (XorShift). XorShift+ uses two such computations and adds them I believe, so ~21 instructions with some very nice ILP. – usr Feb 12 '16 at 13:49
@usr For the vast majority of real-world applications it just doesn't matter - neither the memory footprint nor the CPU overhead. It might be important for very specialized applications and embedded devices, but in both of those cases it is easy to replace the algorithm with something else. Standards take a long time to be ratified and you want to pick something that is well understood and has been used in the real-world for a while. Xorshift was only proposed in an academic paper in 2003 and really only became popular in the last few years. Give it a few more years and this will change. – Voo Feb 12 '16 at 17:54
Just found the `RdRand` CPU instruction, should be the fastest: http://stackoverflow.com/a/38154142/1915854 – Serge Rogatch Jul 01 '16 at 21:39

Emil Laine · Answer 2 · 2016-02-13T17:13:43.080

21

Some quick benchmarks (code):

   647921509 RandomizerXorshiftPlus
   821202158 BoolGenerator2 (reusing the same buffer)
  1065582517 modified Randomizer
  1130958451 BoolGenerator2 (creating a new buffer as needed)
  1140139042 xorshift128plus
  2738780431 xorshift1024star
  4629217068 std::mt19937
  6613608092 rand()
  8606805191 std::bernoulli_distribution
 11454538279 BoolGenerator
 19288820587 std::uniform_int_distribution

For those who want ready-to-use code, I present XorShift128PlusBitShifterPseudoRandomBooleanGenerator, a tweaked version of RandomizerXorshiftPlus from the above link. On my machine, it is about as fast as @SergeRogatch's solution, but consistently about 10-20% faster when the loop count is high (≳100,000), and up to ~30% slower with smaller loop counts.

class XorShift128PlusBitShifterPseudoRandomBooleanGenerator {
public:
  bool randBool() {
    if (counter == 0) {
      counter = sizeof(GeneratorType::result_type) * CHAR_BIT;
      random_integer = generator();
    }
    return (random_integer >> --counter) & 1;
  }

private:
  class XorShift128Plus {
  public:
    using result_type = uint64_t;

    XorShift128Plus() {
      std::random_device rd;
      state[0] = rd();
      state[1] = rd();
    }

    result_type operator()() {
      auto x = state[0];
      auto y = state[1];
      state[0] = y;
      x ^= x << 23;
      state[1] = x ^ y ^ (x >> 17) ^ (y >> 26);
      return state[1] + y;
    }

  private:
    result_type state[2];
  };

  using GeneratorType = XorShift128Plus;

  GeneratorType generator;
  GeneratorType::result_type random_integer;
  int counter = 0;
};

edited Feb 13 '16 at 17:13

answered Feb 12 '16 at 10:13

Emil Laine

41,598
9
101
157

Nice. The remaining question is whether the properties of the resulting distributions are acceptable. – juanchopanza Feb 12 '16 at 10:21
I think the constructor of BoolGenerator, with the random number generation, should be taken out of the measurement (it's a precomputation). BTW, I will see later if BoolGenerator could be further optimized by keeping in a separate variable a copy of the currently pointed element of the buffer. It would basically become like the "modified Randomizer", but with a buffer look up instead of a random number generation. – Antonio Feb 12 '16 at 11:31
@Antonio I took the initialization/seeding into account with all other solutions as well. – Emil Laine Feb 12 '16 at 12:00
One thing is seeding, another is precomputation of all random values. – Antonio Feb 12 '16 at 12:02
precomputation = generation, which is what this is measuring. In addition to seeding. – Emil Laine Feb 12 '16 at 12:03
@zenith That's not the point of the algorithm I proposed. In a time critical part of the code you need to have random numbers available. In an initialization phase (not time critical), you create these random variables, in the time critical phase you "only" access these values. – Antonio Feb 12 '16 at 12:05
3

Why don't you combine Randomizer with xorshift128plus instead of mt19938 ? Shouldn't that be by far the fastest possibility ? – Falco Feb 12 '16 at 12:36
1

@Falco Updated benchmark with combined Xorshift+ and Randomizer. – Emil Laine Feb 12 '16 at 14:43
@Antonio True. Of course the same strategy can be used with all other algorithms if needed. – Emil Laine Feb 12 '16 at 16:34
@zenith Sure, in fact I did not even specify which random generator I would use. What is left to understand is if a memory access to a pre-generated value could be faster then calling a XorshiftPlus. – Antonio Feb 12 '16 at 17:23
@Antonio Can you please make it compile first? – Emil Laine Feb 12 '16 at 22:43
@Antonio Added, I didn't include the version with an uninitialized buffer because it would invoke undefined behavior so it wouldn't make sense to benchmark that. – Emil Laine Feb 13 '16 at 17:17
@zenith It's ok, I think the time with the non-reinitialized buffer anyway shows that the access to memory is slower than generating a new number with xorshift... Unless, the problem is with the different bit selection technique I proposed. – Antonio Feb 13 '16 at 23:03
Could it be that the return value computation (with the bit shifting) is optimized away because the result is not used? In particular, this instruction might be skipped `(m_rand >> --counter) & 1;`. What if instead of `const bool isDirectionChanged = gen.generate();` we put [`isDirectionChanged = (gen.generate() != isDirectionChanged);`](http://stackoverflow.com/a/1596681/2436175) and print the final value of `isDirectionChanged`? – Antonio Feb 13 '16 at 23:40

Hatted Rooster · Answer 3 · 2016-02-12T09:32:23.507

11

A way would be to just generate a unsigned long long for every 64 random calls as stated in the comments. An example:

#include <random>
class Randomizer
{
public:
    Randomizer() : m_rand(0), counter(0), randomizer(0, std::numeric_limits<unsigned long long>::max()) {}

    bool RandomBool()
    {
        if (!counter)
        {
            m_rand = randomizer(std::mt19937(rd()));
            counter = sizeof(unsigned long long) * 8;

        }
        return (m_rand >> --counter) & 1;
    }
private:
    std::random_device  rd;
    std::uniform_int_distribution<unsigned long long> randomizer;
    unsigned long long m_rand;
    int counter;
};

edited Feb 12 '16 at 09:32

answered Feb 12 '16 at 09:21

Hatted Rooster

35,759
6
62
122

Did you measure it against OP's example? – juanchopanza Feb 12 '16 at 09:22
1

You should not be creating a new `random_device` and `uniform_int_distribution` over and over again in a time critical loop. – nwp Feb 12 '16 at 09:22
1

I'd wager constructing `mt19937` is orders of magnitude more overhead than `random_device` and `uniform_int_distribution` combined -- put that at class scope too. – ildjarn Feb 12 '16 at 09:55
2

If you didn't create a new `std::mt19937` on each `RandomBool` call, this would be slightly faster than Xorshift128+, at least in my benchmarks. – Emil Laine Feb 12 '16 at 10:06

score 4 · Answer 4 · edited Feb 13 '16 at 16:08

I would prefill a (long enough) (circular) buffer of 64bit random values, and then take very quickly one bit at a time when in need of a boolean random value

#include <stdint.h>

class BoolGenerator {
  private:
  const int BUFFER_SIZE = 65536;
  uint64_t randomBuffer[BUFFER_SIZE];
  uint64_t mask;
  int counter;

  void advanceCounter {
    counter++;
    if (counter == BUFFER_SIZE) {
        counter = 0;
    }
  }

  public:
  BoolGenerator() {
    //HERE FILL YOUR BUFFER WITH A RANDOM GENERATOR
    mask = 1;
    counter = 0;
  }

  bool generate() {
    mask <<= 1;
    if (!mask) { //After 64 shifts the mask becomes zero
        mask = 1;//reset mask
        advanceCounter();//get the next value in the buffer
    }
    return randomBuffer[counter] & mask;
  }
}

Of course the class can be made general to the buffer size, the random generator, the base type (doesn't necessarily have to be uint64_t) etc.

Accessing the buffer only once every 64 calls:

#include <stdint.h> //...and much more

class BoolGenerator {
  private:
  static const int BUFFER_SIZE = 65536;
  uint64_t randomBuffer[BUFFER_SIZE];
  uint64_t currValue;
  int bufferCounter;
  int bitCounter;

  void advanceBufferCounter() {
    bufferCounter++;
    if (bufferCounter == BUFFER_SIZE) {
        bufferCounter = 0;
    }
  }

  void getNextValue() {
      currValue = randomBuffer[bufferCounter];
      bitCounter = sizeof(uint64_t) * 8;
      advanceBufferCounter();
  }

  //HERE FILL YOUR BUFFER WITH A RANDOM GENERATOR
  void initializeBuffer() {
  //Anything will do, taken from here: http://stackoverflow.com/a/19728404/2436175
      std::random_device rd;
      std::mt19937 rng(rd());
      std::uniform_int_distribution<uint64_t> uni(0,std::numeric_limits<uint64_t>::max());
      for (int i = 0; i < BUFFER_SIZE; i++ ) {
          randomBuffer[i] = uni(rng);
      }
  }

  public:
  BoolGenerator() {
      initializeBuffer();
      bufferCounter = 0;
      getNextValue();
  }

  bool generate() {
      if (!bitCounter) {
           getNextValue();
      }
      //A variation of other methods seen around
      bitCounter--;
      bool retVal = currValue & 0x01;
      currValue >>= 1;
      return retVal;
  }
};

You should probably refill the buffer when the counter == BUFFER_SIZE, to get a repeat-length in the order of random number generators... — Falco, Feb 12 '16 at 12:32
@Falco The time critical part (random boolean generation) would become occasionally EXTREMELY slow, which wouldn't be acceptable. The idea is that, depending on the randomness needs, one can accept a repetition of the sequence after a long enough period. — Antonio, Feb 12 '16 at 12:37
For a library class I would rather throw an error or explicitly state the short cycle length, because otherwise this is really hard to find. Precompution sounds good, but short cycle length will probably be a big problem. — Falco, Feb 12 '16 at 12:41
@Falco What you say is correct. One should probably know in advance how many random numbers is going to need, or which repeatibility can accept, and the class should have a name clearly stating these limitations. — Antonio, Feb 12 '16 at 12:55
@Falco Anyway, the buffer idea seems applicable to this specific problem because what we are storing are just *bits*, and memory occupation is somehow reduced. Again, it depends on how much random booleans we need to generate. — Antonio, Feb 12 '16 at 13:01
The second generator is probably not what most people want because it repeatedly returns the same 65536 * 64 = 4,194,304 booleans. If you reinitialized the buffer when `bufferCounter == BUFFER_SIZE` then this is about as fast as Xorshift128+ alone. — Emil Laine, Feb 13 '16 at 16:27
@zenith It might match some needs, especially if it's known in advance how many random booleans are needed. As I said, the idea of precomputing values here is tempting because they can be stored somehow "compressed", 8 values per byte. — Antonio, Feb 13 '16 at 23:06

score 3 · Answer 5 · answered Feb 12 '16 at 18:23

3

Unless you have further constraints on the randomness you need, the fastest way to generate a random bool is:

bool RandomBool() { return false; }

To be more specific, there are thousands of ways to generate random boolean numbers, all satisfying different constraints, and many of them do not deliver "truly" random numbers (that includes all the other answers so far). The word "random" alone does not tell anyone what properties you really need.

answered Feb 12 '16 at 18:23

Peter

5,608
1
24
43

A pseudorandom generator does not deliver truly random numbers either, there is no way to do that. – Hatted Rooster Feb 13 '16 at 09:25
@GillBates There are indeed many ways to generate numbers that are random and less predictable than those produced by a pseudo-random generator, which I assume is what you mean by "truly random". They either use specialized hardware, or use side effects of other hardware. It remains a fact that "random" is a very poorly defined term. – Peter Feb 17 '16 at 02:03

johnwbyrd · Answer 6 · 2016-02-17T02:41:11.307

If performance is your only criterion, then the answer is:

bool get_random()
{
    return true; // chosen by fair coin flip.
                 // guaranteed to be random.
}

Unfortunately, the entropy of this random number is zero, but the performance is quite fast.

Since I suspect that this random number generator is not very useful to you, you will need to quantify how random you want your booleans to be. How about a cycle length of 2048? One million? 2^19937-1? Until the end of the universe?

I suspect that, since you explicitly stated that performance is your utmost concern, then a good old fashioned linear congruential generator might be "good enough". Based on this article, I'm guessing that this generator's period is around 32*((2^31)-5), or about 68 trillion iterations. If that's not "good enough", you can drop in any C++11 compatible generator you like instead of minstd_rand.

For extra credit, and a small performance hit, modify the below code to use the biased coin algorithm to remove bias in the generator.

#include <iostream>
#include <random>

bool get_random()
{
    typedef std::minstd_rand generator_type;
    typedef generator_type::result_type result_type;

    static generator_type generator;
    static unsigned int bits_remaining = 0;
    static result_type random_bits;

    if ( bits_remaining == 0 )
    {
        random_bits = generator();
        bits_remaining = sizeof( result_type ) * CHAR_BIT - 1;
    }

    return ( ( random_bits & ( 1 << bits_remaining-- ) ) != 0 );
}

int main()
{
    for ( unsigned int i = 0; i < 1000; i++ )
    {
        std::cout << " Choice " << i << ": ";
        if ( get_random() )
            std::cout << "true";
        else
            std::cout << "false";

        std::cout << std::endl;
    }
}

ErmIg · Answer 7 · 2016-02-12T09:29:48.933

0

iI think that best way is an using of precalculated random array:

uint8_t g_rand[UINT16_MAX];
bool InitRand()
{
    for (size_t i = 0, n = UINT16_MAX; i < n; ++i)
        g_rand[i] = ::rand() & 1;
    return true;
}
bool g_inited = InitRand();
inline const uint8_t * Rand()
{
    return g_rand + (::rand()&INT16_MAX);
}

It using to fill some array dst[size]:

const size_t size = 10000;
bool dst[size];
for (size_t i = 0; i < size; i += INT16_MAX)
     memcpy(dst + i, Rand(), std::min<size_t>(INT16_MAX, size - col));

Of course you can initialize pre-calculated array with using of another random function.

edited Feb 12 '16 at 09:29

answered Feb 12 '16 at 09:19

ErmIg

3,980
1
27
40

Did you measure it against OP's example? – juanchopanza Feb 12 '16 at 09:22
I use this method for initialization of big array in Rhasberry Pi by random numbers. It works much faster than calling rand(). – ErmIg Feb 12 '16 at 09:27
2

Be careful - the least significant bits of `rand()` are usually the least random ones. – Angew is no longer proud of SO Feb 12 '16 at 09:35

score 0 · Answer 8 · answered Feb 12 '16 at 09:23

if performance is important, perhaps it's a good idea to generate a 32 bit random number and use each separate bit of it, something like this:

bool getRandBool() {
    static uint32_t randomnumber;
    static int i=0;
    if (i==0) {
        randomnumber = <whatever your favorite randonnumbergenerator is>;
        i=32;
    }
    return (randomnumber & 1<<--i); 
 }

this way the generation only impacts every 32th call

score 0 · Answer 9 · edited May 23 '17 at 12:33

Apparently I have to add another answer. Just figured out that starting with Ivy Bridge architecture Intel added RdRand CPU instruction and AMD added it later in June 2015. So if you are targeting a processor that is new enough and don't mind using (inline) assembly, the fastest way to generate random bools should be in calling RdRand CPU instruction to get a 64-bit random number as described here (scroll to approximately the middle of the page for code examples) (at that link there is also a code example for checking the current CPU for support of RdRand instruction, and see also the Wikipedia for an explanation of how to do this with CPUID instruction), and then use the bits of that number for booleans as described in my Xorshit+ based answer.

What is performance-wise the best way to generate random bools?

9 Answers9

Linked