6

I have been using random_device rd{} to generate seeds for my Mersenne-Twister pseudo random number generator mt19937 RNG{rd()} as have been suggested here. However, it is written in the documentation (comment in the documentations' example code), that "the performance of many implementations of random_device degrades sharply once the entropy pool is exhausted. For practical use random_device is generally only used to seed a PRNG such as mt19937". I have tried testing how big this "entropy pool" is, and for 10^6 number of calls, random_device returns me more than 10^2 repeating numbers (see my example code and output below). In other words, if I will try using random_device as a seed to my Mersenne-Twister PRNG, it will generate a solid fraction of repeating seeds.

Question: do people still use random_device in C++ to generate seeds for PRNG or are there already better alternatives?

My code:

#include <iostream>
#include <random>
#include <chrono>

using namespace std;

int main(){
    
    auto begin = std::chrono::high_resolution_clock::now();
    
    random_device rd{};
    mt19937 RNG{ rd() };

    int total_n_of_calls = 1e6;
    vector<int> seeds;
    
    for(auto call = 0; call < total_n_of_calls; call++){
    int call_rd = rd();
    seeds.push_back(call_rd);
    }
    
    int count_repeats = 0;
    sort(seeds.begin(), seeds.end());
    for(int i = 0; i < seeds.size() - 1; i++) {
        if (seeds[i] == seeds[i + 1]) {
            count_repeats++;
    }
    }
    
    printf("Number of times random_device have been called: %i\n", total_n_of_calls);
    printf("Number of repeats: %i\n", count_repeats);

    auto end = std::chrono::high_resolution_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
    printf("Duration: %.3f seconds.\n", elapsed.count() * 1e-9);

    return 0;
}

The output:

Number of times random_device have been called: 1000000
Number of repeats: 111
Duration: 0.594 seconds.

  • 1
    you left out parts of the quote. Also the quote is "only" from the code example. The full quote says "... performance of many implementations of random_device ...". just saying, it doesn't do much to the quesiton, but I had difficulties to find the quote at first and then I was puzzled because you quote something else – 463035818_is_not_an_ai Jan 23 '23 at 10:22
  • Do you need cryptographic strength? If not, just use a common PRNG like http://www.jstatsoft.org/v08/i14/paper – Something Something Jan 23 '23 at 10:22
  • 1
    well, actually it does something to the question, because you also left out the last part "...For practical use random_device is generally only used to seed a PRNG such as mt19937". So the full quote is only to remind you to not use this as the prng, but only to seed the prng – 463035818_is_not_an_ai Jan 23 '23 at 10:23
  • imho this could be a good question, but unfortunately as it is currently phrased your only evidence for a claim is a misquoted sentence. Nowhere else in the documentation it is suggested to not use it – 463035818_is_not_an_ai Jan 23 '23 at 10:26
  • 1
    @463035818_is_not_a_number Thank you very much for noting this, and for your suggestions! I have tried amending my question. Hopefully, it appears better now. – Ruslan Mukhamadiarov Jan 23 '23 at 10:30
  • @RuslanMukhamadiarov So I guess the question now is why do you feel the need to seed your RNG more than once? Normally you seed once and then generate multiple pseudo random numbers. – john Jan 23 '23 at 10:39
  • @RuslanMukhamadiarov Also your test code is testing the performance of `random_device` not a Mersenne twister RNG seeded with `random_device`. I'm having trouble understanding what your expectations are. – john Jan 23 '23 at 10:45
  • @john I run computer simulations, and I am regularly submitting simulation batches of size ~10^6 on separate threads in computer cluster. I worry that the seeds generated by random_device may repeat for some of those independent simulation runs. If this is the case, I will have some of my data files sharing the same output. From @Marcus Müller answer, I understood that I am not adequately testing what I want to test, since I run `random_device` repeatedly on a single machine. – Ruslan Mukhamadiarov Jan 23 '23 at 11:02
  • 1
    Random number sequences have repetitions. If you toss an unbiased coin four time you're just as likely to get HHHH as you are to get HTHT. There's nothing wrong with `random_device`. If you need a non-repeating sequence you have to do the work to create one. – Pete Becker Jan 23 '23 at 15:21

1 Answers1

7

TL;DR: No, there's nothing better. You just need to stop abusing it.


The point of random_device is that it asks your platform for bits that are actually random, not just pseudorandom from some deterministic seed.

If the platform / OS thinks the entropy it had was expended, then it cannot offer you this. But honestly, it uses true sources of randomness, from actual randomness hardware in your CPU to timing of disk access, to modify the internal state of a PRNG. That's all there is to it – to someone external, the bits you get are still unpredictable.

So, the answer is this:

  • you use random_device because you need actually random seeds. There's no algorithmic shortcut to randomness – the word "algorithm" already says that it's deterministic. And software, universally, is deterministic, unless it gets random data externally. So, all you can do is ask the operating system, which actually deals with any source of randomness there is in your system. And that's already exactly what random_device does.
  • So, no, you cannot use something else but actual external entropy, which is exactly what you get most efficiently from random_device (unless you buy an expensive dedicated random generator card and write a driver for it).
  • As the OS uses the random external source to change the internal state of a PRNG, it can produce more random things securely than random events happen – but it needs to keep track of how much bits got taken out of the PRNG, so that it never becomes possible for an attacker to reconstruct prior state with a high probability of being right. Thus, it slows down your consumption of randomness when there's not enough external randomness to modify the internal state.
  • Thus, 10⁶ calls to generate a seed in a short time sound like you're doing something wrong; twice as much if these are used to feed a Mersenne twister, an algorithm that is overly complex and slow, but not cryptographically secure. You're not using this much actual randomness, ever! Don't reseed, continue to use your seeded PRNG, unless you need cryptographically safety that these seeds are independent.

And that's exactly the thing: if you're in a situation where you need to generate 10⁶ independent cryptographically secure keys in less than a few seconds, you're a bit special. Are you working for someone who does CDNs, where a single operating system would serve millions of new TLS connections per second? If not, reduce your usage of random_device to what it's actually useful for.

If you want to understand more about the way true randomness ends up in your program, I recommend reading this answer. In short, if you're actually in need of more random bytes per second than the default random_device offers, try constructing it with "/dev/urandom" as a ctor parameter. It's still going to be secure, for any assumable definition of what that means in the context in which you're asking this (which means I assume you're not writing a cryptographic library for extremely high throughput of key generation).

Marcus Müller
  • 34,677
  • 4
  • 53
  • 94
  • "You just need to stop abusing it." just nails it :). Docuementation linked by OP says things like "In this case each std::random_device object may generate the same number sequence.", that doesnt imply that one should not use it, one would also not use a hammer to drive a screw in a wall – 463035818_is_not_an_ai Jan 23 '23 at 10:47
  • Thank you very much for your thorough answer! After reading your explanation, I do understand that the example code I have provided to test the "entropy pool" doesn't really its job. I am a physicist and I run simulations on computer clusters. I was worried that submitting 10^6 jobs on a cluster with `random_device` may result with me having too many repeating output data files. I am not sure how I can adequately test this. – Ruslan Mukhamadiarov Jan 23 '23 at 10:53
  • @RuslanMukhamadiarov [xy problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). Why you want to do something is always important information, sometimes it is essential to help with your actual problem. I suppose you do not want to run the simulation twice with the same seed, because that would waste cpu time – 463035818_is_not_an_ai Jan 23 '23 at 10:59
  • I hate the thing anyway. I even wrote a little library to deal with it. https://github.com/Duthomhas/CSPRNG – Dúthomhas Jan 23 '23 at 11:02
  • @RuslanMukhamadiarov write a small program that uses a PRNG seeded once by `random_device` and outputs to a file 10^6 random values. Use one of these saved values to seed the PRNG of each of the 10^6 jobs. – Richard Critten Jan 23 '23 at 11:03
  • @463035818_is_not_a_number Thank you. You are absolutely right, I was just not sure if this is the right platform for me to talk about running computer simulations. – Ruslan Mukhamadiarov Jan 23 '23 at 11:06
  • 1
    @RuslanMukhamadiarov For scientific simulations you do _not_ want to use `std::random_device` at all. You want the simulation to be reproducible, so seed it with a deterministically chosen seed. – user17732522 Jan 23 '23 at 11:09
  • @RuslanMukhamadiarov not a problem; your cluster doesn't have just 1 `random_device`, nor would it matter if the same job (whcih will takes minutes to weeks to run) starts within 100 ms on all machines in unisono. Again, if in your simulation you constantly reseed, you're doing something wrong, and should be using your PRNGs instead of reseeding them all the time. And, as the commenter above said: A simulation should be reproducible, so it sounds like an actively *bad* idea to use true randomness. – Marcus Müller Jan 23 '23 at 12:02