139

I seem to see many answers in which someone suggests using <random> to generate random numbers, usually along with code like this:

std::random_device rd;  
std::mt19937 gen(rd());
std::uniform_int_distribution<> dis(0, 5);
dis(gen);

Usually this replaces some kind of "unholy abomination" such as:

srand(time(NULL));
rand()%6;

We might criticize the old way by arguing that time(NULL) provides low entropy, time(NULL) is predictable, and the end result is non-uniform.

But all of that is true of the new way: it just has a shinier veneer.

  • rd() returns a single unsigned int. This has at least 16 bits and probably 32. That's not enough to seed MT's 19937 bits of state.

  • Using std::mt19937 gen(rd());gen() (seeding with 32 bits and looking at the first output) doesn't give a good output distribution. 7 and 13 can never be the first output. Two seeds produce 0. Twelve seeds produce 1226181350. (Link)

  • std::random_device can be, and sometimes is, implemented as a simple PRNG with a fixed seed. It might therefore produce the same sequence on every run. (Link) This is even worse than time(NULL).

Worse yet, it is very easy to copy and paste the foregoing code snippets, despite the problems they contain. Some solutions to the this require acquiring largish libraries which may not be suitable to everyone.

In light of this, my question is How can one succinctly, portably, and thoroughly seed the mt19937 PRNG in C++?

Given the issues above, a good answer:

  • Must fully seed the mt19937/mt19937_64.
  • Cannot rely solely on std::random_device or time(NULL) as a source of entropy.
  • Should not rely on Boost or other libaries.
  • Should fit in a small number of lines such that it would look nice copy-pasted into an answer.

Thoughts

  • My current thought is that outputs from std::random_device can be mashed up (perhaps via XOR) with time(NULL), values derived from address space randomization, and a hard-coded constant (which could be set during distribution) to get a best-effort shot at entropy.

  • std::random_device::entropy() does not give a good indication of what std::random_device might or might not do.

Rakete1111
  • 47,013
  • 16
  • 123
  • 162
Richard
  • 56,349
  • 34
  • 180
  • 251
  • 1
    I don't know what the answer is but https://stackoverflow.com/a/22951601/560648 suggests a third-party library is currently your best bet. https://stackoverflow.com/a/26961833/560648 agrees with your concern. My first reaction is that `std::mt19937` is a nice portable framework that allows you to plug in external, non-standard sources of entropy. Note that `rand()` did not allow you to do that; perhaps that's the benefit? https://stackoverflow.com/a/34490647/560648 agrees that it's down to your implementation, really, which frankly makes sense. – Lightness Races in Orbit Jul 12 '17 at 23:50
  • 6
    My personal thought was that perhaps values could be drawn from `std::random_device`, `time(NULL)`, and function addresses, then XORed together to produce a kind of best-effort entropy source. – Richard Jul 12 '17 at 23:53
  • 5
    It would be nice if there was function such as does_random_device_actually_work() so one could at least degrade gracefully, or produce warnings or errors for the user. –  Jul 13 '17 at 00:00
  • @NeilButterworth: True that. And the `std::random_device::entropy()` function would seem well-suited to that purpose. But **[no](https://stackoverflow.com/questions/28390843/how-to-find-the-true-entropy-of-stdrandom-device)**. – Richard Jul 13 '17 at 00:02
  • 4
    The proper solution is not short, the short solution won't be proper. My approach I use in my [seed11 library](https://github.com/milleniumbug/seed11) is basically to implement `std::random_device` properly on the platforms you're planning to run your program on, and provide a helper function that creates a seeded generator (`seed11::make_seeded()`) – milleniumbug Jul 13 '17 at 00:05
  • 1
    @Richard I suppose you could call on the device a 100 (say) times and see if you always get the same value back? –  Jul 13 '17 at 00:12
  • 2
    @NeilButterworth: This doesn't identify if a statically-seeded PRNG is being used, as is the case in MinGW. – Richard Jul 13 '17 at 00:14
  • @Richard Even if you re-create the device each time? –  Jul 13 '17 at 00:18
  • @NeilButterworth: I see your point. Indeed, this may detect the issue, but I'm looking for a robust response to it. – Richard Jul 13 '17 at 00:25
  • 2
    [`random_utils.hpp`](https://gist.github.com/imneme/540829265469e673d045) is linked from another pcg-random [blog post](http://www.pcg-random.org/posts/simple-portable-cpp-seed-entropy.html) on the subject and does much of what you mention in your "thoughts" section. – Miles Budnek Jul 13 '17 at 00:51
  • Are there any portable ways to turn the address of a function into an integer? Cast to `void*` then `INT_PTR`? – Yakk - Adam Nevraumont Jul 13 '17 at 01:54
  • Does [mt19937::seed](http://en.cppreference.com/w/cpp/numeric/random/mersenne_twister_engine/seed) not do what you want? – Gambit Jul 13 '17 at 03:38
  • 2
    @Gambit: No. it's equivalent to passing a value to the constructor. The difficulty is in finding a good unpredictable value to pass. – Ben Voigt Jul 13 '17 at 03:44
  • 5
    Aside: your second bullet doesn't add anything new. It is not surprising that you found some value that appears 12 times. [You should expect there to be just over three values that appear exactly 12 times](http://www.wolframalpha.com/input/?i=2%5E32+*+binomial(2%5E32,+12)+*+(1%2F2%5E32)%5E12+*+(1+-+1%2F2%5E32)%5E(2%5E32+-+12)), assuming that you have 2^32 *independent, uniformly random* samples. –  Jul 13 '17 at 07:02
  • 1
    @Hurkyl: True, but that's not really the point :) If you run a program using this scheme lots of times, and take the first number it outputs, you want to get an (at least approximately) uniform distribution, whereas in fact the output will in fact be skewed in a way that will make a noticeable difference to many applications. If you use a seed with the full possible entropy, those differences will be smoothed out much more. – psmears Jul 13 '17 at 09:21
  • [This later blog post in the series you linked](http://www.pcg-random.org/posts/ease-of-use-without-loss-of-power.html) should give you lots of good information. – etarion Jul 13 '17 at 09:21
  • If you need cryptographic randomness, you shouldn't be using `std::mt19937`. It leaks internal state much too fast to be used for cryptographic purposes. – Mark Jul 14 '17 at 01:20
  • 1
    "unholy abomination" +1 – mackycheese21 Aug 08 '18 at 14:39

9 Answers9

63

I would argue the greatest flaw with std::random_device is the that it is allowed a deterministic fallback if no CSPRNG is available. This alone is a good reason not to seed a PRNG using std::random_device, since the bytes produced may be deterministic. It unfortunately doesn't provide an API to find out when this happens, or to request failure instead of low-quality random numbers.

That is, there is no completely portable solution: however, there is a decent, minimal approach. You can use a minimal wrapper around a CSPRNG (defined as sysrandom below) to seed the PRNG.

Windows


You can rely on CryptGenRandom, a CSPRNG. For example, you may use the following code:

bool acquire_context(HCRYPTPROV *ctx)
{
    if (!CryptAcquireContext(ctx, nullptr, nullptr, PROV_RSA_FULL, 0)) {
        return CryptAcquireContext(ctx, nullptr, nullptr, PROV_RSA_FULL, CRYPT_NEWKEYSET);
    }
    return true;
}


size_t sysrandom(void* dst, size_t dstlen)
{
    HCRYPTPROV ctx;
    if (!acquire_context(&ctx)) {
        throw std::runtime_error("Unable to initialize Win32 crypt library.");
    }

    BYTE* buffer = reinterpret_cast<BYTE*>(dst);
    if(!CryptGenRandom(ctx, dstlen, buffer)) {
        throw std::runtime_error("Unable to generate random bytes.");
    }

    if (!CryptReleaseContext(ctx, 0)) {
        throw std::runtime_error("Unable to release Win32 crypt library.");
    }

    return dstlen;
}

Unix-Like


On many Unix-like systems, you should use /dev/urandom when possible (although this is not guaranteed to exist on POSIX-compliant systems).

size_t sysrandom(void* dst, size_t dstlen)
{
    char* buffer = reinterpret_cast<char*>(dst);
    std::ifstream stream("/dev/urandom", std::ios_base::binary | std::ios_base::in);
    stream.read(buffer, dstlen);

    return dstlen;
}

Other


If no CSPRNG is available, you might choose to rely on std::random_device. However, I would avoid this if possible, since various compilers (most notably, MinGW) implement it with as a PRNG (in fact, producing the same sequence every time to alert humans that it's not properly random).

Seeding


Now that we have our pieces with minimal overhead, we can generate the desired bits of random entropy to seed our PRNG. The example uses (an obviously insufficient) 32-bits to seed the PRNG, and you should increase this value (which is dependent on your CSPRNG).

std::uint_least32_t seed;    
sysrandom(&seed, sizeof(seed));
std::mt19937 gen(seed);

Comparison To Boost


We can see parallels to boost::random_device (a true CSPRNG) after a quick look at the source code. Boost uses MS_DEF_PROV on Windows, which is the provider type for PROV_RSA_FULL. The only thing missing would be verifying the cryptographic context, which can be done with CRYPT_VERIFYCONTEXT. On *Nix, Boost uses /dev/urandom. IE, this solution is portable, well-tested, and easy-to-use.

Linux Specialization


If you're willing to sacrifice succinctness for security, getrandom is an excellent choice on Linux 3.17 and above, and on recent Solaris. getrandom behaves identically to /dev/urandom, except it blocks if the kernel hasn't initialized its CSPRNG yet after booting. The following snippet detects if Linux getrandom is available, and if not falls back to /dev/urandom.

#if defined(__linux__) || defined(linux) || defined(__linux)
#   // Check the kernel version. `getrandom` is only Linux 3.17 and above.
#   include <linux/version.h>
#   if LINUX_VERSION_CODE >= KERNEL_VERSION(3,17,0)
#       define HAVE_GETRANDOM
#   endif
#endif

// also requires glibc 2.25 for the libc wrapper
#if defined(HAVE_GETRANDOM)
#   include <sys/syscall.h>
#   include <linux/random.h>

size_t sysrandom(void* dst, size_t dstlen)
{
    int bytes = syscall(SYS_getrandom, dst, dstlen, 0);
    if (bytes != dstlen) {
        throw std::runtime_error("Unable to read N bytes from CSPRNG.");
    }

    return dstlen;
}

#elif defined(_WIN32)

// Windows sysrandom here.

#else

// POSIX sysrandom here.

#endif

OpenBSD


There is one final caveat: modern OpenBSD does not have /dev/urandom. You should use getentropy instead.

#if defined(__OpenBSD__)
#   define HAVE_GETENTROPY
#endif

#if defined(HAVE_GETENTROPY)
#   include <unistd.h>

size_t sysrandom(void* dst, size_t dstlen)
{
    int bytes = getentropy(dst, dstlen);
    if (bytes != dstlen) {
        throw std::runtime_error("Unable to read N bytes from CSPRNG.");
    }

    return dstlen;
}

#endif

Other Thoughts


If you need cryptographically secure random bytes, you should probably replace the fstream with POSIX's unbuffered open/read/close. This is because both basic_filebuf and FILE contain an internal buffer, which will be allocated via a standard allocator (and therefore not wiped from memory).

This could easily be done by changing sysrandom to:

size_t sysrandom(void* dst, size_t dstlen)
{
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd == -1) {
        throw std::runtime_error("Unable to open /dev/urandom.");
    }
    if (read(fd, dst, dstlen) != dstlen) {
        close(fd);
        throw std::runtime_error("Unable to read N bytes from CSPRNG.");
    }

    close(fd);
    return dstlen;
}

Thanks


Special thanks to Ben Voigt for pointing out FILE uses buffered reads, and therefore should not be used.

I would also like to thank Peter Cordes for mentioning getrandom, and OpenBSD's lack of /dev/urandom.

Alex Huszagh
  • 13,272
  • 3
  • 39
  • 67
  • 13
    This is what I've done in the past, but the, or at least a, question is WTF can't the library writers for these platforms do this for us? I expect file access and threads (for example) to be abstracted by library implementations, so why not random number generation? –  Jul 13 '17 at 00:13
  • @etarion, you can simply use a larger seed then. The `sysrandom` logic intentionally avoids any requirements on the input size. You can easily up the logic to the pool size of /dev/urandom (Linux 4.8 and below) to 512 bytes. This answer simply uses a simple example to show how to seed the generator, in most examples, fully seeding an mt19337 PRNG will not be practical (although you can reasonable do much more than what I've done above). – Alex Huszagh Jul 13 '17 at 08:32
  • 2
    OP here: It would be nice if this answer demonstrated the seeding a bit better. As much as possible, I am hoping for answers which generate copy-pastable code that does the job better than the simple example I posted in my question without requiring much technical interpretation or thought on the part of the coder. – Richard Jul 13 '17 at 19:43
  • 4
    I thought `/dev/random` would be the better choice for seeding an RNG, but apparently [`/dev/urandom` is still considered computationally secure](https://www.2uo.de/myths-about-urandom/) even when `/dev/random` would block because of low available entropy, so `urandom` is the recommended choice for everything except maybe one-time pads. See also https://unix.stackexchange.com/questions/324209/when-to-use-dev-random-vs-dev-urandom. Beware of predictable seeds from `urandom` very early after bootup, though. – Peter Cordes Jul 13 '17 at 20:36
  • 3
    Linux's [`getrandom(2)`](http://man7.org/linux/man-pages/man2/getrandom.2.html) system call is like opening and reading `/dev/urandom`, except it will block if the kernel's randomness sources haven't been initialized yet. I think this saves you from the early-boot low-quality-randomness problem without blocking in other cases like `/dev/random` does. – Peter Cordes Jul 13 '17 at 20:54
  • 1
    @PeterCordes, for sure, and that's a great option when available. However, it does not work on BSD or other *Nixes, which is something `/dev/urandom` generally works on. The Python mailing list discussion on this is something I generally subscribe to: https://bugs.python.org/issue27266 – Alex Huszagh Jul 13 '17 at 21:12
  • 1
    Suggestion: whitelist implementations where `std::random_device` is known to not suck, and use it instead of actually coding up a read() from `/dev/urandom`? Then you don't have to worry about maintaining the platform support code if some OS decides there's an even better way to get randomness. It's unlikely that an implementation will regress from usable to unusable. Or if you don't use it for a security-sensitive purpose, it might make sense to take the riskier approach of only `random_device` on a few known-bad platforms like MinGW. – Peter Cordes Jul 15 '17 at 05:47
  • 2
    Also, for any x86 target, you can check `CPUID` and run [`RDSEED` (`int _rdseed64_step( unsigned __int64 *)`](http://felixcloutier.com/x86/RDSEED.html) if CPUID says it's available. ([`RDRAND`](http://felixcloutier.com/x86/RDRAND.html) apparently doesn't have as high a randomness-quality guarantee, hence the existence of `RDSEED` which is compliant to NIST SP800-90B and NIST SP800-90C.) But note that you still need a fallback because the API allows it to report an error (e.g. if it's physically damaged in a given CPU). IDK if real hardware ever does that. – Peter Cordes Jul 15 '17 at 05:54
23

In a sense, this can't be done portably. That is, one can conceive a valid fully-deterministic platform running C++ (say, a simulator which steps the machine clock deterministically, and with "determinized" I/O) in which there is no source of randomness to seed a PRNG.

einpoklum
  • 118,144
  • 57
  • 340
  • 684
  • If interaction with the user is part of the platform, then it cannot be deterministic. That's not much practical help, though. – Ask About Monica Jul 12 '17 at 23:56
  • 1
    @kbelder: 1. Who says the user is a person? 2. Not all programs have user interaction and you certainly can't assume there's always a user around... – einpoklum Jul 13 '17 at 00:07
  • 9
    I appreciate this response, but also feel as though a program should make a reasonable best-effort attempt. – Richard Jul 13 '17 at 00:12
  • 3
    @Richard Agreed, but the issue is the C++ standard writers have to (or at least try their darndest to) accommodate these sorts of bizarre situations. That's why you get these sorts of wishy-washy standard definitions, where you might get decent results, but the compiler can still be standards-compliant even if it gives back something that's functionally worthless. -- So your restrictions ("short and cannot rely on other libraries") rule out any response, as you effectively need a platform-by-platform/compiler-by-compiler special casing. (e.g. what Boost does so well.) – R.M. Jul 13 '17 at 15:13
  • 2
    @Richard what it explains, though, is that you get what you get in the standard *because* there is no portable way to do better. If you want to do better (which is a noble goal) you will have to accept some greater or lesser amount of abomination :) – hobbs Jul 13 '17 at 19:08
  • @hobbs: I am all about balanced abomination. :-) – Richard Jul 13 '17 at 19:13
  • 1
    @Richard: Sometimes you just have to accept that it's possible to make a standards-compliant C++ implementation which isn't useful. Since the implementations people use for anything that matters *are* designed to be useful, you sometimes have to live with arguments like "any sane implementation will do something reasonable". I would have hoped that `std::random_device` would be in that category, but apparently it's not if some real implementations use a fixed-seed PRNG! That goes way beyond einpoklum's argument. – Peter Cordes Jul 13 '17 at 21:05
  • 1
    @PeterCordes: Actually, I would argue that there probably are real platforms (with C++ compilers targeting them?) in which you just don't have effective access to a randomness source. For sure some embedded devices could be like that. – einpoklum Jul 13 '17 at 22:26
  • 1
    @einpoklum: A best-effort attempt to gather some randomness would maybe still be better than a fixed seed, though. e.g. sample the low bits of a high-precision clock a few times, mix in a PID and maybe some other things that vary between runs. Some argue that producing identical sequences every time is the best way to indicate that there isn't a good quality RNG available, but IMO that's a sign of an over-simplified API. If there was an optional `high_quality_only=false` arg, you could run `rd(true)` to request failure if crypto-quality randomness wasn't available. – Peter Cordes Jul 14 '17 at 00:32
  • 1
    But code that wanted good-enough randomness for non-security use cases could get what they want from `rd()`. And of course it's just a massive quality-of-implementation issue that MinGW is using a fixed seed on an OS that has a randomness source. It sucks a lot that we can't trust `std::random_device` to give something close to as good as the platform can provide. – Peter Cordes Jul 14 '17 at 00:35
  • @PeterCordes: The point is that on some systems you probably can't sample the clock with high precision, and there are no processes. – einpoklum Jul 14 '17 at 08:39
  • 1
    Yes, I understand there will be some platforms where there's just so little randomness that you can't really do anything. But as you say, usually only embedded, in which case any software that runs on it will (should) be tested carefully for that platform. Not compiled from source by someone who just downloaded it, like will happen on MinGW, which is ruining `std::random_device` for everyone by providing an *unexpectedly* terrible implementation. – Peter Cordes Jul 14 '17 at 13:36
  • @PeterCordes: Fair enough. – einpoklum Jul 14 '17 at 15:44
15

You can use a std::seed_seq and fill it to at least the requires state size for the generator using Alexander Huszagh's method of getting the entropy:

size_t sysrandom(void* dst, size_t dstlen); //from Alexander Huszagh answer above

void foo(){

    std::array<std::mt19937::UIntType, std::mt19937::state_size> state;
    sysrandom(state.begin(), state.length*sizeof(std::mt19937::UIntType));
    std::seed_seq s(state.begin(), state.end());

    std::mt19937 g;
    g.seed(s);
}

If there was a proper way to fill or create a SeedSequence from a UniformRandomBitGenerator in the standard library using std::random_device for seeding properly would be much simpler.

ratchet freak
  • 47,288
  • 5
  • 68
  • 106
  • 1
    seed_seq has issues though, http://www.pcg-random.org/posts/developing-a-seed_seq-alternative.html – etarion Jul 13 '17 at 11:28
  • There is nothing in the C++ standard or anywere to guarantee that the random number generator will use the whole array when you seed from seed_seq. This method will lead to failure if you are using the rng for a scientific simulation, and obviously also for cryptography. About the only use case for this will be to randomize a videogame, but there it would be an overkill. – Kostas Dec 04 '19 at 09:42
5

The implementation I am working on takes advantage of the state_size property of the mt19937 PRNG to decide how many seeds to provide on initialization:

using Generator = std::mt19937;

inline
auto const& random_data()
{
    thread_local static std::array<typename Generator::result_type, Generator::state_size> data;
    thread_local static std::random_device rd;

    std::generate(std::begin(data), std::end(data), std::ref(rd));

    return data;
}

inline
Generator& random_generator()
{
    auto const& data = random_data();

    thread_local static std::seed_seq seeds(std::begin(data), std::end(data));
    thread_local static Generator gen{seeds};

    return gen;
}

template<typename Number>
Number random_number(Number from, Number to)
{
    using Distribution = typename std::conditional
    <
        std::is_integral<Number>::value,
        std::uniform_int_distribution<Number>,
        std::uniform_real_distribution<Number>
    >::type;

    thread_local static Distribution dist;

    return dist(random_generator(), typename Distribution::param_type{from, to});
}

I think there is room for improvement because std::random_device::result_type could differ from std::mt19937::result_type in size and range so that should really be taken into account.

A note about std::random_device.

According to the C++11(/14/17) standard(s):

26.5.6 Class random_device [ rand.device ]

2 If implementation limitations prevent generating non-deterministic random numbers, the implementation may employ a random number engine.

This means the implementation may only generate deterministic values if it is prevented from generating non-deterministic ones by some limitation.

The MinGW compiler on Windows famously does not provide non-deterministic values from its std::random_device, despite them being easily available from the Operating System. So I consider this a bug and not likely a common occurrence across implementations and platforms.

SJL
  • 403
  • 2
  • 10
Galik
  • 47,303
  • 4
  • 80
  • 117
  • 1
    This may fill the MT state, but still relies solely on `std::random_device`, and is therefore vulnerable to issues stemming from it. – Richard Jul 13 '17 at 00:00
  • @Richard What issues are those? – Galik Jul 13 '17 at 00:01
  • 1
    I think I stated them clearly enough in the question. Happy to clarify/discuss, though. – Richard Jul 13 '17 at 00:03
  • 2
    @Richard Are there any real systems that don't actually implement a reasonable `std::random_device`? I know the standard allows a `PRNG` as fall back but I feel that's just to cover themselves as it is hard to demand that every device that uses `C++` has a non-deterministic random source. And if they don't then what could you do about that anyway? – Galik Jul 13 '17 at 00:09
  • 1
    @Richard So I suspect that `std::random_device` is as good as it gets in terms of *portable* non-deterministic random source. – Galik Jul 13 '17 at 00:10
  • 1
    @Galik, yes, MinGW does not: it's deterministic. The fact that true randomness is not guaranteed should be reason enough to ignore it except as a fallback. – Alex Huszagh Jul 13 '17 at 00:11
  • 2
    In fact, MinGW deliberately chose an identical starting sequence to make the fact that `std::random_device` is a PRNG, and not a CSPRNG, obvious. https://stackoverflow.com/questions/18880654/why-do-i-get-the-same-sequence-for-every-run-with-stdrandom-device-with-mingw – Alex Huszagh Jul 13 '17 at 00:13
  • @AlexanderHuszagh I don't use windows but I was under the impression that the problem with MinGW was a bug, not a feature, but I could be wrong. Is that true of all versions? – Galik Jul 13 '17 at 00:16
  • 1
    @Galik, I am unsure, but I have confirmed it with my own code. I believe it is mostly unimplemented, and the consistency of the results is a feature (to notify an observant user). The fact that `std::random_device` is not guaranteed to be a CSPRNG is reason enough for me to consider it unreliable, and that other options should be used when possible. – Alex Huszagh Jul 13 '17 at 00:19
  • @AlexanderHuszagh I think MinGW is a port so it tends to suffer from incomplete implementation for longer than other compilers perhaps. But I don't think compiler implementers are going to avoid implementing a proper non-deterministic source unless the device simply doesn't have one. That's what the "get out" clause of for. – Galik Jul 13 '17 at 00:24
  • That's dangerous though, since you are then making your portable solution highly dependent on the compiler and compiler version. Remember, this was back in 2013: it hasn't changed since. `std::random_device` may be good if a fallback is needed, but inherently, it's not reliable and should be considered a last resort if known CSPRNGs aren't available. – Alex Huszagh Jul 13 '17 at 00:37
  • 5
    @AlexanderHuszagh I'm not so sure. My intention is to make my "portable solution" dependent on *the device* because if the device supports non-deterministic generators then so should `std::random_device`. I believe that is the spirit of the standard. So I have searched and can only find `MinGW` that is broken in this respect. No one seems to be reporting this problem with anything else that I have found. So, in my library, I have simply marked `MinGW` as unsupported. If there was a wider problem then I'd re-think it. I just don't see the evidence of that right now. – Galik Jul 13 '17 at 01:25
  • This code is wrong (at least) for 64-bit-MTs. `std::array` does not work since seed_seq uses only 32 bit per element (and you only fill 32 bit of it anyway), so for a 64-bit MT you'd need at least twice as many elements. – etarion Jul 13 '17 at 08:33
  • @Galik: If you were to XOR the random device with time, then it would at least generate separate results for each run of the program (to the resolution of the timer). This seems to me an improvement over the current MinGW situation. – Richard Jul 13 '17 at 19:47
  • 6
    I'm really disappointed that MinGW is ruining `std::random_device` for everyone by making it available in a form that doesn't deliver the platform's randomness capabilities. Low-quality implementations defeat the purpose of the API existing. It would be better IMO if they just didn't implement it at all until they have it working. (Or better, if the API provided a way to request failure if high-quality randomness wasn't available, so MinGW could avoid causing security risks while still giving different seeds for games or whatever.) – Peter Cordes Jul 14 '17 at 00:42
1

There's nothing wrong with seeding by using time, assuming you don't need it to be secure (and you didn't say this was necessary). The insight is that you can use hashing to fix the non-randomness. I've found this works adequately in all cases, including and in-particular for heavy Monte Carlo simulations.

One nice feature of this approach is that it generalizes to initialization from other not-really-random sets of seeds. For example, if you want each thread to have its own RNG (for threadsafety), you can just initialize based on hashed thread ID.

The following is a SSCCE, distilled from my codebase (for simplicity; some OO support structures elided):

#include <cstdint> //`uint32_t`
#include <functional> //`std::hash`
#include <random> //`std::mt19937`
#include <iostream> //`std::cout`

static std::mt19937 rng;

static void seed(uint32_t seed) {
    rng.seed(static_cast<std::mt19937::result_type>(seed));
}
static void seed() {
    uint32_t t = static_cast<uint32_t>( time(nullptr) );
    std::hash<uint32_t> hasher; size_t hashed=hasher(t);
    seed( static_cast<uint32_t>(hashed) );
}

int main(int /*argc*/, char* /*argv*/[]) {
    seed();
    std::uniform_int_distribution<> dis(0, 5);
    std::cout << dis(rng);
}
geometrian
  • 14,775
  • 10
  • 56
  • 132
  • 1
    I agree with your point that seeding with the time is probably good enough in practice, if you don't need it to be secure. But I can't agree with the rest of your answer. Seeding with the hash of the time is no better than seeding with the time itself. – D.W. Jul 14 '17 at 13:59
  • @D.W. Empirically, it is **much** better. The reason is that the hash is discontinuous and spans a much wider range of values (try this yourself: seed with `1` and `2` and observe that the sequence of floats generated by them takes a while to really diverge). – geometrian Jul 14 '17 at 18:27
  • I don't see why that matters. We're only running on a single seed at a time. The space of possible values for the seed (the entropy of the seed) is the same either way -- hashing doesn't increase entropy. Perhaps you could edit the question to explain why hashing is better? – D.W. Jul 14 '17 at 18:36
1

There are already a lot of good answers here, but I wanted to add two things:

  • The MinGW bug, here cited as the most notable example of deterministic std::random_device implementation, was fixed in recent versions (link to the bug report).

  • In c++20, there is a way to fill a std::seed_seq with values from std::random_device without using a buffer array:

#include <random>
#include <ranges>

int main(){

    std::random_device rd;
    auto rd_range = std::ranges::transform_view(std::ranges::iota_view(static_cast<std::size_t>(0), std::mt19937::state_size), [&rd](size_t){return rd();});
    std::seed_seq seeds(rd_range.begin(), rd_range.end());
    std::mt19937 gen(seeds);

    return 0;

}
ThePirate42
  • 821
  • 6
  • 22
0

Here's my own stab at the question:

#include <random>
#include <chrono>
#include <cstdint>
#include <algorithm>
#include <functional>
#include <iostream>

uint32_t LilEntropy(){
  //Gather many potential forms of entropy and XOR them
  const  uint32_t my_seed = 1273498732; //Change during distribution
  static uint32_t i = 0;        
  static std::random_device rd; 
  const auto hrclock = std::chrono::high_resolution_clock::now().time_since_epoch().count();
  const auto sclock  = std::chrono::system_clock::now().time_since_epoch().count();
  auto *heap         = malloc(1);
  const auto mash = my_seed + rd() + hrclock + sclock + (i++) +
    reinterpret_cast<intptr_t>(heap)    + reinterpret_cast<intptr_t>(&hrclock) +
    reinterpret_cast<intptr_t>(&i)      + reinterpret_cast<intptr_t>(&malloc)  +
    reinterpret_cast<intptr_t>(&LilEntropy);
  free(heap);
  return mash;
}

//Fully seed the mt19937 engine using as much entropy as we can get our
//hands on
void SeedGenerator(std::mt19937 &mt){
  std::uint_least32_t seed_data[std::mt19937::state_size];
  std::generate_n(seed_data, std::mt19937::state_size, std::ref(LilEntropy));
  std::seed_seq q(std::begin(seed_data), std::end(seed_data));
  mt.seed(q);
}

int main(){
  std::mt19937 mt;
  SeedGenerator(mt);

  for(int i=0;i<100;i++)
    std::cout<<mt()<<std::endl;
}

The idea here is to use XOR to combine many potential sources of entropy (fast time, slow time, std::random-device, static variable locations, heap locations, function locations, library locations, program-specific values) to make a best-effort attempt at initializing the mt19937. As long as at least once source is "good", the result will be at least that "good".

This answer is not as short as would be preferable and may contain one or more mistakes of logic. So I'm considering it a work in progress. Please comment if you have feedback.

Richard
  • 56,349
  • 34
  • 180
  • 251
  • 3
    The addresses might have very little randomness. You always have the same allocations, so on smaller embedded systems where you have access the the whole memory, it's likely to get the same results each time. I'd say it's likely good enough for a big system, but might do shit on a microcontroller. – meneldal Jul 13 '17 at 02:00
  • 1
    I would guess `&i ^ &myseed` should have considerably less entropy than either one alone, since both are objects with static storage duration in the same translation unit and therefore likely to be rather close together. And you don't seem to actually use the special value from the initialization of `myseed`? – aschepler Jul 13 '17 at 02:57
  • 8
    Converting dealocated pointers to ints is undefined behaviour; do it while it still exists. `^` is a horrible hash combiner; if two values both have lots of entropy, but little compared to each other, it removes it. `+` is usually better (as x+x only burns 1 bit of entropy in x, while x^x burns them all). Function is not thead safe I suspect (`rd()`) – Yakk - Adam Nevraumont Jul 13 '17 at 03:22
  • 2
    Oh and by `+` I mean on unsigned (`+` on signed is UB-bait). While these are somewhat ridiculous UB cases, you did say portable. Also consider getting the address of a function as an integral value if possible (uncertain if it is?) – Yakk - Adam Nevraumont Jul 13 '17 at 03:28
  • 1
    @meneldal: Even on a full-strength PC, although the allocations might get different physical locations (depending on state of the machine external to the process), the pointers are abstracted by the process virtual address space, and likely highly repeatable, particularly is ASLR isn't in effect. – Ben Voigt Jul 13 '17 at 03:42
  • 1
    @Ben that's true, it's just worse on a system where all allocations are predictable. ASLR has been a thing for a while, so I forgot this wasn't always the standard. – meneldal Jul 13 '17 at 04:20
  • I am unaware if `reinterpret_cast(&Entropyinator)` is defined behavior under the C++ standard. Function pointers cannot be converted to `void*`, I don't know if they can be converted to `intptr_t` or not. Prove this one way or the other, or don't use it in portable code. – Yakk - Adam Nevraumont Jul 13 '17 at 14:09
  • @Yakk: XOR's poor performance as a hash combiner was what I was afraid of. Thanks for the addition suggestion. – Richard Jul 13 '17 at 19:53
  • @meneldal: The point of mucking with the addresses and timers is to try to find a best-effort offset such that even if random_device is a PRNG that restarts on each run of the application it will yield different results (provided the memory loads differently). – Richard Jul 13 '17 at 19:55
  • 1
    Don't add signed ints and cause undefined behavior as overflow when I told you it was a problem 6 comments above. – Yakk - Adam Nevraumont Jul 13 '17 at 20:21
  • You should sample the low bits of the high-res clock between every other operation. On x86, the CPU's timestamp counter increments at one count per reference cycle, where a reference cycle is the advertised CPU frequency (not the current turbo or power-saving core clock frequency). Differences in system load and other variations will change the number of cycles that calling other functions takes, especially functions that are being called for the first time in your program since they will probably miss in cache. – Peter Cordes Jul 14 '17 at 00:49
  • 1
    If you can scatter clock-sampling through other parts of your startup code (especially slow parts) before you need to seed your RNG, that's much better. You'll get timestamps with more unpredictable variations in spacing, especially if you make some system calls or especially do any file or network I/O before you need the RNG. – Peter Cordes Jul 14 '17 at 00:55
0
  • Use getentropy() to seed a pseudorandom number generator (PRNG).
  • Use getrandom() if you want random values (instead of, say, /dev/urandom or /dev/random).

These are available on modern UNIX-like systems, such as Linux, Solaris, and OpenBSD.

Dan Anderson
  • 2,265
  • 1
  • 9
  • 20
-2

A given platform might have a source of entropy, such as /dev/random. Nanoseconds since the Epoch with std::chrono::high_resolution_clock::now() is probably the best seed in the Standard Library.

I previously have used something like (uint64_t)( time(NULL)*CLOCKS_PER_SEC + clock() ) to get more bits of entropy for applications that aren’t security-critical.

Davislor
  • 14,674
  • 2
  • 34
  • 49
  • 2
    You really should use `/dev/urandom`, especially in a case like this. `/dev/random` blocks, and often without good reasons for doing so ([insert long explanation about how many different OSes estimate the randomness of the bytes produced by /dev/random]). – Alex Huszagh Jul 13 '17 at 00:46
  • 2
    @AlexanderHuszagh True, although I’ve had to code on systems where `/dev/urandom` didn’t exist, and the alternative to blocking was determinism. A box might have `/dev/hwrng` or `/dev/hw_random` as well, which should be even better. – Davislor Jul 13 '17 at 08:24
  • Okay, I said, “such as `/dev/random`,” and that seems to have sparked a holy war about `/dev/random` versus `/dev/urandom` on Linux that I did not intend when I gave that example.. – Davislor Jul 13 '17 at 20:27