
I was testing the speed of the AMD 5600's hardware random number generator (RDRAND) in C++ and found that the speed isn't steady. Is this normal, or am I doing something wrong?

Here is the code I used:

#include <iostream>
#include <chrono>
#include <cstdint>

int main()
{
    uint64_t random_num;
    int iter = 5'000'000;

    auto start = std::chrono::high_resolution_clock::now();

    for (int i = 0; i < iter; i++) {
        // RDRAND writes a random 64-bit value; the CF (success) flag is ignored here.
        __asm__ volatile("rdrand %0" : "=r"(random_num));
    }

    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();

    std::cout << iter << " iterations in " << duration << " ms" << std::endl;


    return 0;
}

Here are the results:

mika@pc3 ~/t $ g++ -O2 rdrand.cpp -o rdrand
mika@pc3 ~/t $ ./rdrand
5000000 iterations in 79 ms
mika@pc3 ~/t $ ./rdrand
5000000 iterations in 79 ms
mika@pc3 ~/t $ ./rdrand
5000000 iterations in 79 ms
mika@pc3 ~/t $ ./rdrand
5000000 iterations in 4458 ms
mika@pc3 ~/t $ ./rdrand
5000000 iterations in 4251 ms
mika@pc3 ~/t $ ./rdrand
5000000 iterations in 4312 ms
mika@pc3 ~/t $ ./rdrand
5000000 iterations in 4209 ms
mika@pc3 ~/t $ ./rdrand
5000000 iterations in 4571 ms
Peter Cordes
mlauronen
  • This is normal. RNGs rely on entropy in your computer. When that entropy is used up, it takes a while to accumulate again. – john May 20 '23 at 06:52
  • ... which is why it's usually better to use a _pseudo_ random number generator: use the entropy-consuming generator only to _initialize_ the pseudo random number generator. – Ted Lyngmo May 20 '23 at 06:54
  • Is it consistent that the first "slow" run is always fully slow? That is, you never see an in-between time? If the explanation were that it ran out of entropy, you'd expect that to happen in the middle of a test run and see a time like 1000 to 3000 ms, clearly different from both the fast and the slow tests. Does waiting for some time always make it fast again? How long? – Peter Cordes May 20 '23 at 06:59
  • @john: if that explanation were all there is to it, the RDRAND hardware would need a buffer of over 5 million 64-bit qwords, i.e. 40 MB ≈ 38.1 MiB. That's more than the total L3 cache size (32 MiB) on the chip. – Peter Cordes May 20 '23 at 07:01
  • @PeterCordes I have very little knowledge of this, but I doubt that the system actively monitors the amount of entropy present (that doesn't even seem possible). Instead, I imagine there is an algorithm that knows how much entropy is present under normal circumstances, how many random numbers that entropy can generate, and how quickly entropy accumulates. – john May 20 '23 at 07:07
  • What's the status of CF on the fast vs. slow runs? CF=0 means failure (e.g. exhaustion of the available entropy). https://uops.info/table.html reports very slow RDRAND performance on AMD before Zen 4 (with 64-bit operand-size about half as fast as 32-bit). On Intel, microcode updates introduced very bad performance (https://www.phoronix.com/news/RdRand-3-Percent - as bad as 3% of the original speed, about one per 1.3k clock cycles according to uops.info) as a mitigation for the SRBDS ("CrossTalk") side-channel vulnerability; perhaps something similar happened with earlier Zen? – Peter Cordes May 20 '23 at 07:08
  • @john: There is buffering and digital "whitening" after the hardware RNG stage. In [Intel's design for example](https://crypto.stackexchange.com/a/102168/25899), it definitely knows how many times the metastable flip-flop has decayed to one state or the other, and thus how many raw input bits it has collected. It wouldn't be hard to keep a count of those bits. In any case, if it knows or thinks it's out of entropy, it should return promptly with CF=0 (and a register value of 0) rather than stall for a long time. – Peter Cordes May 20 '23 at 07:12
  • 80 ms vs. 4.4 seconds is a far bigger speed ratio than CPU clock frequency could explain, but I assume you ran these back to back, so the CPU stayed near its max boost frequency the whole time? – Peter Cordes May 20 '23 at 07:15
  • I reduced the number of iterations to 1e3, and now the speed changes almost every other run, so it happens more often. It gives me random values whether it's slow or fast. – mlauronen May 20 '23 at 10:59
  • So CF was set every time? How did you check that? E.g. with `adc` into another variable to count successful rdrand executions, then comparing that against the repeat count? – Peter Cordes May 20 '23 at 12:14
  • @mlauronen: 1e3 iterations is so short that it's probably hard to time, not much above the measurement noise of an empty timed region, especially if you aren't doing any kind of warm-up for the CPU frequency by running some work other than rdrand first. Also, don't forget to @ notify people you're replying to. – Peter Cordes May 20 '23 at 12:16
  • @PeterCordes Yes, the carry flag is always set (EFLAGS/RFLAGS=0x203). I count the successes while iterating. – mlauronen May 20 '23 at 13:26
  • If anyone is interested in the code, it can be found here: https://gitlab.com/-/snippets/2544848 – mlauronen May 20 '23 at 13:35
  • `pushf` isn't safe in inline asm unless you compile with `-mno-red-zone`, although you should be fine in this case since it's not in a leaf function. That's still one of the least efficient ways you could get the CF output, though. `setc %b1` would be the obvious simple way to get a boolean output (which the compiler will have to zero-extend and/or booleanize depending on what type you tell it), or take a `"+r"(flags)` output and `adc $0, %1`. Or GCC 6 and later let you use `"=@ccc"(foo)` to ask for the C flag as a boolean output.
  • But you should be fine; `pushf` isn't slow like `popf`, and the compiler hopefully doesn't actually make branchy code for that `if`. And if it did, the branch would predict well anyway. So this code won't slow down the loop. – Peter Cordes May 20 '23 at 13:48
  • Yes, it's faster to use setc. I checked the pushf/pop version in a debugger and it's OK.
