Recent Intel chips have instructions for generating (pseudo)random bits: RDRAND (Ivy Bridge and up) and RDSEED (Broadwell and up). RDSEED outputs "true" random bits generated from entropy gathered from a sensor on the chip. RDRAND outputs bits generated by a pseudorandom number generator that is seeded by the true random number generator. According to Intel's documentation, RDSEED is slower, since gathering entropy is costly. RDRAND is therefore offered as a cheaper alternative, and its output is sufficiently secure for most cryptographic applications. (This is analogous to /dev/random versus /dev/urandom on Unix systems.)
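Whether a given CPU offers each instruction can be checked at run time through CPUID. The following is a minimal sketch, assuming GCC or Clang's `<cpuid.h>`; the function names `cpu_has_rdrand` and `cpu_has_rdseed` are hypothetical. The feature bits used are CPUID.01H:ECX bit 30 for RDRAND and CPUID.(EAX=07H,ECX=0):EBX bit 18 for RDSEED.

```c
#include <cpuid.h>
#include <stdbool.h>

/* Hypothetical helper: true if CPUID leaf 1 reports RDRAND (ECX bit 30). */
static bool cpu_has_rdrand(void) {
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
    return false;               /* leaf 1 not available */
  return (ecx >> 30) & 1u;
}

/* Hypothetical helper: true if CPUID leaf 7, subleaf 0 reports RDSEED
 * (EBX bit 18). */
static bool cpu_has_rdseed(void) {
  unsigned int eax, ebx, ecx, edx;
  if (__get_cpuid_max(0, 0) < 7)
    return false;               /* leaf 7 not available */
  __cpuid_count(7, 0, eax, ebx, ecx, edx);
  return (ebx >> 18) & 1u;
}
```

A program could call both helpers once at startup and skip the corresponding benchmark loop when one returns false, rather than risk an illegal-instruction fault.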

I was curious about the performance difference between the two instructions, so I wrote some code to compare them. To my surprise, I found virtually no difference in performance. Could anyone provide an explanation? Code and system details follow.

Benchmark

/* Compare the performance of RDSEED and RDRAND.
 *
 * Compute the CPU time used to fill a buffer with (pseudo) random bits 
 * using each instruction.
 *
 * Compile with: gcc -mrdrnd -mrdseed
 */
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <x86intrin.h>

#define BUFSIZE (1<<24)

int main() {

  unsigned int ok, i;
  unsigned long long *rand = malloc(BUFSIZE*sizeof(unsigned long long)), 
                     *seed = malloc(BUFSIZE*sizeof(unsigned long long)); 

  clock_t start, end, bm;

  // RDRAND (the benchmark)
  start = clock();
  for (i = 0; i < BUFSIZE; i++) {
    ok  = _rdrand64_step(&rand[i]);
  }
  bm = clock() - start;
  printf("RDRAND: %li\n", bm);

  // RDSEED
  start = clock();
  for (i = 0; i < BUFSIZE; i++) {
    ok = _rdseed64_step(&seed[i]);
  }
  end = clock();
  printf("RDSEED: %li, %.2lf\n", end - start, (double)(end-start)/bm);

  free(rand);
  free(seed);
  return 0;
}

System details

  • Intel Core i7-6700 CPU @ 3.40GHz
  • Ubuntu 16.04
  • gcc 5.4.0
tweaksp

3 Answers

You aren't checking the return value, so you don't know how many actual random numbers you have generated. With retry, as Florian suggested, the RDSEED version is more than 3 times slower:

RDRAND: 1989817
RDSEED: 6636792, 3.34 

Under the covers, the hardware entropy source probably produces entropy only at a limited rate, which causes RDSEED to fail when it is called faster than the entropy can regenerate. RDRAND, on the other hand, only generates a pseudo-random sequence with periodic re-seeding, so it is unlikely to fail.

Here is the modified code excerpt:

  // RDRAND (the benchmark)
  start = clock();
  for (i = 0; i < BUFSIZE; i++) {
    while (!_rdrand64_step(&rand[i]))
        ;
  }
  bm = clock() - start;
  printf("RDRAND: %li\n", bm);

  // RDSEED
  start = clock();
  for (i = 0; i < BUFSIZE; i++) {
    while (!_rdseed64_step(&seed[i]))
        ;
  }
  end = clock();
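
To see how often each instruction reports failure when it is called once per value with no retry, a counter along the following lines could be used. This is only a sketch: the function names are hypothetical, and the `target` attributes (a GCC/Clang feature) are my assumption for letting the code build without the `-mrdrnd -mrdseed` flags.

```c
#include <stddef.h>
#include <x86intrin.h>

/* Hypothetical helper: issue RDRAND n times without retrying and
 * return how many calls reported failure (intrinsic returned 0). */
__attribute__((target("rdrnd")))
static size_t count_rdrand_failures(size_t n) {
  unsigned long long v;
  size_t fails = 0;
  for (size_t i = 0; i < n; i++) {
    if (!_rdrand64_step(&v))
      fails++;
  }
  return fails;
}

/* Same measurement for RDSEED. */
__attribute__((target("rdseed")))
static size_t count_rdseed_failures(size_t n) {
  unsigned long long v;
  size_t fails = 0;
  for (size_t i = 0; i < n; i++) {
    if (!_rdseed64_step(&v))
      fails++;
  }
  return fails;
}
```

Calling both with the same `n` (for example `1 << 24`, the buffer size in the question) and printing the two counts would show what fraction of the buffer was never actually filled.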
BeeOnRope

For me, on a Core m7-6Y75, the RDSEED in your test program occasionally fails (I added two `assert(ok);` checks, and the second one fails occasionally). Correct code would retry, resulting in a performance difference in favor of RDRAND. (Retrying is required for RDRAND as well, but failures do not seem to happen in practice, so RDRAND is faster.)
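
One way to structure the retry is with a bounded loop, so that a persistent hardware failure cannot hang the program. The sketch below is illustrative only: the helper names and the retry limits are my assumptions. The limit of 10 for RDRAND follows the figure commonly cited from Intel's DRNG software implementation guide, and the RDSEED limit is an arbitrary value. The `target` attributes (GCC/Clang) stand in for the `-mrdrnd -mrdseed` flags.

```c
#include <stdbool.h>
#include <x86intrin.h>

/* Illustrative limits (assumptions, not taken from the answer). */
#define RDRAND_RETRIES 10
#define RDSEED_RETRIES 1000

/* Hypothetical helper: try RDRAND up to RDRAND_RETRIES times.
 * Returns true and stores the value in *out on success. */
__attribute__((target("rdrnd")))
static bool rdrand64_retry(unsigned long long *out) {
  for (int i = 0; i < RDRAND_RETRIES; i++) {
    if (_rdrand64_step(out))
      return true;
  }
  return false;
}

/* Hypothetical helper: try RDSEED up to RDSEED_RETRIES times, pausing
 * briefly between attempts so the entropy source can catch up. */
__attribute__((target("rdseed")))
static bool rdseed64_retry(unsigned long long *out) {
  for (int i = 0; i < RDSEED_RETRIES; i++) {
    if (_rdseed64_step(out))
      return true;
    _mm_pause();
  }
  return false;
}
```

A caller would treat a `false` return as an error instead of using whatever happens to be in `*out`.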

Florian Weimer
  • Interesting! I assumed, naively, that `_rdseed64_step()` would block until sufficient entropy was available. (Just as reading from /dev/random blocks until entropy is available.) Thanks! – tweaksp Jul 27 '17 at 13:29
  • @tweaksp and Florian: See also [What are the exhaustion characteristics of RDRAND on Ivy Bridge?](https://stackoverflow.com/questions/14413839/what-are-the-exhaustion-characteristics-of-rdrand-on-ivy-bridge) where David Johnston (Intel RNG HW designer and `librdrand` author) posted some interesting stuff. e.g. that the actual implementation in IvyBridge never underflows its buffer, but it's not guaranteed that future CPUs will be that way. – Peter Cordes Jul 28 '17 at 01:12

Interesting. In my case, with a 3.6 GHz 10-core Intel Core i9 (on an iMac) and the above program (corrected to repeat the RDRAND/RDSEED call in case of failure), I observe:

$ ./rdseed-test 
RDRAND: 1751837
RDSEED: 1752472, 1.00

Update

I must admit that I'm puzzled: running the same executable a few days later gives me a 3x difference, like the one reported by others above:

$ ./rdseed-test 
RDRAND: 1761312
RDSEED: 5309609, 3.01

No idea why sometimes RDSEED runs as fast as RDRAND, and sometimes - three times slower.

Mouse
  • Perhaps they beefed up the HW RNG and preprocessing logic so it can once again fully keep up with a single core, so you'd need multiple cores pulling random numbers to exhaust it. What generation of i9 was it? Ice Lake, or a Skylake-derived CPU? – Peter Cordes Nov 10 '21 at 01:10
  • @Peter, it's almost one year old, so must be a Skylake-derivative. But see the update of my post - now I'm getting that 3x performance difference more often than not. – Mouse Nov 14 '21 at 04:20
  • Probably a Comet Lake i9, then, like i9-10910 since that's the only Comet Lake with 10 cores and 3.6GHz non-turbo base clock. https://en.wikipedia.org/wiki/Comet_Lake_(microprocessor)#Desktop_processors Ice Lake has been out for over a year, but only mobile versions and no i9 or 10 cores. There are Tiger Lake i9 CPUs, but none with 10 cores. – Peter Cordes Nov 14 '21 at 17:03
  • IDK why you'd see varying results like that, unless the buffer is really huge compared to the amount of entropy this benchmark pulls from it... – Peter Cordes Nov 14 '21 at 17:04
  • 1
    The speed of the RNG varies with different chips. Generally low power SoCs have slower entropy sources (resulting from the lower system voltage). A 10 core I9 is faster. RdRand is faster than RdSeed, but with no congestion on the bus, you might be limited by the round trip time per request and so see the same throughput whereas the difference will be seen with a set of parallel processes pulling at the same time. You might be having the OS or some other programs. When testing timing we don't write it to memory - we xor it with a running value to remove the memory timing from the result. – David Johnston Nov 27 '21 at 22:48
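
The XOR-accumulation approach described in the last comment might look roughly like the sketch below. It is my own illustration, not David Johnston's test code: the function names are hypothetical, `clock()` is kept only because the question uses it, and the `target` attributes (GCC/Clang) stand in for the `-mrdrnd -mrdseed` flags.

```c
#include <stddef.h>
#include <time.h>
#include <x86intrin.h>

/* Hypothetical helper: time n successful RDRAND reads, folding each
 * value into a running XOR instead of storing it to a buffer. */
__attribute__((target("rdrnd")))
static clock_t time_rdrand_xor(size_t n, unsigned long long *acc_out) {
  unsigned long long v, acc = 0;
  clock_t start = clock();
  for (size_t i = 0; i < n; i++) {
    while (!_rdrand64_step(&v))
      ;                         /* retry until the hardware reports success */
    acc ^= v;
  }
  clock_t elapsed = clock() - start;
  *acc_out = acc;               /* hand the result back so it is actually used */
  return elapsed;
}

/* Same measurement for RDSEED. */
__attribute__((target("rdseed")))
static clock_t time_rdseed_xor(size_t n, unsigned long long *acc_out) {
  unsigned long long v, acc = 0;
  clock_t start = clock();
  for (size_t i = 0; i < n; i++) {
    while (!_rdseed64_step(&v))
      ;                         /* retry until the hardware reports success */
    acc ^= v;
  }
  clock_t elapsed = clock() - start;
  *acc_out = acc;
  return elapsed;
}
```

Because nothing is written to a large buffer, page faults and cache misses from the two `malloc`ed arrays no longer contribute to either measurement.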