Why do we have instructions such as RDRAND instead of an I/O device that would give us similar results?

Question

I'm wondering what the reason was behind designing a CPU-specific instruction to generate random numbers.

Intel processors have RDRAND and RDSEED. The PPC also has an equivalent instruction.

Wouldn't it make more sense to have a separate chip and just do some I/O to get those numbers? It seems to me that it makes the CPU even more complex for a very specialized instruction (most software never uses random numbers!), when I/O has been around for ages and should work just fine.

“Use a separate chip”: that would cost more, add complexity, and be slower (you would need a communications channel/bus, or would have to memory-map your proposed random chip), which is a deal-breaker for applications that need fast random numbers. Just because not every application needs random numbers (I’d wager most today *do*, thanks to HTTPS encryption; it’s just provided by the OS or TLS lib) doesn’t mean it’s useless. Think about Intel adding AES instructions, or SSE, which you can’t even use from most languages without a lot of difficulty or crazy-smart compilers. — Dai, Sep 11 '19 at 02:16
A cache miss costs hundreds or thousands of cycles, so using a separate chip for random number generation would be even slower, and it would block the data bus unnecessarily for some time. The random number generator is tiny compared to billions of transistors, so it's a good idea to include it, given that all encryption applications need it — phuclv, Sep 11 '19 at 07:24
There is never anything pretty about I/O; nobody likes to support the driver that is required to allow a userland program to access the device. And not just one driver: multiply by the number of OS implementations and versions. The relentless cost-cutting in hardware is achieved by tighter integration, and it didn't stop at just the chipset. — Hans Passant, Sep 11 '19 at 07:43

score 8 · Accepted Answer · answered Sep 11 '19 at 07:31

We have both.
The TPM can generate crypto-secure random numbers (after all, it's a "crypto-chip"), and a TPM is present on many, if not all, Intel-based motherboards since Haswell.
Proprietary CSRNG PCI(e) cards are commercially available too.

I once attended a presentation of a home-made CSRNG built with an Arduino.
The presenter had no notion of statistics or algebra. To be honest, the whole presentation was pathetic.
You can't just make a chip and claim it is a CSRNG: you must obtain certifications, and there are standards and methods to follow.
Getting those certifications is expensive and hard.

Furthermore, to handle a wide bandwidth you also need a fast (micro)processor.
One of the goals of the TPM committee was to make it cheap; the end result was that TPM chips are slow.

Add the relatively low market demand for such chips, and it becomes clear why standalone CSRNG chips are expensive.

External devices are also prone to physical attacks: the chip can easily be desoldered or decapped, the bus can be tapped, or the chip can be replaced outright.
This is true even of the CSRNG inside the CPU; known attacks reduce its entropy by altering its transistors.
However, such attacks require a whole different kind of tools.

PCI(e) CSRNGs will probably use DMA to transfer the required number of bytes of entropy, which requires some coordination interface with the OS, for example to know when a transfer is in progress.

And, of course, the payload would be in memory, which means a bigger software attack surface and an extra step to get it into the registers.
A memory access costs on the order of 200-300 cycles.

Using port-mapped I/O (i.e. the IN instruction) would bring the payload directly into a register, but only 32 bits at a time, and it's no faster than an ordinary load.

RDRAND is a user-mode instruction, giving user-mode applications access to a CSRNG with no extra burden beyond checking that it is supported.
It ships with almost all recent CPUs, so it feels almost free.
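A minimal sketch in C of the "check for support, then just execute the instruction" pattern, assuming GCC or Clang on x86 (the `<cpuid.h>` helpers, the `bit_RDRND` macro, and the `target("rdrnd")` attribute are compiler-specific). The wrapper name `hw_random32` is hypothetical, and the limit of 10 retries is an assumed choice, in line with the retry count commonly cited from Intel's DRNG guidance. Note there is no driver, system call, or memory-mapped buffer involved: the value comes straight out of the instruction.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>      /* GCC/Clang: __get_cpuid, bit_RDRND */
#include <x86intrin.h>  /* _rdrand32_step */

/* CPUID leaf 1 reports RDRAND support in ECX bit 30 (bit_RDRND). */
static int rdrand_supported(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx & bit_RDRND) != 0;
}

/* Only this function is compiled with RDRAND enabled, so the rest of
   the program can still run on CPUs that lack the instruction. */
__attribute__((target("rdrnd")))
static int rdrand32_retry(unsigned int *out) {
    for (int i = 0; i < 10; i++) {   /* assumed retry limit */
        if (_rdrand32_step(out))
            return 1;                /* carry flag set: value is valid */
    }
    return 0;                        /* generator kept reporting failure */
}
#else
/* Non-x86 build: report "unavailable" instead of failing to compile. */
static int rdrand_supported(void) { return 0; }
static int rdrand32_retry(unsigned int *out) { (void)out; return 0; }
#endif

/* Hypothetical wrapper: 1 on success, 0 if RDRAND is missing or failed. */
int hw_random32(uint32_t *out) {
    unsigned int v;
    if (!rdrand_supported() || !rdrand32_retry(&v))
        return 0;
    *out = v;
    return 1;
}
```

Calling `hw_random32(&r)` returns 1 and fills `r` on success. A return of 0 means the CPU lacks RDRAND or the generator kept failing, in which case the caller would fall back to the OS random number source.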

Finally, there is a marketing aspect at play.
If your manufacturing process has improved and has freed up a few [insert-a-reasonable-length-unit-here] squared of die space, you can either improve the micro-architecture or add a new feature.
The former is hard; the latter is relatively easy to design and may give you a boost over your competitors: common tools run faster on your CPU just because you could afford more die space.

score 4 · Answer 2 · answered May 12 '21 at 05:03

It is an explicit design decision in RdRand and RdSeed that the instruction delivers its random number into the target register of the program executing it, bypassing the OS, libraries, hypervisors, device drivers and anything else that may present an increased attack surface.

Having the RNG on die was (a) the obvious thing to do since it's only a tiny sliver of the whole die and (b) the right thing from a security point of view, preventing simple probing of the path from the RNG to the running code.
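To make the "straight into the target register" point concrete, here is a similar sketch for RdSeed, again assuming GCC or Clang on x86. The wrapper name `hw_seed32` and the limit of 100 retries are hypothetical choices. RdSeed returns seed-grade entropy rather than the output of RdRand's deterministic generator, so it is expected to report "not ready" more often, and callers normally retry.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>      /* GCC/Clang: __get_cpuid_count, bit_RDSEED */
#include <x86intrin.h>  /* _rdseed32_step */

/* RDSEED support is reported by CPUID leaf 7, subleaf 0, EBX bit 18. */
static int rdseed_supported(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ebx & bit_RDSEED) != 0;
}

/* The seed lands in a local variable straight from the instruction:
   no system call, driver, or DMA buffer sits in between. */
__attribute__((target("rdseed")))
static int rdseed32_retry(unsigned int *out, int max_tries) {
    for (int i = 0; i < max_tries; i++) {
        if (_rdseed32_step(out))
            return 1;  /* carry flag set: a fresh seed was delivered */
    }
    return 0;          /* entropy source kept reporting "not ready" */
}
#else
/* Non-x86 build: report "unavailable" instead of failing to compile. */
static int rdseed_supported(void) { return 0; }
static int rdseed32_retry(unsigned int *out, int max_tries) {
    (void)out;
    (void)max_tries;
    return 0;
}
#endif

/* Hypothetical wrapper: 1 on success, 0 if RDSEED is missing or failed. */
int hw_seed32(uint32_t *out) {
    unsigned int v;
    if (!rdseed_supported() || !rdseed32_retry(&v, 100))
        return 0;
    *out = v;
    return 1;
}
```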

David Johnston

So, in other words, the fundamental problem with having the RNG I/O-mapped, regardless of where the hardware is actually located, is that I/O is *privileged*. – Nate Eldredge May 12 '21 at 21:14
That is one aspect. A slightly deeper way of looking at it is that if it's privileged access, your hypervisor is totally going to mess things up when you try to access a non-deterministic primitive. Similarly, operating systems have done a terrible job of making secure random numbers available (urandom?). Do it once, do it right, put it on the chip and bypass the interlopers. – David Johnston Jun 02 '21 at 16:59