
I am trying to populate a region of data with randomness using rand cross-platform in an efficient way. Here is what I have so far:

/**********************************************\
*          Useful macro by Glen Ragen          *
* https://stackoverflow.com/a/39920811/5223757 *
\**********************************************/
#if 1 /* So that the IDE minifies it */
#define NEEDS_BIT(N, B)     (((unsigned long)(N) >> (B)) > 0)

#define BITS_TO_REPRESENT(N)                   \
        (NEEDS_BIT(N,  0) + NEEDS_BIT(N,  1) + \
         NEEDS_BIT(N,  2) + NEEDS_BIT(N,  3) + \
         /* ... NEEDS_BIT(N, 4) through NEEDS_BIT(N, 59) ... */ \
         NEEDS_BIT(N, 60) + NEEDS_BIT(N, 61) + \
         NEEDS_BIT(N, 62) + NEEDS_BIT(N, 63)   \
        )
#endif /* So that the IDE minifies it */

typedef struct {
    size_t size; /* Size in bytes */
    void *pointer;
} data;

void fill_data_with_randomness(const data data) {
    for (size_t biti = 0; biti < data.size * 8; biti += BITS_TO_REPRESENT(RAND_MAX)) {
        /* Fill data.pointer with bits from rand() */
    }
}

As RAND_MAX is a compile-time constant, BITS_TO_REPRESENT(RAND_MAX) should also be one. And since data is passed as const data, the compiler should be able to treat data.size * 8 as a call-time constant rather than re-evaluating it on every iteration of the for loop.

However, bit manipulation is quite slow, and the fill_data_with_randomness function will be called very frequently. This function should compile for, and run correctly on, systems with any value of RAND_MAX of the form 2^n − 1. What fill_data_with_randomness function can quickly fill this region of memory with rand()mness without wasting bits?

wizzwizz4

  • Why does it need to be bit by bit? If the size of your data is a multiple of the size of the return value of `rand()`, could you not assign the return value of `rand()` to the data piece by piece? Or if it's byte-aligned, assign the return value of `rand()&0xFF`? – Govind Parmar Jun 02 '17 at 13:56
  • @GovindParmar I'll edit the question to make it clear. I was looking for a way that was _faster_ than naive bit-by-bit - if possible without wasting `rand()` bits as `&0xFF` would. However, if a `rand()` call per byte was more efficient than any other solution I'd go with that. – wizzwizz4 Jun 02 '17 at 14:15

1 Answer


RAND_MAX is guaranteed to be at least 2^15 − 1, so there is no problem filling an entire byte with randomness with a single call to rand(), unless your architecture has bytes larger than 15 bits (in which case your hard-coded multiplication by 8 will also be problematic).

Doing the generation byte by byte is straightforward and requires no bit-hacking cleverness.
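A minimal sketch of that byte-by-byte approach, reusing the `data` struct from the question (the `& 0xFF` mask and the assumption of 8-bit bytes are mine, not part of the answer):

```c
#include <stdlib.h>
#include <stddef.h>

typedef struct {
    size_t size;   /* Size in bytes */
    void *pointer;
} data;

/* Sketch: one rand() call per byte. Valid because RAND_MAX >= 32767,
 * so every call yields at least 8 usable low-order bits. */
void fill_data_with_randomness(const data d) {
    unsigned char *p = d.pointer;
    for (size_t i = 0; i < d.size; i++)
        p[i] = (unsigned char)(rand() & 0xFF);
}
```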

Personally, I wouldn't worry about "wasting bits" from a call to rand(); the bits produced by that standard library function aren't worth much on most implementations. The real problem is to decide which bits to throw away so that the remaining ones are reasonably well-distributed. If possible, to cope with the range of inferior PRNGs generally used, you should go for the middle bits.
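As a sketch of that bit-selection idea (the helper name `middle_byte` and the specific divisor are illustrative, echoing the `rand()/8%256` suggestion from the comments below): dividing discards the lowest, often weakest, bits of a poor LCG before keeping the next eight.

```c
#include <stdlib.h>

/* Illustrative: extract 8 "middle" bits from one rand() sample r.
 * Dividing by 8 throws away the 3 lowest-order bits;
 * % 256 then keeps the following 8 bits. */
unsigned char middle_byte(int r) {
    return (unsigned char)((r / 8) % 256);
}
```

One byte per call would then be `middle_byte(rand())`.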

A better solution might be to simply provide your own PRNG; portable source code implementations exist, and some are well studied.
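For instance, Marsaglia's xorshift32 is one such small, well-studied, portable generator; the seeding fallback and byte-packing below are my own sketch, not part of the answer:

```c
#include <stdint.h>
#include <stddef.h>

/* xorshift32 (Marsaglia, 2003): state must be nonzero. */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Fill buf with n pseudo-random bytes, drawing 32 bits per step. */
void fill_with_xorshift(unsigned char *buf, size_t n, uint32_t seed) {
    uint32_t state = seed ? seed : 0xDEADBEEF; /* avoid the all-zero state */
    size_t i = 0;
    while (i < n) {
        uint32_t r = xorshift32(&state);
        for (int b = 0; b < 4 && i < n; b++, i++)
            buf[i] = (unsigned char)(r >> (8 * b));
    }
}
```

Unlike rand(), this also makes the stream reproducible across platforms for a given seed.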

rici
  • Well, this is an anti-climax. I suppose there was a reason that all of the other code samples I looked at were using some variation on `&0xFF` or `%256`. – wizzwizz4 Jun 02 '17 at 14:22
  • 1
    @wizzwizz4: as I mentioned in the edit, I'd suggest something like `rand()/8%256` to avoid the low order bits. – rici Jun 02 '17 at 14:29