Simple random number generator that can generate nth number in series in O(1) time

Question

I do not intend to use this for security purposes or statistical analysis. I need to create a simple random number generator for use in my computer graphics application. I don't want to use the term "random number generator", since people think in very strict terms about it, but I can't think of any other word to describe it.

it has to be fast.
it must be repeatable, given a particular seed. Eg: If seed = x, then the series a,b,c,d,e,f..... should happen every time I use the seed x.

Most importantly, I need to be able to compute the nth term in the series in constant time.

It seems, that I cannot achieve this with rand_r or srand(), since these need are state dependent, and I may need to compute the nth in some unknown order.

I've looked at Linear Feedback Shift registers, but these are state dependent too.

So far I have this:

int rand = (n * prime₁ + seed) % prime₂

n = used to indicate the index of the term in the sequence. Eg: For first term, n ==1

prime₁ and prime₂ are prime numbers where prime₁ > prime₂

seed = some number which allows one to use the same function to produce a different series depending on the seed, but the same series for a given seed.

I can't tell how good or bad this is, since I haven't used it enough, but it would be great if people with more experience in this can point out the problems with this, or help me improve it..

EDIT - I don't care if it is predictable. I'm just trying to creating some randomness in my computer graphics.

Calc numbers previously, and cache them, once get nth number just take O(1). — Tim, Feb 22 '14 at 05:01
I may not be able to cache them, as there could be millions of numbers, and I need to make at least 400,000 calculations per second on a mobile device. Also, the process of cacheing and lookup could take longer than the actual calculation itself... — Rahul Iyer, Feb 22 '14 at 05:03
Your current algorithm doesn't look random at all. Did you try to make a plot? I've tried to plot using P1=569, P2=359, seed=12345, and the pattern is __really__ visible. Just giving you a heads up. — Paweł Stawarz, Feb 22 '14 at 05:09
I've been using 279470273UL and 4294967291UL , and I haven't had any repeats for at least 10000 calculations... How are you plotting ? — Rahul Iyer, Feb 22 '14 at 05:12
Using Excel. With your numbers the pattern is even __more__ visible. Do an XY-plot in Excel/OpenOffice Calc and you'll see it. The X axis is N, and on the Y axis there's the value. — Paweł Stawarz, Feb 22 '14 at 05:14
Use something with more variety. A sin, cosine, or something. Gonna try something tommorow if I get the chance. — Paweł Stawarz, Feb 22 '14 at 05:21
I have a very similar problem, i.e. map the sequence N = 0,1,2,... to pseudorandom numbers. I've tried using `minstd_rand` the C++ STL, effectively by setting the seed to N and taking the first random number. It turns out that random sequences generated this way are *very* repetitive. These numbers don't have to be very random, having equal probabilities for each consecutive pair would be adequate. — Edmund, Dec 10 '16 at 07:31

score 5 · Answer 1 · answered Feb 22 '14 at 04:53

5

Use a cryptographic block cipher in CTR mode. The Nth output is just encrypt(N). Not only does this give you the desired properties (O(1) computation of the Nth output); it also has strong non-predictability properties.

answered Feb 22 '14 at 04:53

R.. GitHub STOP HELPING ICE

208,859
35
376
711

would it be as fast, or faster than my equation above ? I don't care if it is predictable or not. This is for the purpose of creating some "randomness" in computer graphics. – Rahul Iyer Feb 22 '14 at 04:54
If you weaken the cryptographic block cipher to an extent that it's no longer usable for cryptographic purposes, it might be comparable. This might involve dropping all but one or two rounds of a [Feistel cipher](http://en.wikipedia.org/wiki/Feistel_cipher). In any case I think your proposed algorithm has serious flaws for use in any real-world setting (e.g. if generating images from it, there will be cases where you visually *see* the pattern). – R.. GitHub STOP HELPING ICE Feb 22 '14 at 04:58
BTW note that you can also compute the N+1'th output from the N'th output, but only if you know the encryption key: `next = encrypt(decrypt(prev)+1)`. – R.. GitHub STOP HELPING ICE Feb 22 '14 at 05:00
this sounds really complicated. I don't know anything about block ciphers. But when testing my code above, there were no repeats for at least 10000 calls (I didn't test more), and the problem I had was integer overflow... – Rahul Iyer Feb 22 '14 at 05:02
3

The question isn't about repeats but a visible pattern. For example if you draw a dot at column `out(N) % img_width` in row `N` for each row, and `img_width` has certain numerical relation to `prime2`, you're going to start seeing an obvious pattern... – R.. GitHub STOP HELPING ICE Feb 22 '14 at 05:06
Any idea how to delay the onset of the pattern with minimal changes to my equation ? – Rahul Iyer Feb 22 '14 at 05:18
1

@John if you're really set on that format, try `((n+seed)*(n+seed)*prime1 + (n+seed)*prime2) % prime3`. Block ciphers aren't complicated. Imagine a giant codebook, with an entry for every possible sequence of 16 bytes, and where every codeword is also 16 bytes, randomly chosen (but no two words have the same codeword). You just pick the n'th codeword from the book to get your number. A block cipher is an algorithm for generating the codewords. – user253751 Feb 22 '14 at 06:19
1

or, at a more practical level: `const int KEY = 0x12345678; int numbersGenerated = 0; int generateRandomNumber() {return encrypt(numbersGenerated++, KEY);}`. This is not entirely accurate (the key and encrypted data will be 8, 16 or 32 bytes for most block ciphers - not 4) but it shows the general idea. The key doesn't need to be secret, since you've said security is not required. – user253751 Feb 22 '14 at 06:22
I don't think this answer is very useful -- the question specifically says it's not for cryptographic purposes, so all the potential overhead -- obscure libraries, complicated APIs, relative computational expense -- is unnecessary. I'm looking for the same solution as @John and I'll continue looking before I resort to running a cryptographic hash over a single integer just to get a quick random number out of it. – Edmund Dec 10 '16 at 07:26
1

@Edmund In the 1990s, I used DES with a *reduced number of rounds* as a practical but high-quality PRNG, so the advice given by this answer is sound. [Philox](https://pdfs.semanticscholar.org/38bc/7fc62136ec779d91b86b6e960a06d67b4a97.pdf) is a very fast PRNG, very closely related to the approach proposed by this answer. – njuffa Dec 24 '16 at 23:10

André Harder · Answer 2 · 2016-12-24T23:00:30.340

I stumbled on this a while back, looking for a solution for the same problem. Recently, I figured out how to do it in low-constant O(log(n)) time. While this doesn't quite match the O(1) requested by the author, It may be fast enough (a sample run, compiled with -O3, achieved performance of 1 billion arbitrary index random numbers, with n varying between 1 and 2^48, in 55.7s -- just shy of 18M numbers/s).

First, the theory behind the solution:

A common type of RNGs are Linear Congruential Generators, basically, they work as follows:

random(n) = (m*random(n-1) + b) mod p

Where m and b, and p are constants (see a reference on LCGs for how they are chosen). From this, we can devise the following using a bit of modular arithmetic:

random(0) = seed mod p
random(1) = m*seed + b mod p
random(2) = m^2*seed + m*b + b mod p
...
random(n) = m^n*seed + b*Sum_{i = 0 to n - 1} m^i mod p 
          = m^n*seed + b*(m^n - 1)/(m - 1) mod p

Computing the above can be a problem, since the numbers will quickly exceed numeric limits. The solution for the generic case is to compute m^n in modulo with p*(m - 1), however, if we take b = 0 (a sub-case of LCGs sometimes called Multiplicative congruential Generators), we have a much simpler solution, and can do our computations in modulo p only.

In the following, I use the constant parameters used by RANF (developed by CRAY), where p = 2^48 and g = 44485709377909. The fact that p is a power of 2 reduces the number of operations required (as expected):

#include <cassert>
#include <stdint.h>
#include <cstdlib>

class RANF{

    // MCG constants and state data
    static const uint64_t m = 44485709377909ULL;
    static const uint64_t n = 0x0000010000000000ULL; // 2^48
    static const uint64_t randMax = n - 1;
    const uint64_t seed;
    uint64_t state;

public:

    // Constructors, which define the seed
    RANF(uint64_t seed) : seed(seed), state(seed) { 
        assert(seed > 0 && "A seed of 0 breaks the LCG!"); 
    }

    // Gets the next random number in the sequence
    inline uint64_t getNext(){
        state *= m;
        return state & randMax;
    }

    // Sets the MCG to a specific index
    inline void setPosition(size_t index){
        state = seed;
        uint64_t mPower = m;
        for (uint64_t b = 1; index; b <<= 1){
            if (index & b){
                state *= mPower;
                index ^= b;
            }
            mPower *= mPower;
        }
    }
};

#include <cstdio>
void example(){
    RANF R(1);

    // Gets the number through random-access -- O(log(n))
    R.setPosition(12345); // Goes to the nth random number
    printf("fast nth number = %lu\n", R.getNext());

    // Gets the number through standard, sequential access -- O(n)
    R.setPosition(0);
    for(size_t i = 0; i < 12345; i++) R.getNext();
    printf("slow nth number = %lu\n", R.getNext());  
}

While I presume the author has moved on by now, hopefully this will be of use to someone else.

If you're really concerned about runtime performance, the above can be made about 10x faster with lookup tables, at the cost of compilation time and binary size (it also is O(1) w.r.t the desired random index, as requested by OP)

In the version below, I used c++14 constexpr to generate the lookup tables at compile time, and got to 176M arbitrary index random numbers per second (doing this did however add about 12s of extra compilation time, and a 1.5MB increase in binary size -- the added time may be mitigated if partial recompilation is used).

class RANF{

    // MCG constants and state data
    static const uint64_t m = 44485709377909ULL;
    static const uint64_t n = 0x0000010000000000ULL; // 2^48
    static const uint64_t randMax = n - 1;
    const uint64_t seed;
    uint64_t state;

    // Lookup table
    struct lookup_t{
        uint64_t v[3][65536];

        constexpr lookup_t() : v() {
            uint64_t mi = RANF::m;
            for (size_t i = 0; i < 3; i++){
                v[i][0] = 1;
                uint64_t val = mi;
                for (uint16_t j = 0x0001; j; j++){
                    v[i][j] = val;
                    val *= mi;
                }
                mi = val;
            }
        }
    };  
    friend struct lookup_t;

public:

    // Constructors, which define the seed
    RANF(uint64_t seed) : seed(seed), state(seed) { 
        assert(seed > 0 && "A seed of 0 breaks the LCG!"); 
    }

    // Gets the next random number in the sequence
    inline uint64_t getNext(){
        state *= m;
        return state & randMax;
    }

    // Sets the MCG to a specific index
    // Note: idx.u16 indices need to be adapted for big-endian machines!
    inline void setPosition(size_t index){
        static constexpr auto lookup = lookup_t();  
        union { uint16_t u16[4]; uint64_t u64; } idx;

        idx.u64 = index;
        state = seed * lookup.v[0][idx.u16[0]] * lookup.v[1][idx.u16[1]] * lookup.v[2][idx.u16[2]];
    }
};

Basically, what this does is splits the computation of, for example, m^0xAAAABBBBCCCC mod p, into (m^0xAAAA00000000 mod p)*(m^0xBBBB0000 mod p)*(m^CCCC mod p) mod p, and then precomputes tables for each of the values in the 0x0000 - 0xFFFF range that could fill AAAA, BBBB or CCCC.

score 0 · Answer 3 · answered Feb 22 '14 at 05:09

RNG in a normal sense, have the sequence pattern like f(n) = S(f(n-1))

They also lost precision at some point (like % mod), due to computing convenience, therefore it is not possible to expand the sequence to a function like X(n) = f(n) = trivial function with n only.

This mean at best you have O(n) with that.

To target for O(1) you therefore need to abandon the idea of f(n) = S(f(n-1)), and designate a trivial formula directly so that the N'th number can be calculated directly without knowing (N-1)'th; this also render the seed meaningless.

So, you end up have a simple algebra function and not a sequence. For example:

int my_rand(int n) { return 42; } // Don't laugh!
int my_rand(int n) { 3*n*n + 2*n + 7; }

If you want to put more constraint to the generated pattern (like distribution), it become a complex maths problem.

However, for your original goal, if what you want is constant speed to get pseudo-random numbers, I suggest to pre-generate it with traditional RNG and access with lookup table.

EDIT: I noticed you have concern with a table size for a lot of numbers, however you may introduce some hybrid model, like a table of N entries, and do f(k) = g( tbl[k%n], k), which at least provide good distribution across N continue sequence.

I think a seed could still make a difference. Eg: if 7 was your seed, and you multiplied it by n or something it would. You would get a different sequence compared to using a different seed. — Rahul Iyer, Feb 22 '14 at 05:14
seed is different than parameter to the function. a seed is directly assigning the "last number" used to produce the next sequence. — Non-maskable Interrupt, Feb 22 '14 at 05:17
The conclusion of the first paragraph is false. For a counterexample, look at Andre Harder's answer utilizing a linear congruential generator in O (log n). — Gassa, Dec 24 '16 at 23:32

Christopher Oicles · Answer 4 · 2014-02-22T10:23:28.863

This demonstrates an PRNG implemented as a hashed counter. This might appear to duplicate R.'s suggestion (using a block cipher in CTR mode as a stream cipher), but for this, I avoided using cryptographically secure primitives: for speed of execution and because security wasn't a desired feature.

If we were trying to create a secure stream cipher with your requirement that any emitted sequence be trivially repeatable, given knowledge of its index...

...then we could choose a secure hash algorithm (like SHA256) and a counter with a lot of bits (maybe 2048 -> sequence repeats every 2^2048 generated numbers without reseeding).

HOWEVER, the version I present here uses Bob Jenkins' famous hash function (simple and fast, but not secure) along with a 64-bit counter (which is as big as integers can get on my system, without needing custom incrementing code).

Code in main demonstrates that knowledge of the RNG's counter (seed) after initialization allows a PRNG sequence to be repeated, as long as we know how many values were generated leading up to the repetition point.

Actually, if you know the counter's value at any point in the output sequence, you will be able to retrieve all values generated previous to that point, AND all values which will be generated afterward. This only involves adding or subtracting ordinal differences to/from a reference counter value associated with a known point in the output sequence.

It should be pretty easy to adapt this class for use as a testing framework -- you could plug in other hash functions and change the counter's size to see what kind of impact there is on speed as well as the distribution of generated values (the only uniformity analysis I did was to look for patterns in the screenfuls of hexadecimal numbers printed by main()).

#include <iostream>
#include <iomanip>
#include <ctime>

using namespace std;

class CHashedCounterRng {
    static unsigned JenkinsHash(const void *input, unsigned len) {
        unsigned hash = 0;
        for(unsigned i=0; i<len; ++i) {
            hash += static_cast<const unsigned char*>(input)[i];
            hash += hash << 10;
            hash ^= hash >> 6;
        }
        hash += hash << 3;
        hash ^= hash >> 11;
        hash += hash << 15;
        return hash;
    }

    unsigned long long m_counter;

    void IncrementCounter() { ++m_counter; }

public:
    unsigned long long GetSeed() const {
        return m_counter; 
    }
    void SetSeed(unsigned long long new_seed) {
        m_counter = new_seed; 
    }
    unsigned int operator ()() {
        // the next random number is generated here
        const auto r = JenkinsHash(&m_counter, sizeof(m_counter));
        IncrementCounter();
        return r;
    }

    // the default coontructor uses time() 
    // to seed the counter
    CHashedCounterRng() : m_counter(time(0)) {}

    // you can supply a predetermined seed here, 
    // or after construction with SetSeed(seed)
    CHashedCounterRng(unsigned long long seed) : m_counter(seed) {}
};

int main() {
    CHashedCounterRng rng;
    // time()'s high bits change very slowly, so look at low digits
    // if you want to verify that the seed is different between runs
    const auto stored_counter = rng.GetSeed();
    cout << "initial seed: " << stored_counter << endl;
    for(int i=0; i<20; ++i) {
        for(int j=0; j<8; ++j) {
            const unsigned x = rng();
            cout << setfill('0') << setw(8) << hex << x << ' ';
        }
        cout << endl;
    }
    cout << endl;

    cout << "The last line again:" << endl;
    rng.SetSeed(stored_counter + 19 * 8);
    for(int j=0; j<8; ++j) {
        const unsigned x = rng();
        cout << setfill('0') << setw(8) << hex << x  << ' ';
    }
    cout << endl << endl;
    return 0;
}

I noticed that you were checking for repeated outputs as a measure of randomness; keep in mind that a truly random number generator will always emit repeat values, because every bit in its output stream is independent of every other bit. In fact, after a certain number of observations with no repeating outputs, evidence will grow that the stream you are looking at does NOT contain random values. — Christopher Oicles, Feb 22 '14 at 10:00

Simple random number generator that can generate nth number in series in O(1) time

4 Answers4