6

I'm looking for a efficient, uniformly distributed PRNG, that generates one random integer for any whole number point in the plain with coordinates x and y as input to the function.

int rand(int x, int y)

It has to deliver the same random number each time you input the same coordinate.

Do you know of algorithms, that can be used for this kind of problem and also in higher dimensions?

I already tried to use normal PRNGs like a LFSR and merged the x,y coordinates together to use it as a seed value. Something like this.

int seed = x << 16 | (y & 0xFFFF)

The obvious problem with this method is that the seed is not iterated over multiple times but is initialized again for every x,y-point. This results in very ugly non random patterns if you visualize the results.

I already know of the method which uses shuffled permutation tables of some size like 256 and you get a random integer out of it like this.

int r = P[x + P[y & 255] & 255];

But I don't want to use this method because of the very limited range, restricted period length and high memory consumption.

Thanks for any helpful suggestions!

bakkaa
  • 673
  • 6
  • 25

3 Answers3

8

I found a very simple, fast and sufficient hash function based on the xxhash algorithm.

// cash stands for chaos hash :D
int cash(int x, int y){   
    int h = seed + x*374761393 + y*668265263; //all constants are prime
    h = (h^(h >> 13))*1274126177;
    return h^(h >> 16);
}

It is now much faster than the lookup table method I described above and it looks equally random. I don't know if the random properties are good compared to xxhash but as long as it looks random to the eye it's a fair solution for my purpose.

This is what it looks like with the pixel coordinates as input:

enter image description here

bakkaa
  • 673
  • 6
  • 25
  • The building of *good* hash-functions is a hard process. And yours look very very bad. First of all, i suspect there are overflows in h, second: this is a result of same inputs with different seeds: 12980285595313403384 / 12980285592294613320. And this is a result of similar, but different inputs: 12709661718304170483 / 13558132204872960782. This is not a hash-function at all (if one want's an avalanche-effect an co.). If it's working for you, fine. But this is even worse than a linear congruental generator (and probably not faster). – sascha May 16 '16 at 14:01
  • 4
    @sascha I've added a plot image of my function. Looks pretty random to the eye. My algorithm works similar to xxhash just with less twists to gain performance. xxhash internally works with overflows, too. https://github.com/Cyan4973/xxHash/blob/master/xxhash.c For example from line 370 to 375. It isn't a flaw of the algorithm. I cannot use generators like the LCG because they are designed to compute one random number after the other in a sequence. In the GPU I have to compute them in parallel. – bakkaa May 17 '16 at 03:08
  • I appreciate this, it's pretty helpful for adding a bit of noise to an antialiasing hack I'm doing. looked around a bit for non-cryptographic hash functions that are fast on gpu, and this is pretty good for that even if it does fail tests. – lahwran Aug 31 '16 at 04:25
  • @lahwran Thank you. I appreciate, that it is helpful. – bakkaa Sep 02 '16 at 06:29
3

My approach

In general i think you want some hash-function (mostly all of these are designed to output randomness; avalanche-effect for RNGs, explicitly needed randomness for CryptoPRNGs). Compare with this thread.

The following code uses this approach:

  • 1) build something hashable from your input
  • 2) hash -> random-bytes (non-cryptographically)
  • 3) somehow convert these random-bytes to your integer range (hard to do correctly/uniformly!)

The last step is done by this approach, which seems to be not that fast, but has strong theoretical guarantees (selected answer was used).

The hash-function i used supports seeds, which will be used in step 3!

import xxhash
import math
import numpy as np
import matplotlib.pyplot as plt
import time

def rng(a, b, maxExclN=100):
    # preprocessing
    bytes_needed = int(math.ceil(maxExclN / 256.0))
    smallest_power_larger = 2
    while smallest_power_larger < maxExclN:
        smallest_power_larger *= 2

    counter = 0
    while True:
        random_hash = xxhash.xxh32(str((a, b)).encode('utf-8'), seed=counter).digest()
        random_integer = int.from_bytes(random_hash[:bytes_needed], byteorder='little')
        if random_integer < 0:
            counter += 1
            continue # inefficient but safe; could be improved
        random_integer = random_integer % smallest_power_larger
        if random_integer < maxExclN:
            return random_integer
        else:
            counter += 1

test_a = rng(3, 6)
test_b = rng(3, 9)
test_c = rng(3, 6)
print(test_a, test_b, test_c) # OUTPUT: 90 22 90

random_as = np.random.randint(100, size=1000000)
random_bs = np.random.randint(100, size=1000000)

start = time.time()
rands = [rng(*x) for x in zip(random_as, random_bs)]
end = time.time()

plt.hist(rands, bins=100)
plt.show()
print('needed secs: ', end-start)
# OUTPUT: needed secs:  15.056888341903687 -> 0,015056 per sample
# -> possibly heavy-dependence on range of output

Possible improvements

  • Add additional entropy from some source (urandom; could be put into str)
  • Make a class and initialize to memorize preprocessing (costly if done for each sampling)
  • Handle negative integers; maybe just use abs(x)

Assumptions:

  • the ouput-range is [0, N) -> just shift for others!
  • the output-range is smaller (bits) than the hash-output (may use xxh64)

Evaluation:

Check randomness/uniformity

1D-Histogram of output -> looks good 2D-Representation -> looks good

Check if deterministic regarding input

2D-Representation with equal input-vectors

Community
  • 1
  • 1
sascha
  • 32,238
  • 6
  • 68
  • 110
  • Thanks for your answer. Do you think I could use genetic algorithms to find a suitable hash function? I want to have a really fast hash function that is faster than lookup tables in the gpu,so I can only use simple bitwise operations. The random numbers don't need to be cryptographically secure they just need to look random to the eye. So I thought I can measure the distribution properties and period length of different algoritms encoded in DNA and crossbreed and/or mutate good algorithms to find a good solution after several generations. – bakkaa May 11 '16 at 02:15
  • 1
    Definitly not! xxhash, which i used, is a non-crypto hash-function which is as fast as your memory. You won't beat that without a lot of research. Generating code with GAs is also very hard, even for structured-output, much harder for wanted chaotic output. Just stick to a fast hash-function. The are 2 remaining questions: is there a faster one for GPUs (looks like you target these)? And much more important: how to implement step 3 on the GPU. My approach is not fast and will even be slower on the GPU. You have to decide how much quality you need. Simpler schemes like modulo could be enough. – sascha May 11 '16 at 09:39
  • I've implemented a simplified version of xxhash on the gpu. with just one integer input instead of a byte array, much of the complexity disappears. I simply converted my integer inputs x and y to one with this codeThe randomness is really good, but it is slightly slower than the use of lookup tables. So I'll try to find a simpler hash function that uses less multiplications. If I find one I will post it here. Thanks for your helpful answers. – bakkaa May 14 '16 at 00:38
  • edit of the previous comment: So I simply did this to my inputs '(x << 16) | y', because it was the fastest way I could come up with. The downside is, that x and y are limited to 16 bit values because if they go higher many collisions would occur. In 3 or 4 dimensions it would limit my input range too much I think. – bakkaa May 14 '16 at 00:48
  • I found a hash function. what do you think about it? The results are good for me. – bakkaa May 16 '16 at 12:05
1

You can use various randomness extractors to achieve your goals. There are at least two sources you can look for a solution.

All in all, you can preferably use:

  1. AES-CBC-MAC using a random key (may be fixed and reused)
  2. HMAC, preferably with SHA2-512
  3. SHA-family hash functions (SHA1, SHA256 etc); using a random final block (eg use a big random salt at the end)

Thus, you can concatenate your coordinates, get their bytes, add a random key (for AES and HMAC) or a salt for SHA and your output has an adequate entropy. According to NIST, the output entropy relies on the input entropy:

Assuming you use SHA1; thus n = 160bits. Let's suppose that m = input_entropy (your coordinates' entropy)

  • if m >= 2n then output_entropy=n=160 bits
  • if 2n < m <= n then maximum output_entropy=m (but full entropy is not guaranteed).
  • if m < n then maximum output_entropy=m (this is your case)

see NIST sp800-90c (page 11)

Kostas Kryptos
  • 4,081
  • 2
  • 23
  • 24
  • "According to NIST, the output entropy relies on the input entropy" - This is a powerful remark! This would mean, that my approach (and in general all approaches without an external entropy source) could struggle if inputs have low-entropy (and maybe a high-output range; at least is is more observable then regarding gaps). Depending on the setting, adding additional entropy is worth to be considered. – sascha May 10 '16 at 15:11