40

I'm looking to get a random number in OpenCL. It doesn't have to be real random or even that random. Just something simple and quick.

I see there are a ton of fancy, truly random, parallelized algorithms for OpenCL that run to thousands and thousands of lines. I do NOT need anything like that. A simple 'random()' would be fine, even if it is easy to see patterns in it.

I see there is a Noise function? Any easy way to use that to get a random number?

CodesInChaos
  • 106,488
  • 23
  • 218
  • 262
  • 3
    You could pass an array with some random numbers to your kernel, wouldn't that be the most simple? – eudoxos Mar 28 '12 at 20:01
  • That's true, but then you are going back and forth from CPU->GPU. The ENTIRE point of using OpenCL is to stay on the GPU. – user697111 Mar 28 '12 at 23:53
  • That depends on the application; for some types of tasks (even within the same problem) the CPU might still be good. It depends on how much time you have; generating rands on the CPU is fast for development. – eudoxos Mar 31 '12 at 09:35
  • note to others: many answers here have evolved well beyond the simple non-parallel fancy-pants random algorithms that the OP would've liked to avoid - however, libraries now exist to generate random numbers, so it is worth considering including them, or at least seeing how they approached the common problem of needing random numbers in OpenCL. See my post below for details. – Jason Newton May 08 '15 at 03:35

9 Answers

17

I was solving this "no random" issue for the last few days and I came up with three different approaches:

  1. Xorshift - I created a generator based on this one. All you have to do is provide one uint2 number (seed) for the whole kernel and every work item will compute its own random number

    // 'randoms' is uint2 passed to kernel
    uint seed = randoms.x + globalID;
    uint t = seed ^ (seed << 11);  
    uint result = randoms.y ^ (randoms.y >> 19) ^ (t ^ (t >> 8));
    
  2. Java random - I used the code from the .next(int bits) method to generate a random number. This time you have to provide one ulong number as the seed.

    // 'randoms' is ulong passed to kernel
    ulong seed = randoms + globalID;
    seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1);
    uint result = seed >> 16;
    
  3. Just generate all on CPU and pass it to kernel in one big buffer.
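
To sanity-check the first two steps off-device, here is a plain C99 sketch (my own, hypothetical: `global_id` stands in for `get_global_id(0)`, and the helper names are made up) that mirrors approaches 1 and 2 above:

```c
#include <stdint.h>

/* Approach 1: one xorshift-style step, seeded per work-item.
 * seed_x/seed_y correspond to randoms.x/randoms.y in the kernel. */
uint32_t xorshift_step(uint32_t seed_x, uint32_t seed_y, uint32_t global_id)
{
    uint32_t seed = seed_x + global_id;
    uint32_t t = seed ^ (seed << 11);
    return seed_y ^ (seed_y >> 19) ^ (t ^ (t >> 8));
}

/* Approach 2: one java.util.Random-style LCG step on 48-bit state. */
uint32_t java_lcg_step(uint64_t seed, uint32_t global_id)
{
    uint64_t s = seed + global_id;
    s = (s * 0x5DEECE66DULL + 0xBULL) & ((1ULL << 48) - 1);
    return (uint32_t)(s >> 16);   /* return the high bits, as Java does */
}
```

Neighbouring work items get different outputs because the global id perturbs the seed before the mixing step.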

I tested all three approaches (generators) in my evolution algorithm computing Minimum Dominating Set in graphs.

I like the generated numbers from the first one, but it looks like my evolution algorithm doesn't.

The second generator generates numbers that have a visible pattern, but my evolution algorithm likes it that way anyway, and the whole thing runs a little faster than with the first generator.

But the third approach shows that it's absolutely fine to just provide all the numbers from the host (CPU). At first I thought that generating (in my case) 1536 int32 numbers and passing them to the GPU in every kernel call would be too expensive (to compute and transfer to the GPU). But it turns out it is as fast as my previous attempts, and the CPU load stays under 5%.

BTW, I also tried MWC64X Random, but after I installed a new GPU driver the function mul_hi started causing build failures (even the whole AMD Kernel Analyzer crashed).

icl7126
  • 5,740
  • 4
  • 53
  • 51
  • Should there be any problem, because this is used from multiple threads? Isn't there a race condition when updating the seed? https://stackoverflow.com/questions/71155563/opencl-pseudo-random-generator-race-condition – Dávid Tóth Feb 17 '22 at 12:07
  • It's been 9 years since I worked with OpenCL :), but if I remember correctly, each worker has a private memory which holds all primitive values like these - so no race condition can happen. Then there is memory of whole cluster where you need to use offset to read or write and then there is global memory (the big one) where you again need to use offset. See the [OpenCL Memory Model](https://www.xilinx.com/html_docs/xilinx2017_4/sdaccel_doc/bop1504034296858.html) for more info. – icl7126 Feb 17 '22 at 12:29
  • ah, so in this implementation the random state(s) are in the local workspace then? I'm asking because in the question I linked I use global-workspace random logic. – Dávid Tóth Feb 17 '22 at 13:50
  • @DavidTóth looking at your question, I really can't tell. I'm no expert in this area anymore, I've used this 9 years ago (!!!). Anyway, in general, locally created variables (like numbers) are private for each worker. But if you want to write the result to the global memory, each worker has to use a different location in that memory - that's why the offset with worker id is used. If the memory is shared between workers, there is always a race condition if any of the workers can modify it. – icl7126 Feb 17 '22 at 17:47
14

The following is the algorithm used by the java.util.Random class, according to the docs:

(seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1)

See the documentation for its various implementations. Passing the worker's id in for the seed and looping a few times should produce decent randomness.
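
A caveat worth knowing (this check is mine, not from the docs): the low bits of this LCG are very regular. Because the multiplier and increment are both odd, the parity of the raw 48-bit state flips on every step, which is why java.util.Random returns the high bits (seed >> 16) rather than the raw state. A host-side C sketch demonstrating it:

```c
#include <stdint.h>

/* One step of the java.util.Random LCG on 48-bit state. */
static uint64_t lcg_next(uint64_t s)
{
    return (s * 0x5DEECE66DULL + 0xBULL) & ((1ULL << 48) - 1);
}

/* Returns 1 if the parity of the state flips on every one of 'steps'
 * transitions (it always does: odd multiplier + odd increment). */
int low_bits_alternate(uint64_t s, int steps)
{
    for (int i = 0; i < steps; i++) {
        uint64_t next = lcg_next(s);
        if (((s ^ next) & 1) != 1)   /* bit 0 must differ each step */
            return 0;
        s = next;
    }
    return 1;
}
```

So if you take the raw state modulo a small even number, adjacent outputs are strongly correlated; always use the shifted-down result.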

Another method would be to have some random operations occur that are fairly certain to overflow:

 long rand= yid*xid*as_float(xid-yid*xid);
 rand*=rand<<32^rand<<16|rand;
 rand*=rand+as_double(rand);

with xid=get_global_id(0); and yid= get_global_id(1);

vandale
  • 3,600
  • 3
  • 22
  • 39
  • 1
    The above (java random) has a dramatically high correlation between seed[n] and seed[n-1]. I found out when building a dice simulator. Take the output above, and do modulo 6. You will never get two 5s next to each other! – crusaderky May 20 '14 at 20:12
12

I am currently implementing a Realtime Path Tracer. You might already know that Path Tracing requires many many random numbers.
Before generating random numbers on the GPU I simply generated them on the CPU (using rand(), which sucks) and passed them to the GPU.
That quickly became a bottleneck.
Now I am generating the random numbers on the GPU with the Park-Miller Pseudorandom Number Generator (PRNG).
It is extremely simple to implement and achieves very good results.
I took thousands of samples (in the range of 0.0 to 1.0) and averaged them together.
The resulting value was very close to 0.5 (which is what you would expect). Between different runs the divergence from 0.5 was around 0.002. Therefore it has a very uniform distribution.

Here's a paper describing the algorithm:
http://www.cems.uwe.ac.uk/~irjohnso/coursenotes/ufeen8-15-m/p1192-parkmiller.pdf
And here's a paper about the above algorithm optimized for CUDA (which can easily be ported to OpenCL): http://www0.cs.ucl.ac.uk/staff/ucacbbl/ftp/papers/langdon_2009_CIGPU.pdf

Here's an example of how I'm using it:

int rand(int* seed) // 1 <= *seed < m
{
    int const a = 16807;      // i.e. 7**5
    int const m = 2147483647; // i.e. 2**31-1

    // Multiply in 64 bits to avoid int overflow, and use a C-style
    // cast (OpenCL C has no C++-style long(...) syntax).
    *seed = (int)(((long)(*seed) * a) % m);
    return *seed;
}

kernel void random_number_kernel(global int* seed_memory)
{
    int global_id = get_global_id(1) * get_global_size(0) + get_global_id(0); // Get the global id in 1D.

    // Since the Park-Miller PRNG generates a SEQUENCE of random numbers
    // we have to keep track of the previous random number, because the next
    // random number will be generated using the previous one.
    int seed = seed_memory[global_id];

    int random_number = rand(&seed); // Generate the next random number in the sequence.

    seed_memory[global_id] = seed; // Save the seed for the next time this kernel gets enqueued.
}

The code serves just as an example. I have not tested it.
The array "seed_memory" is filled with rand() only once, before the first execution of the kernel. After that, all random number generation happens on the GPU. I think it's also possible to simply use the work-item's global id instead of initializing the array with rand().
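
If you want to convince yourself of the uniformity claim without a GPU, here is a hypothetical host-side C sketch of the same Park-Miller step (function names are mine; the 64-bit intermediate avoids int overflow). Averaging many normalized samples should land very close to 0.5:

```c
#include <stdint.h>

/* One Park-Miller ("minimal standard") step: seed' = seed * 7^5 mod (2^31 - 1). */
static int32_t park_miller(int32_t seed)
{
    return (int32_t)(((int64_t)seed * 16807) % 2147483647);
}

/* Average n successive samples, each normalized into (0, 1). */
double park_miller_mean(int32_t seed, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        seed = park_miller(seed);
        sum += (double)seed / 2147483647.0;
    }
    return sum / n;
}
```

With 100,000 samples the mean sits within a few thousandths of 0.5, matching the divergence the answer reports.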

Tara
  • 1,673
  • 22
  • 30
  • 1
    Thanks for the link, I'd like to point out, that the paper actually proposes a different implementation for GPUs. The one you posted is just the reference from Park&Miller. The paper comes with the source for GPU and sources to an example and test case are linked in the abstract. – Armin Oct 10 '14 at 18:20
  • 1
    Yes, the implementation is a little different. But the results are the same. – Tara Oct 13 '14 at 16:19
4

It seems OpenCL does not provide such functionality. However, some people have done research on this and provide BSD-licensed code for producing good random numbers on the GPU.

math
  • 8,514
  • 10
  • 53
  • 61
4

This is my version of OpenCL float pseudorandom noise, using a trigonometric function:

//noise values in range of 0.0 to 1.0
static float noise3D(float x, float y, float z) {
    float ptr = 0.0f;
    return fract(sin(x*112.9898f + y*179.233f + z*237.212f) * 43758.5453f, &ptr);
}

__kernel void fillRandom(float seed, __global float* buffer, int length) {
    int gi = get_global_id(0);
    float fgi = (float)gi / length; // OpenCL C needs a C-style cast, not float(gi)
    buffer[gi] = noise3D(fgi, 0.0f, seed);
}

You can generate 1D or 2D noise by passing noise3D normalized index coordinates as the first parameters, and a random seed (generated on the CPU, for example) as the last parameter.

Here are some noise pictures generated with this kernel and different seeds:

noise1 noise2

wilddev
  • 1,904
  • 2
  • 27
  • 45
1

GPUs don't have good sources of randomness, but this can be easily overcome by seeding a kernel with a random seed from the host. After that, you just need an algorithm that can work with a massive number of concurrent threads.

This link describes a Mersenne Twister implementation using OpenCL: Parallel Mersenne Twister. You can also find an implementation in the NVIDIA SDK.

Lubo Antonov
  • 2,301
  • 14
  • 18
1

I had the same problem. This paper describes a solution: www.thesalmons.org/john/random123/papers/random123sc11.pdf

You can find the documentation here: http://www.thesalmons.org/john/random123/releases/latest/docs/index.html

You can download the library here: http://www.deshawresearch.com/resources_random123.html

thejinx0r
  • 444
  • 7
  • 10
0

Why not? You could just write a kernel that generates random numbers, though that would need more kernel calls, and then pass the random numbers as an argument to your other kernel which needs them.

akaltar
  • 1,002
  • 1
  • 19
  • 25
-2

You can't generate random numbers in a kernel; the best option is to generate the random numbers on the host (CPU) and then transfer them to the GPU through buffers and use them in the kernel.

Megharaj
  • 1,589
  • 2
  • 20
  • 32