117

I've seen this pseudo-random number generator for use in shaders referred to here and there around the web:

float rand(vec2 co){
  return fract(sin(dot(co.xy ,vec2(12.9898,78.233))) * 43758.5453);
}

It's variously called "canonical", or "a one-liner I found on the web somewhere".

What's the origin of this function? Are the constant values as arbitrary as they seem or is there some art to their selection? Is there any discussion of the merits of this function?

EDIT: The oldest reference to this function that I've come across is this archive from Feb '08, the original page now being gone from the web. But there's no more discussion of it there than anywhere else.

Grumdrig
  • 16,588
  • 14
  • 58
  • 69
  • It's a noise function, used to create procedurally generated terrain. similar to something like this https://en.wikipedia.org/wiki/Perlin_noise – Shai UI Feb 16 '19 at 22:57
  • 1
    Well, the above function is not similar to Perlin noise. Plus, Perlin noise is *based* on RNG's, since the gradients at the integer positions have to be generated randomly. – Gab Jun 10 '21 at 15:19

5 Answers5

49

Very interesting question!

I am trying to figure this out while typing the answer :) First an easy way to play with it: http://www.wolframalpha.com/input/?i=plot%28+mod%28+sin%28x*12.9898+%2B+y*78.233%29+*+43758.5453%2C1%29x%3D0..2%2C+y%3D0..2%29

Then let's think about what we are trying to do here: For two input coordinates x,y we return a "random number". Now this is not a random number though. It's the same every time we input the same x,y. It's a hash function!

The first thing the function does is to go from 2d to 1d. That is not interesting in itself, but the numbers are chosen so they do not repeat typically. Also we have a floating point addition there. There will be a few more bits from y or x, but the numbers might just be chosen right so it does a mix.

Then we sample a black box sin() function. This will depend a lot on the implementation!

Lastly it amplifies the error in the sin() implementation by multiplying and taking the fraction.

I don't think this is a good hash function in the general case. The sin() is a black box, on the GPU, numerically. It should be possible to construct a much better one by taking almost any hash function and converting it. The hard part is to turn the typical integer operation used in cpu hashing into float (half or 32bit) or fixed point operations, but it should be possible.

Again, the real problem with this as a hash function is that sin() is a black box.

Grumdrig
  • 16,588
  • 14
  • 58
  • 69
starmole
  • 4,974
  • 1
  • 28
  • 48
  • 1
    This doesn't answer the question about the origin, but I don't think it's really answerable. I'll accept this answer because of the illustrative graph. – Grumdrig Oct 30 '14 at 17:23
  • by black box, do you mean we don't know it yields consistent results? like a different gpu or maybe a driver could change the outcome? – toraman Dec 28 '21 at 10:57
  • 1
    @toraman a 'black box' is a process whose internal operations are inscrutable. While every other operation in the equation can be worked out independently, finding the sine value depends on the GPUs implementation, so the results are unpredictable (but will be consistent for any one implementation). – Barney May 09 '22 at 10:46
29

The origin is probably the paper: "On generating random numbers, with help of y= [(a+x)sin(bx)] mod 1", W.J.J. Rey, 22nd European Meeting of Statisticians and the 7th Vilnius Conference on Probability Theory and Mathematical Statistics, August 1998

EDIT: Since I can't find a copy of this paper and the "TestU01" reference may not be clear, here's the scheme as described in TestU01 in pseudo-C:

#define A1 ???
#define A2 ???
#define B1 pi*(sqrt(5.0)-1)/2
#define B2 ???

uint32_t n;   // position in the stream

double next() {
  double t = fract(A1     * sin(B1*n));
  double u = fract((A2+t) * sin(B2*t));
  n++;
  return u;
} 

where the only recommended constant value is the B1.

Notice that this is for a stream. Converting to a 1D hash 'n' becomes the integer grid. So my guess is that someone saw this and converted 't' into a simple function f(x,y). Using the original constants above that would yield:

float hash(vec2 co){
  float t = 12.9898*co.x + 78.233*co.y; 
  return fract((A2+t) * sin(t));  // any B2 is folded into 't' computation
}
MB Reynolds
  • 619
  • 5
  • 9
  • 4
    Very interesting indeed! I found [a paper that references it](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.141.3911&rep=rep1&type=pdf) as well as [the journal itself on Google Books](https://books.google.com/books?id=QTQk8tXrHKUC&pg=PA323&dq=rey+22nd+European+Meeting+of+Statisticians+and+the+7th+Vilnius+Conference+on+Probability+Theory+and+Mathematical+Statistics,+August+1998) but it appears that the talk or paper itself was not included in the journal. – Grumdrig Dec 12 '15 at 23:33
  • 1
    Also, it would appear from the title that the function I'm asking about should be returning `fract(sin(dot(co.xy ,vec2(12.9898,78.233))) * (co.xy + vec2(43758.5453, SOMENUMBER))` to conform to the function the paper is about. – Grumdrig Dec 12 '15 at 23:36
  • And one more thing, if this is indeed the origin of the use of the function, the question of the origin of the magic numbers (choice of `a` and `b`) used over and over remains, but may have been used in the paper you cite. – Grumdrig Dec 12 '15 at 23:39
  • I can't find the paper anymore either. (edit: same paper as linked above) – MB Reynolds Dec 16 '15 at 13:37
  • Update the answer with more info. – MB Reynolds Dec 18 '15 at 13:57
9

the constant values are arbitrary, especially that they are very large, and a couple of decimals away from prime numbers.

a modulus over 1 of a hi amplitude sinus multiplied by 4000 is a periodic function. it's like a window blind or a corrugated metal made very small because it's multiplied by 4000, and turned at an angle by the dot product.

as the function is 2-D, the dot product has the effect of turning the periodic function at an oblique relative to X and Y axis. By 13/79 ratio approximately. It is inefficient, you can actually achieve the same by doing sinus of (13x + 79y) this will also achieve the same thing I think with less maths..

If you find the period of the function in both X and Y, you can sample it so that it will look like a simple sine wave again.

Here is a picture of it zoomed in graph

I don't know the origin but it is similar to many others, if you used it in graphics at regular intervals it would tend to produce moire patterns and you could see it's eventually goes around again.

bandybabboon
  • 2,210
  • 1
  • 23
  • 33
  • But on GPUs X and Y range from 0..1 and that looks much more random if you change your graph. I know this sounds like a statement but it's actually a question, because my maths education ended at 18 years old. – Strings Sep 21 '13 at 11:15
  • i know, i just zoomed in so that you can see that the random function is of that form except that the ridges are very fast changing, except you have to zoom small to see the changes at all... you can imagine that taking points on the ridges will give pretty random numbers from 0 to 1 height for 1 to 1 x and y values. – bandybabboon Sep 21 '13 at 18:02
  • Oh, I understand, and that seems very logical for any random number generation which at its core uses a sin function – Strings Sep 21 '13 at 19:11
  • 2
    it's a linear zigzag essentially, and the sin is supposed to add a tiny bit of variation, it's the same as if someone was flicking a pack of cards from one to 10 very fast round and round in front of you and you are supposed to try end pick up a pattern of numbers from the cards, they would be random numbers because it would be flicking very fast that he could only get a pattern by choosing cards in an exact synchronisation relative to how fast the cards were spinning round. – bandybabboon Sep 22 '13 at 03:45
  • Just a note, it wouldn't be faster to do `(13x + 79y)` since `dot(XY, AB)` will do exactly what you describe, as its the dot product, which `x,y dot 13, 79 = (13x + 79y)` – Krupip Feb 06 '18 at 22:26
3

I do not believe this to be the true origin, but OP's code is presented as code example in "The Book of Shaders" by Patricio Gonzalez Vivo and Jen Lowe ( https://thebookofshaders.com/10/ ). In their code, Patricio Gonzales Vivo is cited as the author, i.e "// Author @patriciogv - 2015"

Since the OP's research dates back even further (to '08), the source might at least explain its popularity, and the author might be able to shed some light on his source.

Ineluki
  • 31
  • 2
2

Maybe it's some non-recurrent chaotic mapping, then it could explain many things, but also can be just some arbitrary manipulation with large numbers.

EDIT: Basically, the function fract(sin(x) * 43758.5453) is a simple hash-like function, the sin(x) provides smooth sin interpolation between -1 to 1, so sin(x) * 43758.5453 will be interpolation from -43758.5453 to 43758.5453. This is a quite huge range, so even small step in x will provide large step in result and really large variation in fractional part. The "fract" is needed to get values in range -0.99... to 0.999... . Now, when we have something like hash function we should create function for production hash from the vector. The simplest way is call "hash" separetly for x any y component of the input vector. But then, we will have some symmetrical values. So, we should get some value from the vector, the approach is find some random vector and find "dot" product to that vector, here we go: fract(sin(dot(co.xy ,vec2(12.9898,78.233))) * 43758.5453); Also, according to the selected vector, its lenght should be long engough to have several peroids of the "sin" function after "dot" product will be computed.

Roman
  • 350
  • 2
  • 8
  • but then 4e5 should work as well, I don't understand why the magic number 43758.5453. (also, I would offset x by some fractional number to avoid rand(0)=0. – Fabrice NEYRET Apr 29 '16 at 08:38
  • 1
    I think with 4e5 you will not get so much variation of the fractional bits, it will always give you the same value. So, two conditions have to be met, large enough and have enough good variation of the fractional parts. – Roman May 02 '16 at 13:14
  • what do you mean, "will always give you the same value" ? (if you mean it will always take the same digits, first, they are still chaotic, second, float are stored as m*2^p, not 10^p, so *4e5 still scrambles bits). – Fabrice NEYRET May 07 '16 at 09:32
  • I thought you wrote an exponential representation of the number, 4 * 10^5, so sin(x)*4e5 will give you not so chaotic number. I agree that fractional bits from sin wave will give you good chatoic as well. – Roman May 12 '16 at 09:05
  • But, then it depends on the range of x, I mean if function should be robust for small (-0.001, 0.001) and large values (-1, 1). You can try to see difference with fract(sin(x /1000.0) * 43758.5453); and fract(sin(x /1000.0) * 4e5);, where x in range [-1., 1.]. In the second variant image will be more monotonic (at least I see the difference in the shader). But, in general, I agree that you still can use 4e5 and have good enough result. – Roman May 12 '16 at 09:06