845

I can't get my head around this: which is more random?

rand()

OR:

rand() * rand()

I'm finding it a real brain teaser; could you help me out?


EDIT:

Intuitively I know that the mathematical answer will be that they are equally random, but I can't help but think that if you "run the random number algorithm" twice when you multiply the two together you'll create something more random than just doing it once.

Arsen Khachaturyan
  • 7,904
  • 4
  • 42
  • 42
Trufa
  • 39,971
  • 43
  • 126
  • 190
  • 164
    What do you mean by "more random"? – dan04 Oct 18 '10 at 03:43
  • 4
    @dan04- I'm pretty sure he means a lower chance of collision, which is my definition of 'random'. +1 to this question by the way. I'm guessing the second one is more random, but I really don't know... – Dominic K Oct 18 '10 at 03:44
  • @dan04 DMan couldn't have said it better! – Trufa Oct 18 '10 at 03:46
  • 6
    I tried thinking about this and had a Stack Overflow of my own... – Michael Goldshteyn Oct 18 '10 at 03:48
  • @Trufa I know what you mean, but unfortunately the cold, hard math just doesn't work that way. :( – Matthew Scharley Oct 18 '10 at 03:55
  • @Michael Goldshteyn Wait till you read Matthew Scharley's answer :) – Trufa Oct 18 '10 at 03:55
  • 56
    As others have stated, these two quantities do not have the same distribution. See http://mathworld.wolfram.com/UniformProductDistribution.html for the distribution you're actually getting. Compare this to a single uniform random number, where all values in the interval are equally likely, so the probability density function is a horizontal straight line. – bnaul Oct 18 '10 at 03:57
  • 45
    I strongly recommend reading [Random Stupidity](http://thedailywtf.com/Articles/Random-Stupidity.aspx) on [the Daily WTF](http://thedailywtf.com/). Especially read [this comment](http://thedailywtf.com/Comments/Random-Stupidity.aspx?pg=2#182537), where they analyse the output of this new random number. The message to take away from that is: *arbitrary operations on random numbers do not necessarily result in random output*. – detly Oct 18 '10 at 03:57
  • 51
    Also: *Intuitively I know that the mathematical answer will be that they are equally random* - if you could do maths by intuition alone, we wouldn't need all of those bloody symbols :P – detly Oct 18 '10 at 03:58
  • 1
    you mean ideal rand() resulting in real numbers or real rand() resulting in truncated representations? – mykhal Oct 18 '10 at 04:00
  • @mykhal sorry you lost me a little bit there... – Trufa Oct 18 '10 at 04:28
  • 94
    Don't take Statistics and Intuition to the same party .... – Dr. belisarius Oct 18 '10 at 04:44
  • 2
    They really don't get along, do they? – Trufa Oct 18 '10 at 04:47
  • @Trufa Nope. As soon as you start talking about Distributions you should use your intuition with utmost care – Dr. belisarius Oct 18 '10 at 04:53
  • 9
    To simulate fair dice roll, you should use the standard random number 4 from RFC 1149.5. – AndiDog Oct 18 '10 at 06:01
  • 1
    @Trufa From Wikipedia (http://en.wikipedia.org/wiki/Randomness): The central idea is that a string of bits is random if and only if it is shorter than any computer program that can produce that string (Kolmogorov randomness)—this means that random strings are those that cannot be compressed. – Dr. belisarius Oct 18 '10 at 06:26
  • 1
    @AndiDog +1 for epicness – Justus Romijn Oct 18 '10 at 13:01
  • @Trufa: Note that most major languages have a built in (usually weak) random number generator and various 3rd party, better random number generators. If your random number generated is not "random" enough, either don't use a real random generator (usually not ideal), or use a better random number generator. – Brian Oct 18 '10 at 14:30
  • @Belisarius: That is one possible definition, but I would rather use that to describe entropy. – Brian Oct 18 '10 at 14:34
  • 1
    @belisarius By the way, a very good answer; honestly, after a while the answers gave me a very good overall picture! – Trufa Oct 18 '10 at 17:46
  • @Trufa Created a chat room for your question http://chat.stackoverflow.com/rooms/108/understanding-randomness – Dr. belisarius Oct 18 '10 at 17:53
  • It's akin to infinite infinities; nature denies the operation. The program will run, but it will be nonsensical – Mild Fuzz Oct 18 '10 at 19:28
  • 1
    Trufa, by your logic wouldn't: rand()^rand() be even more random? – marcamillion Oct 18 '10 at 20:24
  • 5
    Here is a naive definition for "More random": for some, random means "hard to guess", as in, guess the value of the card at the top of the deck. By shuffling the deck, it would seem that the random value is "Even harder to guess", and from this practical, intuitive understanding of randomness, it would make sense to 'shuffle' the deck in the program, by some means. Of course, that's not what 'randomness' means, and the science of introducing entropy to a pseudorandom process is not nearly as easy as shuffling the process with its own output. – SingleNegationElimination Oct 18 '10 at 20:31
  • Thanks @Yi Jiang and @Sam Saffron for the edits, I'm a typo machine :) – Trufa Oct 18 '10 at 22:44
  • @Mild Fuzz: Nature denies infinite infinities? Aren't those infinitesimals? Don't fractals exist in nature? Or am I completely misreading your statement due to my own stupidity? – Grant Thomas Oct 19 '10 at 12:11
  • @Trufa Two more favs for your Stellar Badge! – Dr. belisarius Oct 19 '10 at 12:54
  • 1
    @belisarius I hate to admit this, but I'm refreshing my screen every ten seconds waiting for them :) – Trufa Oct 19 '10 at 13:08
  • @Mr. Disappointment maybe I am being dumb, but the nature of infinity is that it is boundless, so squaring it would achieve nothing. – Mild Fuzz Oct 19 '10 at 14:00
  • 1
    @Trufa It is amazing how such a simple question has generated so much interest! – Liam Oct 19 '10 at 15:02
  • 1
    @Liam Absolutely, I knew there wasn't a simple answer to this simple question, but never expected this. I think it is fairly obvious I hadn't seen the full complexity of this question. I think the response speaks very well of this community and its hunger for knowledge! Happy to be a humble part of it! :) – Trufa Oct 19 '10 at 15:19
  • - (int) GetRandomNumber { return 4; /* TODO: Test this */ } – ingh.am Oct 20 '10 at 08:55

28 Answers

1492

Just a clarification

Although the previous answers are right whenever you try to spot the randomness of a pseudo-random variable or its multiplication, you should be aware that while Random() is usually uniformly distributed, Random() * Random() is not.

Example

This is a uniform random distribution sample simulated through a pseudo-random variable:

Histogram of Random()

        BarChart[BinCounts[RandomReal[{0, 1}, 50000], 0.01]]

While this is the distribution you get after multiplying two random variables:

Histogram of Random() * Random()

        BarChart[BinCounts[Table[RandomReal[{0, 1}] *
                                 RandomReal[{0, 1}], {50000}], 0.01]]

So, both are “random”, but their distributions are very different.

Another example

While 2 * Random() is uniformly distributed:

Histogram of 2 * Random()

        BarChart[BinCounts[2 * RandomReal[{0, 1}, 50000], 0.01]]

Random() + Random() is not!

Histogram of Random() + Random()

        BarChart[BinCounts[Table[RandomReal[{0, 1}] +
                                 RandomReal[{0, 1}], {50000}], 0.01]]

The Central Limit Theorem

The Central Limit Theorem states that the sum of independent, identically distributed random variables (here, repeated calls to Random()) tends to a normal distribution as the number of terms increases.

With just four terms you get:

Histogram of Random() + Random() + Random() + Random()

BarChart[BinCounts[Table[RandomReal[{0, 1}] + RandomReal[{0, 1}] +
                         RandomReal[{0, 1}] + RandomReal[{0, 1}],
                         {50000}], 0.01]]

And here you can see the road from a uniform to a normal distribution by adding up 1, 2, 4, 6, 10 and 20 uniformly distributed random variables:

Histogram of different numbers of random variables added

Edit

A few credits

Thanks to Thomas Ahle for pointing out in the comments that the probability distributions shown in the last two images are known as the Irwin-Hall distribution.

Thanks to Heike for her wonderful torn[] function

Dr. belisarius
  • 60,527
  • 15
  • 115
  • 190
  • 42
    +1. Since the OP probably wanted uniform distribution, this should be the accepted answer. And if you did `rand()+rand()`, you would end up with a "2d6"-type distribution with a fat center. – Thilo Oct 18 '10 at 04:09
  • @Thilo I was just on that when you posted your comment :) – Dr. belisarius Oct 18 '10 at 04:11
  • @belisarius, I´m starting to grasp it now. imgur.com is coming and going... OHh and thank you very much! – Trufa Oct 18 '10 at 04:31
  • 8
    This is very interesting, but it kills me on the inside how anti-intuitive that is. I will give a more thorough look after I read a little more about distribution. Thank you very much! – Trufa Oct 18 '10 at 04:57
  • 47
    @Trufa: Maybe this will help with part of the intuition, at least for sums. Imagine taking the "average" of one rolled die. Now imagine taking the average of two dice. Now one hundred. What happens to the chance of getting a one or a six for the average as you add more dice? – johncip Oct 18 '10 at 07:26
  • What would the graph look like in C, with the return type being an integer and multiplication potentially causing an overflow? Also, how would the infamous C stdlib rand implementation affect the outcome when using 2 consecutive results? – geon Oct 18 '10 at 15:21
  • 2
    In some implementations, (rand()+rand() % (RAND_MAX+1)), may yield better numbers than a single random() call; in others it will be worse. The distribution of (rand()*rand() % (RAND_MAX+1)) is more "interesting". If (RAND_MAX+1) is prime, the distribution will be uniform except for a peak at zero (every value can be achieved RAND_MAX ways except zero, which can be achieved 2*RAND_MAX+1 ways). If (RAND_MAX+1) is non-prime, its factors will get extra hits. – supercat Oct 18 '10 at 16:21
  • 1
    @belisarius, how did you make these charts? – matt b Oct 19 '10 at 01:06
  • 3
    @matt b The Charts are one-liners in Mathematica. The code is the text in bold that precedes each graph. Mathematica is an awesome language for doing Plots! – Dr. belisarius Oct 19 '10 at 01:17
  • 1
    @belisarius: 550+ votes in 24 hours, this has to be a new record on SO! – Matthieu M. Oct 19 '10 at 13:01
  • 1
    Histograms: They kick the a**es of people claiming there's no such thing as "more random" –  Oct 19 '10 at 15:33
  • 3
    @thenonhacker: The histograms actually say nothing about the randomness of the number. A non-uniform distribution is not necessarily non-random (discounting quantization errors). For any distribution f(x) with a parametrically defined shape, there exists an equalization function h(y) such that h(f(x)) = c. – Kennet Belenky Oct 19 '10 at 20:54
  • OMG, E = m * sqr(c)!!! Kidding aside, the histograms help measure the bias of the random numbers generated, and a Uniformly-Distributed Histogram is something that you want for a Lottery, not the histograms with peaks and biases. That's what the user was asking about here, and that's why he rewarded the answer above with the green checkmark. –  Oct 19 '10 at 21:00
  • 5
    @thenonhacker: yes, the histograms do demonstrate bias, but they don't demonstrate non-randomness. Biased random numbers are not less random. As for the correct answer to the user's original question is, "don't try to be clever, you'll just make things worse," and this answer does get that point across. – Kennet Belenky Oct 19 '10 at 21:17
  • 1
    In case anybody wonders, the distribution of the sums is the Irwin Hall distribution: http://en.wikipedia.org/wiki/Irwin%E2%80%93Hall_distribution For n `rand()`s, it is a composition of n polynomials. – Thomas Ahle Apr 14 '11 at 14:54
  • @belisarius I also just found out recently. I wonder if the distribution of the product of random variables has a name. Perhaps not, since it is simply `(-logx)^(n-1)/(n-1)!`, `n` being the number of `rand()`'s. – Thomas Ahle Apr 14 '11 at 23:34
  • 1
    @belisarius I agree with Konrad's edits. Having the text and code written separately is a big improvement in terms of accessibility and search engine optimization over having that textual information embedded in an image. His edit appears to be verbatim, so if you feel that those sections *must* be bold, please improve upon his edit instead of rolling back to an inferior revision. – Bill the Lizard Jun 09 '12 at 22:40
  • randomness is nothing; if we can't predict it, we call it random. But sooner or later we will – Hackaholic Nov 20 '14 at 10:42
  • I assume the x axis in those graphs has to do with amount of times you generated the result but what is the y axis? – Lealo Aug 15 '17 at 02:51
152

I guess both methods are equally random, although my gut feel would say that rand() * rand() is less random because it would produce more zeroes: as soon as either rand() is 0, the whole product becomes 0.

Salman A
  • 262,204
  • 82
  • 430
  • 521
Janco
  • 1,130
  • 1
  • 8
  • 15
  • 18
    My answer to all answers using this strip is this: I like humour, but it **must** be CW! – Andreas Rejbrand Oct 18 '10 at 20:54
  • 3
    @Andreas Rejbrand: That's like saying, I like humour, but I don't want to pay for it – Andomar Oct 19 '10 at 14:28
  • 4
    @Andomar: No, it isn't. Not at all. Do you know what CW is? – Andreas Rejbrand Oct 19 '10 at 14:50
  • 17
    @Andreas Rejbrand: CW is a weapon that kills interesting questions by denying reputation to those that answer it. Looks like it got nerfed http://meta.stackexchange.com/questions/392/should-the-community-wiki-police-be-shut-down (which is perhaps why this interesting question pops up!) – Andomar Oct 19 '10 at 15:04
  • 11
    @Andomar - Yes, CW kills interesting questions, but (from the [FAQ](http://stackoverflow.com/faq)) "Reputation is a rough measurement of how much the community trusts you." If you include a funny, [copyrighted](http://www.dilbert.com/terms/) image in your answer, it will make me think your answer is cool, and I will probably think _you_ are cool too, but it doesn't make you more trust-worthy - hence, ideally, no rep should be awarded. Whether that means CW, or whether it means one shouldn't vote the answer up is another issue. – Richard JP Le Guen Oct 19 '10 at 17:32
  • 13
    the "random generator" troll in the cartoon might be just a savant reciting π, and just reaching the [Feynman point](http://en.wikipedia.org/wiki/Feynman_point). btw, **are π digits random?** :) – mykhal Oct 20 '10 at 03:24
  • @mykhal Always thought the "999999" was that. But your main question remains open (AFAIK). – Dr. belisarius Oct 20 '10 at 11:17
  • 1
    @mykhai: A less controversial formulation would be "Are the digits of pi normal?" (it *looks* that way) – Piskvor left the building Oct 30 '10 at 09:30
  • @mykhal I think I read a while back that out of all known pi digits "1" appears twice as often as the other numbers or something like that haha. – Albert Renshaw Mar 28 '13 at 03:27
  • use random() Xor random() – EKanadily Feb 28 '14 at 00:32
82

Neither is 'more random'.

rand() generates a predictable set of numbers based on a pseudo-random seed (usually based on the current time, which is always changing). Multiplying two consecutive numbers in the sequence generates a different, but equally predictable, sequence of numbers.

Addressing whether this will reduce collisions, the answer is no. It will actually increase collisions due to the effect of multiplying two numbers where 0 < n < 1. The result will be a smaller fraction, causing a bias in the result towards the lower end of the spectrum.

Some further explanations. In the following, 'unpredictable' and 'random' refer to the ability of someone to guess what the next number will be based on previous numbers, i.e., an oracle.

Given seed x which generates the following list of values:

0.3, 0.6, 0.2, 0.4, 0.8, 0.1, 0.7, 0.3, ...

rand() will generate the above list, and rand() * rand() will generate:

0.18, 0.08, 0.08, 0.21, ...

Both methods will always produce the same list of numbers for the same seed, and hence are equally predictable by an oracle. But if you look at the results for multiplying the two calls, you'll see they are all under 0.3 despite a decent distribution in the original sequence. The numbers are biased because of the effect of multiplying two fractions. The resulting number is always smaller, therefore much more likely to be a collision despite still being just as unpredictable.
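A quick simulation makes both points concrete: with a fixed seed the whole sequence is reproducible (equally predictable by an oracle), yet the products crowd the low end of the range. This is an illustrative Python sketch, not code from the question:

    import random
    from collections import Counter

    random.seed(1)  # fixed seed: the run is fully reproducible either way

    N = 100_000
    single  = [random.random() for _ in range(N)]
    product = [random.random() * random.random() for _ in range(N)]

    for name, values in (("rand()", single), ("rand() * rand()", product)):
        below = sum(v < 0.3 for v in values) / N
        # Bucket to one decimal place so the crowding shows up as collisions.
        busiest = Counter(round(v, 1) for v in values).most_common(1)[0]
        print(f"{name:>15}: P(value < 0.3) = {below:.2f}, busiest bucket = {busiest}")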

Matthew Scharley
  • 127,823
  • 52
  • 194
  • 222
  • 9
    +1 Note that on the other hand `rand()+rand()+rand()...` gets increasingly "less random" (if by random you mean uniformly distributed). – Thilo Oct 18 '10 at 03:48
  • 4
    @Thilo No, it doesn't... ? If a random variable is uniformly distributed in the range (0,1), and you sample the variable n times, and take the sum, it will just be uniformly distributed in the range (0,n). – user359996 Oct 18 '10 at 03:56
  • 1
    OK Matthew, you seem to know what you are talking about, but guide me through this: so, you say it is equally predictable but with more chance of collision? I don't get it! Thank you!!! – Trufa Oct 18 '10 at 03:58
  • +1 Great answer. One question though- is it likely that each rand() is based on the same seed, or a different seed? I know it depends on the time, but is it calculated so fast that it uses the same seed value? – Dominic K Oct 18 '10 at 03:59
  • @Trufa See detly's link: http://thedailywtf.com/Comments/Random-Stupidity.aspx?pg=2#182537 – user359996 Oct 18 '10 at 04:00
  • 1
    @Dman: Each call to `rand()` uses a persistent seed. You can set the seed via `srand()`. In fact, you *should* only set the seed once, otherwise you will drastically alter the randomness of your results (likely resulting in many duplicates from getting the first number in the sequence with identical seeds). – Matthew Scharley Oct 18 '10 at 04:13
  • 1
    @Trufa see my expanded worded explanation, and have a look at belisarius's lovely graphs for a more graphical explanation. – Matthew Scharley Oct 18 '10 at 04:14
  • @Matthew I think I'm starting to get this. Great edit, really!! I am with a pencil and paper now going through this :) – Trufa Oct 18 '10 at 04:20
  • So Matthew, one more thing (this might be a whole other question): when you want more randomness, what's the best way to go, better seeds or better algorithms to generate? – Trufa Oct 18 '10 at 05:02
  • 5
    @Trufa just trust `rand()` to actually be random, and don't try to 'enhance' its randomness. Don't set the seed multiple times. Any individual seed is perfectly fine, as long as it's semi-random itself. Lots of implementations I've seen use the UNIX epoch as the seed, which changes every second and is unique every time it changes. – Matthew Scharley Oct 18 '10 at 05:34
  • 1
    @Trufa: You ask if it is equally predictable, but with more chance of collision... I think graph 2 from belisarius answer explains this. rand * rand will give many more results that are close to each other (many occurences between 0.1 and 0.2, a little fewer between 0.2 and 0.3 etc...), so if you round the results to 1 decimal, you will have more occurences of 0.1 than 0.2, and more of 0.2 than 0.3 etc. (although with more decimals they are unique). – awe Oct 18 '10 at 08:30
  • 61
    @user359996 rand()+rand() is not uniformly distributed. Add two dice, you are more likely to get 7 than 2. – Liam Oct 18 '10 at 11:46
  • 1
    Agreed with Liam. It's "more random", when values are more distributed evenly within a range. e.g. All values within a range have a similar chance of getting chosen as the random value. –  Oct 19 '10 at 15:26
  • 4
    @thenonhacker See my definition of randomness in my post. Just because values tend towards one end of the spectrum doesn't increase the predictability of the exact values produced, which is what I was referring to when I used the word random. I then went on to address the issue of the bias separately. – Matthew Scharley Oct 19 '10 at 20:26
  • 2
    @Muhammad: Math is all about definitions, and I have every right to define anything I like the way I like, as long as I give sound reasoning why. There are many definitions for randomness (or perhaps, many partial definitions), including the one I gave. Also, as counter intuitive as it sounds bias doesn't increase predictability. Just because I get 0.1's more often from a particular function doesn't mean that you can predict what the next number to come out of the function will be. Prediction in the sense of an oracle is entirely different to probability. – Matthew Scharley Oct 23 '10 at 06:56
  • @Matthew: I believe I know about math. You can't redefine probability or the Gaussian distribution for example. I would love to see your definition [citation needed]. If I know that I am more likely to get 0.1 then I can predict 0.1 as the next number with high probability; that is the whole assumption behind machine learning for example. I gave a specific example with card counting: you don't know the next value but you can guess it with higher probability. In the case of the question the variance of rand()*rand() is clearly lower than that of rand(). Is the variance irrelevant as well? – Muhammad Alkarouri Oct 23 '10 at 07:27
  • @Alderath: Bias will surely increase the probability that you can guess any individual number, but the idea of an oracle which can guess every number is unaffected by bias, and I was trying to get at the idea of randomness vs an oracle with this answer as the two concepts are different, but related, and both are useful in different applications. – Matthew Scharley Mar 19 '12 at 22:27
  • @MatthewScharley Bias can surely increase predictability. Consider a binary random generator. I.e. `rand()` returns either 1 or 0 with equal probability. Assume that we bias the outcome by doing `rand() * rand()`. The probability of getting 0 is 75%. Now assume we multiply 100 different numbers from the random generator. The probability of receiving 0 is (1 - 1/2^100) which is essentially 1. Hence, with a quite large certainty we can predict that the outcome will be 0. Predictions often involve uncertainties. In maths, if something is certain, it is a logical implication, not a prediction. – Alderath Mar 19 '12 at 22:31
  • Sorry. I modified my previous post as I accidentally posted it before being done writing. – Alderath Mar 19 '12 at 22:32
  • Probably no one is listening, but for the sake of putting this argument to bed, the people actually working on implementing random number generators can't entirely agree on what random is in a world that is inherently not random and very precise. There are entire suites of tests designed to test different aspects of potential algorithms. To that end, there's two major distinctions. "Randomness" or "distribution", a property most important to PRNG's that only need to live up to human inspection. In the world of CSPRNG's though oracles are much more dangerous. – Matthew Scharley Aug 08 '16 at 16:41
  • Specifically, if there's bias in the algorithm and you can guess what the next number is 5% of the time that'd be bad. Far worse though is breaking the algorithm itself so that you can guess with **certainty** every single subsequent number - i.e. an oracle. Both concepts are important and both concepts count towards what "randomness" actually is. – Matthew Scharley Aug 08 '16 at 16:41
  • Not once have I said that either part could be ignored - but the other answers very much focus on the first part and I just wanted to expand on the rest of the story with my answer. – Matthew Scharley Aug 08 '16 at 16:43
80

Oversimplification to illustrate a point.

Assume your random function only outputs 0 or 1.

random() is one of (0,1), but random()*random() is one of (0,0,0,1)

You can clearly see that the chances to get a 0 in the second case are in no way equal to those to get a 1.


When I first posted this answer I wanted to keep it as short as possible so that a person reading it would understand at a glance the difference between random() and random()*random(), but I can't keep myself from answering the original ad litteram question:

Which is more random?

Given that random(), random()*random(), random()+random(), (random()+1)/2, and any other combination that doesn't lead to a fixed result all have the same source of entropy (or the same initial state in the case of pseudorandom generators), the answer would be that they are equally random (the difference is in their distribution). A perfect example to look at is the game of craps. The number you get would be random(1,6) + random(1,6), and we all know that getting 7 has the highest chance, but that doesn't mean the outcome of rolling two dice is more or less random than the outcome of rolling one.
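A minimal Python sketch of the 0/1 argument above (assuming a generator that returns 0 or 1 with equal probability) shows the 75/25 split directly:

    import random
    from collections import Counter

    N = 100_000
    flips    = Counter(random.randrange(2) for _ in range(N))
    products = Counter(random.randrange(2) * random.randrange(2) for _ in range(N))
    print(flips)     # roughly 50% zeros, 50% ones
    print(products)  # roughly 75% zeros, 25% ones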

Alin Purcaru
  • 43,655
  • 12
  • 77
  • 90
  • +1 for condensing something devilishly tricky into "equally random over different distributions". Very elegant. – Jens Roland Mar 01 '11 at 23:10
  • 3
    So technically, (random()*0+9) is equally random, since it randomly returns a value from the 1-element set: [9]. The Dilbert cartoon was right. – Jens Roland Mar 01 '11 at 23:14
  • 2
    @Jens Roland "any other combination that doesn't lead to a fixed result" ;). 999999 probably isn't randomly generated and the chance that it was randomly generated can be computed. – Alin Purcaru Mar 02 '11 at 08:35
69

Here's a simple answer. Consider Monopoly. You roll two six sided dice (or 2d6 for those of you who prefer gaming notation) and take their sum. The most common result is 7 because there are 6 possible ways you can roll a 7 (1,6 2,5 3,4 4,3 5,2 and 6,1). Whereas a 2 can only be rolled on 1,1. It's easy to see that rolling 2d6 is different than rolling 1d12, even if the range is the same (ignoring that you can get a 1 on a 1d12, the point remains the same). Multiplying your results instead of adding them is going to skew them in a similar fashion, with most of your results coming up in the middle of the range. If you're trying to reduce outliers, this is a good method, but it won't help making an even distribution.

(And oddly enough it will increase low rolls as well. Assuming your randomness starts at 0, you'll see a spike at 0 because it will turn whatever the other roll is into a 0. Consider two random numbers between 0 and 1 (inclusive) and multiplying. If either result is a 0, the whole thing becomes a 0 no matter the other result. The only way to get a 1 out of it is for both rolls to be a 1. In practice this probably wouldn't matter but it makes for a weird graph.)
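Since there are only 36 equally likely face pairs, both distributions can be tabulated exactly; this brute-force Python sketch (not part of the original answer) does it:

    from collections import Counter
    from itertools import product

    faces = range(1, 7)
    sums  = Counter(a + b for a, b in product(faces, repeat=2))
    prods = Counter(a * b for a, b in product(faces, repeat=2))
    print("2d6 sums:    ", dict(sorted(sums.items())))   # 7 is the mode, 6/36
    print("2d6 products:", dict(sorted(prods.items())))  # 6 and 12 tie, 4/36 each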

Liam
  • 19,819
  • 24
  • 83
  • 123
valadil
  • 1,648
  • 1
  • 14
  • 30
  • 4
    "Multiplying your results instead of adding them is going to skew them in a similar fashion, with most of your results coming up in the middle of the range." - check this assertion against the second graph in the answer from belisarius. – Daniel Earwicker Oct 19 '10 at 10:43
51

The obligatory xkcd ...
return 4; // chosen by fair dice roll, guaranteed to be random.

crowne
  • 8,456
  • 3
  • 35
  • 50
  • 7
    damn, this always ends up appearing when the word "random" appears :) I was waiting for it!! – Trufa Oct 18 '10 at 20:38
  • 9
    I like humour, but it **must** be CW. – Andreas Rejbrand Oct 18 '10 at 20:54
  • 2
    @Andreas Rejbrand - why should this "humor" answer be CW? – warren Oct 20 '10 at 13:05
  • 16
    If it is not CW, reputation will be awarded to the poster of the answer every time it is up-voted (160 rep so far). Now, reputation is like grades in school -- it should be a certificate of technical (in this case, programming) proficiency. Therefore, one should not be able to gain reputation by posting something that is easily upvoted but that needs no such proficiency. Furthermore, the reputation score also determines the privileges of the user. For instance, at 10 000 score, the user gets access to moderation tools at StackOverflow. – Andreas Rejbrand Oct 20 '10 at 13:09
35

It might help to think of this in more discrete numbers. Say you want to generate random numbers between 1 and 36, so you decide the easiest way is to throw two fair, 6-sided dice and multiply the faces. You get this:

     1    2    3    4    5    6
  -----------------------------
1|   1    2    3    4    5    6
2|   2    4    6    8   10   12
3|   3    6    9   12   15   18
4|   4    8   12   16   20   24   
5|   5   10   15   20   25   30
6|   6   12   18   24   30   36

So we have 36 numbers, but not all of them are fairly represented, and some don't occur at all. Numbers near the center diagonal (bottom-left corner to top-right corner) will occur with the highest frequency.

The same principles which describe the unfair distribution between dice apply equally to floating point numbers between 0.0 and 1.0.

Juliet
  • 80,494
  • 45
  • 196
  • 228
  • 3
    +1 for showing more concretely, the change in distribution when multiplying the random numbers. The matrix helped more than just the words or even a distribution graph. – Marjan Venema Oct 19 '10 at 06:05
26

Some things about "randomness" are counter-intuitive.

Assuming a flat distribution from rand(), the following will get you non-flat distributions:

  • high bias: sqrt(rand(range^2))
  • bias peaking in the middle: (rand(range) + rand(range))/2
  • low bias: range - sqrt(rand(range^2))

There are lots of other ways to create specific bias curves. I did a quick test of rand() * rand() and it gets you a very non-linear distribution.
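For anyone who wants to reproduce that kind of quick test, here is a hedged Python sketch of the three bias curves listed above (R stands in for range, and a summary statistic replaces a plot):

    import random

    R, N = 100, 100_000

    def middle_third(xs):
        # Fraction of samples landing in the middle third of [0, R].
        return sum(R / 3 < x < 2 * R / 3 for x in xs) / len(xs)

    shapes = {
        "uniform: rand(R)":              [R * random.random() for _ in range(N)],
        "high bias: sqrt(rand(R^2))":    [(R * R * random.random()) ** 0.5 for _ in range(N)],
        "middle peak: (r + r) / 2":      [R * (random.random() + random.random()) / 2 for _ in range(N)],
        "low bias: R - sqrt(rand(R^2))": [R - (R * R * random.random()) ** 0.5 for _ in range(N)],
    }
    for name, xs in shapes.items():
        print(f"{name:>30}: mean = {sum(xs) / N:5.1f}, middle third = {middle_third(xs):.2f}")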

staticsan
  • 29,935
  • 4
  • 60
  • 73
24

Most rand() implementations have some period. That is, after some enormous number of calls, the sequence repeats. The sequence of outputs of rand() * rand() repeats in half the time, so it is "less random" in that sense.

Also, without careful construction, performing arithmetic on random values tends to cause less randomness. A poster above cited "rand() + rand() + rand() ..." (k times, say) which will in fact tend to k times the mean value of the range of values rand() returns. (It's a random walk with steps symmetric about that mean.)

Assume for concreteness that your rand() function returns a uniformly distributed random real number in the range [0,1). (Yes, this example allows infinite precision. This won't change the outcome.) You didn't pick a particular language and different languages may do different things, but the following analysis holds with modifications for any non-perverse implementation of rand(). The product rand() * rand() is also in the range [0,1) but is no longer uniformly distributed. In fact, the product is more likely to be in the interval [0,1/4) than in the interval [1/4,1). More multiplication will skew the result even further toward zero. This makes the outcome more predictable. In broad strokes, more predictable == less random.

Pretty much any sequence of operations on uniformly random input will be nonuniformly random, leading to increased predictability. With care, one can overcome this property, but then it would have been easier to generate a uniformly distributed random number in the range you actually wanted rather than wasting time with arithmetic.
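The skew toward zero is easy to check empirically; for two independent uniforms, the probability that the product falls below 1/4 works out to (1 + ln 4)/4 ≈ 0.597, which this illustrative Python sketch confirms:

    import random
    from math import log

    N = 1_000_000
    below = sum(random.random() * random.random() < 0.25 for _ in range(N))
    print(below / N)         # ~0.597 empirically
    print((1 + log(4)) / 4)  # 0.5966..., the exact value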

Soner Gönül
  • 97,193
  • 102
  • 206
  • 364
Eric Towers
  • 4,175
  • 1
  • 15
  • 17
  • I had that thought too, that it would be going through the random generator period twice as fast. – Jared Updike Oct 18 '10 at 23:34
  • 3
    The sequence length will only be cut in half if it is even. If it's odd, you get r1*r2, r3*r4, ..., rn*r1, r2*r3, r4*r5, and the total length is the same. – Jander Oct 20 '10 at 07:04
23

"random" vs. "more random" is a little bit like asking which Zero is more zero'y.

In this case, rand is a PRNG, so not totally random (in fact, quite predictable if the seed is known). Multiplying it by another value makes it no more or less random.

A true crypto-type RNG will actually be random. And running values through any sort of function cannot add more entropy to it, and may very likely remove entropy, making it no more random.

abelenky
  • 63,815
  • 23
  • 109
  • 159
  • 3
    Note, this isn't squaring since each call with return a different value. Everything else is accurate though. – Matthew Scharley Oct 18 '10 at 03:45
  • Ok I will have to google crypto-type RNG but thanks for the answer! :) – Trufa Oct 18 '10 at 03:49
  • "running values through any sort of function cannot add more entropy to it" - What about a text compression algorithm? Isn't that something designed to increase the entropy of the resulting string? – CurtainDog Oct 19 '10 at 08:30
  • Again: It's "more random", when values are more distributed evenly within a range. e.g. All values within a range get their fair chance of getting chosen as the random value. –  Oct 19 '10 at 15:28
  • @thenonhacker: You seem to be suggesting that sequentially cycling through a set of numbers (e.g. 1-10) is random. After all, each number comes up exactly 1 out of 10 times, which seems exceedingly evenly distributed and fair. But it is definitely not random. I don't know an "official" definition of random, but I believe when each bit has a 50/50 chance, unrelated to any other bit, and fully unpredictable, of being 1 or 0, then the resulting value will be random. – abelenky Oct 19 '10 at 17:12
  • @abelenky: Oh yes, it is. See the topmost answer to see my point. When all numbers get their fair chance of being displayed, there is no peaking and biasing on a large set of samples. I said large set of samples: Like you have to flip a coin 300 times, head side should roughly have a fair chance of appearing with tail side. If there is a bias towards the head than tail, then that is not random, and I can abuse that bias because now I know that heads has higher probability of appearing than tails. –  Oct 19 '10 at 19:06
  • 2
    @thenonhacker: By your own description, the sequence "1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10..." is random. It is evenly distributed, with all numbers getting a fair chance. There is no peaking or biasing. Do you really consider that sequence random??? You need to change your definition. Random is not about the output, random is about the *process* used to create the output. – abelenky Oct 19 '10 at 19:24
  • @abelnky: Yes, if you get lucky, one of the Random Seeders can get you "1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10". So in Poker, you got Pair Aces in two consecutive times even with a very shuffled sequence of cards, that's luck. You can argue also with "10,9,8,7,6,5,4,3,2,1,10,9,8,7,6,5,4,3,2,1,10,9,8,7,6,5,4,3,2,1", another lucky random sequence. –  Oct 19 '10 at 19:42
  • @abelnky: Here's exactly what I'm talking about: http://stackoverflow.com/questions/3956478/understanding-randomness/3956538#3956538 –  Oct 19 '10 at 19:50
  • 2
    @CurtainDog: Text-compression keeps the level of entropy the same while reducing the number of bits required to express that same amount of entropy. – Kennet Belenky Oct 19 '10 at 20:41
  • 4
    @thenonhacker, @abelenky: Even distributions are easy. What matters in a random number generator is the number of bits in the state of the random number generator. A zero-state random number generator (e.g. 4, 4, 4, 4, 4, ...) is completely predictable. A one-time-pad has as much state as the number of values it produces, thus making it impossible to predict. A convolution of two PNRGs will produce a PNRG with as many bits of entropy as they both contain, minus their covariance. – Kennet Belenky Oct 19 '10 at 20:47
  • 1
    @Kennet - Thanks, you've hugely cleared that up for me. @abelenky - cool, i get you now. – CurtainDog Oct 19 '10 at 23:43
20

The concept you're looking for is "entropy," the "degree" of disorder of a string of bits. The idea is easiest to understand in terms of the concept of "maximum entropy".

An approximate definition of a string of bits with maximum entropy is that it cannot be expressed exactly in terms of a shorter string of bits (i.e., using some algorithm to expand the smaller string back to the original string).

The relevance of maximum entropy to randomness stems from the fact that if you pick a number "at random", you will almost certainly pick a number whose bit string is close to having maximum entropy, that is, it can't be compressed. This is our best understanding of what characterizes a "random" number.

So, if you want to make a random number out of two random samples which is "twice" as random, you'd concatenate the two bit strings together. Practically, you'd just stuff the samples into the high and low halves of a double length word.

On a more practical note, if you find yourself saddled with a crappy rand(), it can sometimes help to xor a couple of samples together --- although, if it's truly broken, even that procedure won't help.
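As a concrete illustration of that last point, here is a Python sketch in which a hypothetical biased_bit() stands in for a "crappy rand()"; xor-ing two samples pulls the bias back toward fair:

    import random

    def biased_bit():
        # Hypothetical bad generator: returns 1 with probability 0.6.
        return 1 if random.random() < 0.6 else 0

    N = 100_000
    single = sum(biased_bit() for _ in range(N)) / N
    xored  = sum(biased_bit() ^ biased_bit() for _ in range(N)) / N
    print(f"P(1) from one sample: {single:.3f}")  # ~0.600
    print(f"P(1) from xor of two: {xored:.3f}")   # ~0.480, closer to 0.5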

  • 2
    I had never thought about random number generations via xor, but I guess you can take the concept pretty far (http://en.wikipedia.org/wiki/Mersenne_twister)! Thanks for the answer. – Gabriel Mitchell Oct 19 '10 at 05:29
  • 1
    I'm really struggling to grok this answer... Isn't maximum entropy defeated by the answers given in http://stackoverflow.com/questions/3956478/understanding-randomness/3963165#3963165 and http://stackoverflow.com/questions/3956478/understanding-randomness/3963140#3963140. In these cases the number picked can't be compressed but you'd be hard pressed to call them random. – CurtainDog Oct 19 '10 at 08:28
  • 1
    +1 Beautiful as the accepted answer is, this is my favourite. When it comes to computers, always think in bits - much less confusing and more relevant than trying to think in terms of reals. (I wrote my answer and then noticed this one, so mine is nothing more than an expansion of this one - maybe with some entropy added). – Daniel Earwicker Oct 19 '10 at 12:06
  • 1
    @CurtainDog xkcd's random number `4` or binary `0100` can be compressed to zero bits. The decompression program would simply return '4'. It doesn't get less random than that. The problem with dilbert is, we do not know if we can compress it to zero bits (decompressing by always returning 'nine'). It might return eight aswell, then we could compress to 1 bit. Decompressing by: 0->nine, 1->eight. We would have 1 random bit. – Ishtar Oct 19 '10 at 12:10
14

The accepted answer is quite lovely, but there's another way to answer your question. PachydermPuncher's answer already takes this alternative approach, and I'm just going to expand it out a little.

The easiest way to think about information theory is in terms of the smallest unit of information, a single bit.

In the C standard library, rand() returns an integer in the range 0 to RAND_MAX, a limit that may be defined differently depending on the platform. Suppose RAND_MAX happens to be defined as 2^n - 1 where n is some integer (this happens to be the case in Microsoft's implementation, where n is 15). Then we would say that a good implementation would return n bits of information.

Imagine that rand() constructs random numbers by flipping a coin to find the value of one bit, and then repeating until it has a batch of 15 bits. Then the bits are independent (the value of any one bit does not influence the likelihood of other bits in the same batch having a certain value). So each bit considered independently is like a random number between 0 and 1 inclusive, and is "evenly distributed" over that range (as likely to be 0 as 1).

The independence of the bits ensures that the numbers represented by batches of bits will also be evenly distributed over their range. This is intuitively obvious: if there are 15 bits, the allowed range is zero to 2^15 - 1 = 32767. Every number in that range is a unique pattern of bits, such as:

010110101110010

and if the bits are independent then no pattern is more likely to occur than any other pattern. So all possible numbers in the range are equally likely. And so the reverse is true: if rand() produces evenly distributed integers, then those numbers are made of independent bits.

So think of rand() as a production line for making bits, which just happens to serve them up in batches of arbitrary size. If you don't like the size, break the batches up into individual bits, and then put them back together in whatever quantities you like (though if you need a particular range that is not a power of 2, you need to shrink your numbers, and by far the easiest way to do that is to convert to floating point).

Returning to your original suggestion, suppose you want to go from batches of 15 to batches of 30, ask rand() for the first number, bit-shift it by 15 places, then add another rand() to it. That is a way to combine two calls to rand() without disturbing an even distribution. It works simply because there is no overlap between the locations where you place the bits of information.
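A Python sketch of that combination (rand15() is a hypothetical stand-in for a rand() whose RAND_MAX is 2^15 - 1, as in the Microsoft implementation mentioned above):

    import random

    def rand15():
        # Stand-in for a rand() that yields 15 independent random bits.
        return random.getrandbits(15)

    def rand30():
        # Shift the first batch up 15 places and add the second: the two
        # batches occupy disjoint bit positions, so nothing overlaps and
        # the result is uniform over 0 .. 2**30 - 1.
        return (rand15() << 15) + rand15()

    print(rand30())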

This is very different to "stretching" the range of rand() by multiplying by a constant. For example, if you wanted to double the range of rand() you could multiply by two - but now you'd only ever get even numbers, and never odd numbers! That's not exactly a smooth distribution and might be a serious problem depending on the application, e.g. a roulette-like game supposedly allowing odd/even bets. (By thinking in terms of bits, you'd avoid that mistake intuitively, because you'd realise that multiplying by two is the same as shifting the bits to the left (greater significance) by one place and filling in the gap with zero. So obviously the amount of information is the same - it just moved a little.)

Such gaps in number ranges can't be griped about in floating point number applications, because floating point ranges inherently have gaps in them that simply cannot be represented at all: an infinite number of missing real numbers exist in the gap between each two representable floating point numbers! So we just have to learn to live with gaps anyway.

As others have warned, intuition is risky in this area, especially because mathematicians can't resist the allure of real numbers, which are horribly confusing things full of gnarly infinities and apparent paradoxes.

But at least if you think in terms of bits, your intuition might get you a little further. Bits are really easy - even computers can understand them.

Daniel Earwicker
  • 114,894
  • 38
  • 205
  • 284
  • 3
    +1: Actually, there's more numbers missing between any two IEEE double precision floats than there are numbers in the whole of the (mathematical) integers. – Donal Fellows Oct 20 '10 at 13:27
13

As others have said, the easy short answer is: No, it is not more random, but it does change the distribution.

Suppose you were playing a dice game. You have some completely fair, random dice. Would the die rolls be "more random" if before each die roll, you first put two dice in a bowl, shook it around, picked one of the dice at random, and then rolled that one? Clearly it would make no difference. If both dice give random numbers, then randomly choosing one of the two dice will make no difference. Either way you'll get a random number between 1 and 6 with even distribution over a sufficient number of rolls.

I suppose in real life such a procedure might be useful if you suspected that the dice might NOT be fair. If, say, the dice are slightly unbalanced so one tends to give 1 more often than 1/6 of the time, and another tends to give 6 unusually often, then randomly choosing between the two would tend to obscure the bias. (Though in this case, 1 and 6 would still come up more than 2, 3, 4, and 5. Well, I guess depending on the nature of the imbalance.)

There are many definitions of randomness. One definition of a random series is that it is a series of numbers produced by a random process. By this definition, if I roll a fair die 5 times and get the numbers 2, 4, 3, 2, 5, that is a random series. If I then roll that same fair die 5 more times and get 1, 1, 1, 1, 1, then that is also a random series.

Several posters have pointed out that random functions on a computer are not truly random but rather pseudo-random, and that if you know the algorithm and the seed they are completely predictable. This is true, but most of the time completely irrelevant. If I shuffle a deck of cards and then turn them over one at a time, this should be a random series. If someone peeks at the cards, the result will be completely predictable, but by most definitions of randomness this will not make it less random. If the series passes statistical tests of randomness, the fact that I peeked at the cards will not change that fact. In practice, if we are gambling large sums of money on your ability to guess the next card, then the fact that you peeked at the cards is highly relevant. If we are using the series to simulate the menu picks of visitors to our web site in order to test the performance of the system, then the fact that you peeked will make no difference at all. (As long as you do not modify the program to take advantage of this knowledge.)

EDIT

I don't think I could fit my response to the Monty Hall problem into a comment, so I'll update my answer.

For those who didn't read belisarius's link, the gist of it is: A game show contestant is given a choice of 3 doors. Behind one is a valuable prize, behind the others something worthless. He picks door #1. Before revealing whether it is a winner or a loser, the host opens door #3 to reveal that it is a loser. He then gives the contestant the opportunity to switch to door #2. Should the contestant do this or not?

The answer, which offends many people's intuition, is that he should switch. The probability that his original pick was the winner is 1/3, that the other door is the winner is 2/3. My initial intuition, along with that of many other people, is that there would be no gain in switching, that the odds have just been changed to 50:50.

After all, suppose that someone switched on the TV just after the host opened the losing door. That person would see two remaining closed doors. Assuming he knows the nature of the game, he would say that there is a 1/2 chance of each door hiding the prize. How can the odds for the viewer be 1/2 : 1/2 while the odds for the contestant are 1/3 : 2/3 ?

I really had to think about this to beat my intuition into shape. To get a handle on it, understand that when we talk about probabilities in a problem like this, we mean, the probability you assign given the available information. To a member of the crew who put the prize behind, say, door #1, the probability that the prize is behind door #1 is 100% and the probability that it is behind either of the other two doors is zero.

The crew member's odds are different than the contestant's odds because he knows something the contestant doesn't, namely, which door he put the prize behind. Likewise, the contestant's odds are different than the viewer's odds because he knows something that the viewer doesn't, namely, which door he initially picked. This is not irrelevant, because the host's choice of which door to open is not random. He will not open the door the contestant picked, and he will not open the door that hides the prize. If these are the same door, that leaves him two choices. If they are different doors, that leaves only one.

So how do we come up with 1/3 and 2/3 ? When the contestant originally picked a door, he had a 1/3 chance of picking the winner. I think that much is obvious. That means there was a 2/3 chance that one of the other doors is the winner. If the host gave him the opportunity to switch without giving any additional information, there would be no gain. Again, this should be obvious. But one way to look at it is to say that there is a 2/3 chance that he would win by switching. But he has 2 alternatives. So each one has only 2/3 divided by 2 = 1/3 chance of being the winner, which is no better than his original pick. Of course we already knew the final result, this just calculates it a different way.

But now the host reveals that one of those two choices is not the winner. So of the 2/3 chance that a door he didn't pick is the winner, he now knows that 1 of the 2 alternatives isn't it. The other might or might not be. So he no longer has 2/3 divided by 2. He has zero for the open door and 2/3 for the closed door.
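If the argument still fights your intuition, a simulation settles it. This is a hedged Python sketch; the host's choice between two losing doors is made deterministically for simplicity, which doesn't affect the overall win rates:

    import random

    def play(switch, trials=100_000):
        wins = 0
        for _ in range(trials):
            prize = random.randrange(3)
            pick = random.randrange(3)
            # Host opens a door that is neither the pick nor the prize.
            opened = next(d for d in range(3) if d != pick and d != prize)
            if switch:
                pick = next(d for d in range(3) if d != pick and d != opened)
            wins += pick == prize
        return wins / trials

    print(f"stay:   {play(switch=False):.3f}")  # ~0.333
    print(f"switch: {play(switch=True):.3f}")   # ~0.667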

Jay
  • 26,876
  • 10
  • 61
  • 112
  • Very good analogies! I guess this is a very good plain english explanation, and unlike many others, you actually answered my question :) – Trufa Oct 18 '10 at 21:56
  • @Trufa @Jay The confusion among possible pre-knowledge of the events and randomness is VERY common. Let me share with you this interesting story about a woman who solved a problem and cast a pile of shame on some of the better mathematicians in academia. They said many things to regret later (such as "You made a mistake, but look at the positive side. If all those Ph.D.'s were wrong, the country would be in some very serious trouble."). So here is the story, related to your considerations ... enjoy! http://www.marilynvossavant.com/articles/gameshow.html – Dr. belisarius Oct 18 '10 at 22:55
  • @belisarius yep. I say blackjack21 :) just kidding I get you point! – Trufa Oct 18 '10 at 23:00
  • @belisarius BTW never got that one I will give it another try now! – Trufa Oct 18 '10 at 23:01
  • @Trufa And here is an article showing the academic reaction to Marilyn's statement http://query.nytimes.com/gst/fullpage.html?res=9D0CEFDD1E3FF932A15754C0A967958260 (VERY VERY fun) – Dr. belisarius Oct 18 '10 at 23:14
  • @belisarius: That's a wonderful example of why conditional probability is really non-obvious to most folks. – Donal Fellows Oct 19 '10 at 10:36
  • @Donal Fellows Without shame I must confess that Marylin's argument took me a while to grasp ... :) – Dr. belisarius Oct 19 '10 at 11:49
  • @belisarius: I got it straight away the first time I encountered the problem, but then I guess I don't view things quite the same as others. 'Tis how it goes, I suppose… – Donal Fellows Oct 19 '10 at 12:44
  • @Donal: Congratulations on superior intuition. You win the cookie. – Jay Oct 19 '10 at 20:40
  • Another way to help people understand the "Monty Hall" choice is to use a deck of cards. With all the cards face down, ask them to pick the Ace of Spades. Once they have chosen a card, flip the rest of the deck over except one other card. Ask them if they want to stick with their original choice or switch cards. My bet is most people will quickly see the merit of changing cards (1 in 52 vs 1 in 2). – NealB Oct 25 '10 at 20:52
11

Consider you have a simple coin flip problem where even is considered heads and odd is considered tails. The logical implementation is:

rand() mod 2

Over a large enough sample, the number of even results should equal the number of odd results.

Now consider a slight tweak:

rand() * rand() mod 2

If either of the results is even, then the entire product is even. Consider the 4 possible outcomes (even * even = even, even * odd = even, odd * even = even, odd * odd = odd). Now, over a large enough sample, the result should be even 75% of the time.

I'd bet heads if I were you.
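A quick Python sketch (using random integers as a stand-in for an integer-returning rand()) bears out the 75% figure:

    import random

    def r():
        return random.randrange(1 << 15)  # integer rand() stand-in

    N = 100_000
    even_single  = sum(r() % 2 == 0 for _ in range(N)) / N
    even_product = sum((r() * r()) % 2 == 0 for _ in range(N)) / N
    print(f"P(even), rand() mod 2:          {even_single:.3f}")   # ~0.50
    print(f"P(even), rand() * rand() mod 2: {even_product:.3f}")  # ~0.75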

This comment is really more of an explanation of why you shouldn't implement a custom random function based on your method than a discussion on the mathematical properties of randomness.

user479885
  • 111
  • 3
  • 1
    Beware! `rand()%2` may be not very random; that really depends on the randomness of the low bit, and some PRNGs aren't very good that way. (Of course, in some languages you get a floating-point result out of `rand()` so you can't do it that way at all…) – Donal Fellows Oct 19 '10 at 10:28
10

When in doubt about what will happen to the combinations of your random numbers, you can use the lessons you learned in statistical theory.

In OP's situation he wants to know what's the outcome of X*X = X^2 where X is a random variable distributed along Uniform[0,1]. We'll use the CDF technique since it's just a one-to-one mapping.

Since X ~ Uniform[0,1], its pdf is fX(x) = 1. We want the transformation Y = X^2, thus y = x^2. Find the inverse x(y): x = sqrt(y); this gives us x as a function of y. Next, find the derivative dx/dy: d/dy (sqrt(y)) = 1/(2 sqrt(y)).

The distribution of Y is given as: fY(y) = fX(x(y)) |dx/dy| = 1/(2 sqrt(y))

We're not done yet; we have to get the domain of Y. Since 0 <= x < 1, we have 0 <= x^2 < 1, so Y is in the range [0, 1). If you want to check that the pdf of Y is indeed a pdf, integrate it over the domain: integrating 1/(2 sqrt(y)) from 0 to 1 indeed gives 1. Also, notice that the shape of the said function looks like what belisarius posted.

As for things like X1 + X2 + ... + Xn, (where Xi ~ Uniform[0,1]) we can just appeal to the Central Limit Theorem which works for any distribution whose moments exist. This is why the Z-test exists actually.

Other techniques for determining the resulting pdf include the Jacobian transformation (which is the generalized version of the cdf technique) and MGF technique.

EDIT: As a clarification, do note that I'm talking about the distribution of the resulting transformation and not its randomness. That's actually a separate discussion. Also, what I actually derived was for (rand())^2. For rand() * rand() it's much more complicated, but in any case it won't result in a uniform distribution of any sort.
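The derivation is easy to sanity-check numerically: the CDF implied by fY is P(Y <= y) = sqrt(y), and this Python sketch comparing it against the empirical CDF of squared uniforms agrees:

    import math
    import random

    N = 200_000
    samples = [random.random() ** 2 for _ in range(N)]
    for y in (0.1, 0.25, 0.5, 0.9):
        empirical = sum(s <= y for s in samples) / N
        print(f"y = {y}: empirical {empirical:.3f} vs sqrt(y) = {math.sqrt(y):.3f}")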

Wil
  • 101
  • 4
9

It's not exactly obvious, but rand() is typically more random than rand()*rand(). What matters, though, is that this isn't actually very important for most uses.

But firstly, they produce different distributions. This is not a problem if that is what you want, but it does matter. If you need a particular distribution, then ignore the whole “which is more random” question. So why is rand() more random?

The core of why rand() is more random (under the assumption that it is producing floating-point random numbers in the range [0..1], which is very common) is that when you multiply two FP numbers together with lots of information in the mantissa, you get some loss of information off the end; there are just not enough bits in an IEEE double-precision float to hold all the information that was in two IEEE double-precision floats uniformly randomly selected from [0..1], and those extra bits of information are lost. Of course, it doesn't matter that much since you (probably) weren't going to use that information, but the loss is real. It also doesn't really matter which distribution you produce (i.e., which operation you use to do the combination). Each of those random numbers has (at best) 52 bits of random information – that's how much an IEEE double can hold – and if you combine two or more into one, you're still limited to having at most 52 bits of random information.

Most uses of random numbers don't use even close to as much randomness as is actually available in the random source. Get a good PRNG and don't worry too much about it. (The level of “goodness” depends on what you're doing with it; you have to be careful when doing Monte Carlo simulation or cryptography, but otherwise you can probably use the standard PRNG as that's usually much quicker.)

Donal Fellows
  • 133,037
  • 18
  • 149
  • 215
  • 1
    This answer really needs to be read in conjunction with belisarius's magnificent one; they cover different aspects of the problem. – Donal Fellows Oct 19 '10 at 10:52
7

Floating-point randoms are based, in general, on an algorithm that produces an integer between zero and a certain range. As such, by using rand()*rand(), you are essentially computing int_rand()*int_rand()/rand_max^2 - meaning you are excluding any value p/rand_max^2 where p is a prime greater than rand_max, since such a p cannot be written as a product of two factors in range.

That changes the randomized distribution significantly.

rand() is uniformly distributed on most systems, and difficult to predict if properly seeded. Use that unless you have a particular reason to do math on it (i.e., shaping the distribution to a needed curve).

Fordi
  • 2,798
  • 25
  • 20
  • @belisarius : That's only the case if 1 is a possible outcome of the random process. – Joris Meys Oct 19 '10 at 00:30
  • I had to read a long way down the answers before I found this one. You state a clear problem: the result space (number of possible values) of `rand()*rand()` is smaller than the result space of `rand()` - since it excludes prime numbers. Gets my vote... – Floris Oct 14 '13 at 19:52
7

Multiplying numbers ends up in a smaller solution space, depending on your computer's architecture.

If your computer works with 16 digits of precision, rand() might be, say, 0.1234567890123; multiplying it by a second rand(), 0.1234567890123, would give about 0.0152415..., and you'd definitely find fewer distinct solutions if you repeated the experiment 10^14 times.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Huub
  • 91
  • 1
3

Most of these distributions happen because you have to limit or normalize the random number.

We normalize it to be all positive, fit within a range, and even to fit within the constraints of the memory size for the assigned variable type.

In other words, because we have to limit the random call between 0 and X (X being the size limit of our variable) we will have a group of "random" numbers between 0 and X.

Now when you add one random number to another, the sum will be somewhere between 0 and 2X... and this skews the values away from the edge points (the probability of adding two small numbers together, or two big numbers together, is very small when you have two random numbers over a large range).

Think of the case where you have a number close to zero and you add it to another random number: the sum will almost certainly get bigger and move away from 0. (The same holds for large numbers, since it is unlikely that the Random function returns two large numbers – numbers close to X – in a row.)
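
A quick empirical sketch of my own (not from this answer) that buckets the sum of two uniform values in [0, 1) and shows counts piling up mid-range, away from the edges:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int bins[10] = {0};
    int i, b;
    srand(7);
    for (i = 0; i < 100000; i++) {
        double s = (double)rand() / RAND_MAX + (double)rand() / RAND_MAX;
        b = (int)(s * 5.0);                     /* map [0, 2] onto 10 bins */
        if (b > 9)
            b = 9;                              /* guard s == 2.0 exactly */
        bins[b]++;
    }
    for (b = 0; b < 10; b++)
        printf("[%.1f, %.1f): %d\n", b * 0.2, (b + 1) * 0.2, bins[b]);
    return 0;
}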

Now if you were to set up the random method with negative and positive numbers (spanning equally across the zero axis), this would no longer be the case.

Say, for instance, RandomReal({-x, x}, 50000, .01): then you would get an even distribution of numbers on the negative and positive sides, and if you were to add the random numbers together they would maintain their "randomness".

Now I'm not sure what would happen with Random() * Random() over the negative-to-positive span... that would be an interesting graph to see... but I have to get back to writing code now. :-P

Soner Gönül
  • 97,193
  • 102
  • 206
  • 364
user479538
  • 31
  • 1
2
  1. There is no such thing as more random. It is either random or not. Random means "hard to predict". It does not mean non-deterministic. Both random() and random() * random() are equally random if random() is random. Distribution is irrelevant as far as randomness goes. If a non-uniform distribution occurs, it just means that some values are more likely than others; they are still unpredictable.

  2. Since pseudo-randomness is involved, the numbers are very much deterministic. However, pseudo-randomness is often sufficient in probability models and simulations. It is pretty well known that making a pseudo-random number generator complicated only makes it difficult to analyze. It is unlikely to improve randomness; it often causes it to fail statistical tests.

  3. The desired properties of the random numbers are important: repeatability and reproducibility, statistical randomness, (usually) uniformly distributed, and a large period are a few.

  4. Concerning transformations on random numbers: As someone said, the sum of many independent, identically distributed random variables approaches a normal distribution (with just two uniforms you get a triangular distribution; normality emerges as more terms are added). This is the additive central limit theorem, and it applies regardless of the source distribution as long as the variables are independent and identically distributed. Likewise, the multiplicative central limit theorem says the product of many independent, identically distributed positive random variables tends to be lognormal. The graph someone else created looks exponential, but it is really close to lognormal. So random() * random() is approximately lognormally distributed (although the factors may not be independent, since the numbers are pulled from the same stream). This may be desirable in some applications. However, it is usually better to generate one random number and transform it to a lognormally-distributed number (a sketch follows this list). Random() * random() may be difficult to analyze.
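
Here is a minimal sketch of that last suggestion (my own code, using the standard Box-Muller transform; the helper names uniform01 and lognormal are made up). Compile with -lm.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Uniform in (0, 1): shifted so log() never sees zero. */
double uniform01(void)
{
    return (rand() + 1.0) / ((double)RAND_MAX + 2.0);
}

/* One lognormal deviate with parameters mu and sigma. */
double lognormal(double mu, double sigma)
{
    const double two_pi = 6.283185307179586;
    double u1 = uniform01(), u2 = uniform01();
    double z = sqrt(-2.0 * log(u1)) * cos(two_pi * u2);   /* Box-Muller */
    return exp(mu + sigma * z);         /* exp of a normal is lognormal */
}

int main(void)
{
    int i;
    srand(123);
    for (i = 0; i < 5; i++)
        printf("%f\n", lognormal(0.0, 1.0));
    return 0;
}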

For more information, consult my book at www.performorama.org. The book is under construction, but the relevant material is there. Note that chapter and section numbers may change over time: chapter 8 (probability theory), sections 8.3.1 and 8.3.3; chapter 10 (random numbers).

sra
  • 23,820
  • 7
  • 55
  • 89
Tom
  • 29
  • 2
1

We can compare two arrays of numbers with regard to randomness by using Kolmogorov complexity: if a sequence of numbers cannot be compressed, then it is the most random we can reach at that length. I know that this type of measurement is more of a theoretical option...
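
Kolmogorov complexity itself is uncomputable, but compressed size is a crude practical proxy. Here is a sketch of my own using zlib (assumed available; link with -lz); the byte patterns are made up for illustration:

#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

/* How small can zlib's deflate make this buffer? */
unsigned long compressed_size(const unsigned char *buf, unsigned long n)
{
    uLongf outn = compressBound(n);
    unsigned char *out = malloc(outn);
    compress(out, &outn, buf, n);
    free(out);
    return outn;
}

int main(void)
{
    unsigned char a[4096], b[4096];
    int i;
    srand(99);
    for (i = 0; i < 4096; i++) {
        a[i] = rand() & 0xFF;               /* pseudo-random bytes  */
        b[i] = i & 0xFF;                    /* highly regular bytes */
    }
    printf("random-ish: %lu bytes, regular: %lu bytes\n",
           compressed_size(a, 4096), compressed_size(b, 4096));
    return 0;
}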

Trufa
  • 39,971
  • 43
  • 126
  • 190
HamoriZ
  • 2,370
  • 18
  • 38
1

Actually, when you think about it, rand() * rand() is less random than rand(). Here's why.

Essentially, there are the same number of odd numbers as even numbers. Suppose we say that 0.04325 is odd, 0.388 is even, 0.4 is even, 0.15 is odd, and so on.

That means that rand() has an equal chance of being an even or odd decimal.

On the other hand, rand() * rand() has its odds stacked a bit differently. Let's say:

double a = rand();
double b = rand();
double c = a * b;

a and b each have a 50% chance of being even or odd. Knowing that

  • even * even = even
  • even * odd = even
  • odd * odd = odd
  • odd * even = even

means there is a 75% chance that c is even, while only a 25% chance that it's odd, making the value of rand() * rand() more predictable than rand(), and therefore less random.
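
For integer outputs, the claimed bias is easy to verify empirically; here is a quick test of my own (the comments below this answer discuss why the argument does not carry over to floating-point values):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    long even = 0, trials = 1000000, i;
    srand(1);
    for (i = 0; i < trials; i++) {
        long c = (long)(rand() & 0xFF) * (long)(rand() & 0xFF);
        if (c % 2 == 0)
            even++;                 /* a product is even 3 times out of 4 */
    }
    printf("even products: %.1f%% (expect ~75%%)\n", 100.0 * even / trials);
    return 0;
}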

John S.
  • 626
  • 1
  • 4
  • 16
  • `rand()` usually gives a number between 0 and 1. Does talking about whether it's even or odd make sense? – Teepeemm Jan 15 '16 at 18:18
  • 1
    Actually, `0.2*0.2=0.04`, which suggests a fundamental flaw with this approach: multiplying the 53 bits of two doubles will give about 100 bits in the result. But the last half of these bits will be discarded. So when you take two doubles with a 1 as their least significant bit, you can't say anything about the least significant bit of their product. – Teepeemm Jan 17 '16 at 03:04
  • Or, to put it another way, you've assumed that the definition of "even" and "odd" that makes sense for the distribution of `rand()` are the same as the definitions of "even" and "odd" that make sense for the distribution of `rand()*rand()`. If that is not the case, this argument fails. That's true for integers, but these are not integers. – David Schwartz Oct 07 '16 at 18:42
0

Assuming that rand() returns a number in [0, 1), it is obvious that rand() * rand() will be biased toward 0, because multiplying x by a number in [0, 1) yields a number smaller than x. Here is the distribution of 10,000 such products:

google.charts.load("current", { packages: ["corechart"] });
google.charts.setOnLoadCallback(drawChart);

function drawChart() {
  var i;
  var randomNumbers = [];
  for (i = 0; i < 10000; i++) {
    randomNumbers.push(Math.random() * Math.random());
  }
  var chart = new google.visualization.Histogram(document.getElementById("chart-1"));
  var data = new google.visualization.DataTable();
  data.addColumn("number", "Value");
  randomNumbers.forEach(function(randomNumber) {
    data.addRow([randomNumber]);
  });
  chart.draw(data, {
    title: randomNumbers.length + " rand() * rand() values between [0, 1)",
    legend: { position: "none" }
  });
}
<script src="https://www.gstatic.com/charts/loader.js"></script>

<div id="chart-1" style="height: 500px">Generating chart...</div>

If rand() returns an integer in [x, y], then you have the following distribution. Notice the number of odd vs. even values:

google.charts.load("current", { packages: ["corechart"] });
google.charts.setOnLoadCallback(drawChart);
document.querySelector("#draw-chart").addEventListener("click", drawChart);

function randomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

function drawChart() {
  var min = Number(document.querySelector("#rand-min").value);
  var max = Number(document.querySelector("#rand-max").value);
  if (min >= max) {
    return;
  }
  var i;
  var randomNumbers = [];
  for (i = 0; i < 10000; i++) {
    randomNumbers.push(randomInt(min, max) * randomInt(min, max));
  }
  var chart = new google.visualization.Histogram(document.getElementById("chart-1"));
  var data = new google.visualization.DataTable();
  data.addColumn("number", "Value");
  randomNumbers.forEach(function(randomNumber) {
    data.addRow([randomNumber]);
  });
  chart.draw(data, {
    title: randomNumbers.length + " rand() * rand() values between [" + min + ", " + max + "]",
    legend: { position: "none" },
    histogram: { bucketSize: 1 }
  });
}
<script src="https://www.gstatic.com/charts/loader.js"></script>

<input type="number" id="rand-min" value="0" min="0" max="10">
<input type="number" id="rand-max" value="9" min="0" max="10">
<input type="button" id="draw-chart" value="Apply">

<div id="chart-1" style="height: 500px">Generating chart...</div>
Salman A
  • 262,204
  • 82
  • 430
  • 521
0

Use a linear feedback shift register (LFSR) that implements a primitive polynomial.

The result will be a sequence of 2^n − 1 pseudo-random numbers (the all-zero state never occurs), i.e., none repeating within the sequence, where n is the number of bits in the LFSR, resulting in a uniform distribution.

http://en.wikipedia.org/wiki/Linear_feedback_shift_register http://www.xilinx.com/support/documentation/application_notes/xapp052.pdf

Use a "random" seed based on microsecs of your computer clock or maybe a subset of the md5 result on some continuously changing data in your file system.

For example, a 32-bit LFSR will generate 2^32 − 1 unique numbers in sequence (no two alike) starting from a given seed. The sequence will always be in the same order, but the starting point will be different (obviously) for different seeds. So, if a possibly repeating sequence between seedings is not a problem, this might be a good choice.
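
As a concrete sketch (my own, in Galois form, using the (32, 22, 2, 1) tap set listed in the XAPP052 note linked above):

#include <stdint.h>
#include <stdio.h>

/* One step of a 32-bit Galois LFSR; taps 32, 22, 2, 1 give mask 0x80200003.
   The seed must be nonzero; the state then cycles through all 2^32 - 1
   nonzero values before repeating. */
static uint32_t lfsr_next(uint32_t s)
{
    uint32_t lsb = s & 1u;
    s >>= 1;
    if (lsb)
        s ^= 0x80200003u;
    return s;
}

int main(void)
{
    uint32_t state = 0xDEADBEEFu;           /* any nonzero seed */
    int i;
    for (i = 0; i < 5; i++) {
        state = lfsr_next(state);
        printf("%08x\n", state);
    }
    return 0;
}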

I've used 128-bit LFSRs to generate random tests in hardware simulators, using a seed which is the MD5 result on continuously changing system data.

johnny
  • 11
-1

As others have already pointed out, this question is hard to answer, since each of us has their own picture of randomness in their head.

That is why I would highly recommend taking some time to read through this site to get a better idea of randomness:

To get back to the real question: there is no "more" or "less" random here – both only appear random!

In both cases - just rand() or rand() * rand() - the situation is the same: after a few billion numbers, the sequence will repeat(!). It appears random to the observer because he does not know the whole sequence, but the computer has no true random source, so it cannot produce randomness either.

E.g.: is the weather random? We do not have enough sensors or knowledge to determine whether the weather is random or not.

Fabian Bigler
  • 10,403
  • 6
  • 47
  • 70
-1

OK, I will try to add some value to complement the other answers by saying that you are creating and using a random number generator.

Random number generators are devices (in a very general sense) that have multiple characteristics which can be tuned to fit a purpose. Some of them (off the top of my head) are:

  • Entropy: as in Shannon Entropy
  • Distribution: statistical distribution (Poisson, normal, etc.)
  • Type: what is the source of the numbers (algorithm, natural event, combination of, etc.) and algorithm applied.
  • Efficiency: rapidity or complexity of execution.
  • Patterns: periodicity, sequences, runs, etc.
  • and probably more...

In most answers here, distribution is the main point of interest, but by mixing and matching functions and parameters you create new ways of generating random numbers, which will have different characteristics, some of which may not be obvious to evaluate at first glance.

Loki
  • 29,950
  • 9
  • 48
  • 62
-1

It's easy to show that the sum of two random numbers is not necessarily uniformly random. Imagine you have a six-sided die and roll it: each number has a 1/6 chance of appearing. Now say you had two dice and summed the result. The distribution of those sums is not uniform. Why? Because certain totals appear more often than others: there are multiple ways to partition them. For example, the total 2 is the sum of 1+1 only, but 7 can be formed by 3+4 or 4+3 or 5+2, etc., so it has a larger chance of coming up.

Therefore, applying a transform, in this case addition, to a random function does not make it more random, or necessarily preserve its randomness. In the case of the dice above, the distribution is peaked at 7 and therefore less random (see the tally below).
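
A brute-force tally (illustrative) over all 36 equally likely ordered rolls shows the peak at 7:

#include <stdio.h>

int main(void)
{
    int counts[13] = {0};
    int a, b, s;
    for (a = 1; a <= 6; a++)
        for (b = 1; b <= 6; b++)
            counts[a + b]++;                /* enumerate all 36 ordered pairs */
    for (s = 2; s <= 12; s++)
        printf("sum %2d: %d/36\n", s, counts[s]);   /* 7 occurs 6/36 times */
    return 0;
}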

sashang
  • 11,704
  • 6
  • 44
  • 58
-2

The answer is that it depends. One might hope that rand()*rand() would be more random than rand(), but consider that:

  • both answers depend on the bit size of your values;
  • in most cases you generate numbers with a pseudo-random algorithm (which is mostly a number generator seeded from your computer clock, and not that random);
  • you should make your code more readable (and not invoke some random voodoo god of randomness with this kind of mantra).

If any of the above applies to you, I suggest you go for the simple rand(). Your code will be more readable (you won't ask yourself why you wrote this for, well, more than two seconds) and easier to maintain (if you ever want to replace your rand function with a super_rand).

If you want better randomness, I recommend streaming it from any source that provides enough noise (e.g., radio static); then a simple rand() should be enough.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
dvhh
  • 4,724
  • 27
  • 33