30

This might be more Math related than C#, but I need a C# solution so I'm putting it here.

My question is about the probability of random number generators, more specifically if each possible value is returned with an equal probability.

I know there is the Random.Next(int, int) method which returns a number between the first integer and last (with the last being exclusive).

Random.Next() [without overloads] will return a value between 0 and Int32.MaxValue (which is 2147483647) - 1, so 2147483646.

If I want a value between 1 and 10, I could call Random.Next(1, 11) to do this, however does every value between 1 and 10 have an equal probability of occuring?

For example, the range is 10, so 2147483646 is not perfectly divisible by 10, so the values 1-6 have a slightly higher probability of occuring (because 2147483646 % 10 = 6). This is of course assuming that every value within Random.Next() [without overloads] returns a value between 0 and 2147483646 with equal probability.

How would one ensure that every number within a range has an equal probability of occuring? Let's say for a lottery type system where it would be unfair for some people to have a higher probility than others, I'm not saying I would use the C# built in RNG for this, I was just using it as an example.

Matthew
  • 24,703
  • 9
  • 76
  • 110
  • `Random` is not the object to use if you want a Uniform number distribution, because the value returned is necesarily going to be random (or pseudo-random in this case). A sequence of 1,1,1,1,1,1 is entirely possible with a random number generator. In fact, every time you called it, it could always be 1, even though that would be unlikely. – Tejs Apr 16 '12 at 17:35
  • 2
    But in the long run, it would (should) limit to uniformly random distribution nevertheless. –  Apr 16 '12 at 17:37
  • @Tejs: tossing a coint 100 times and getting 100 times the same face is also possible (but kinda unlikely) – BlackBear Apr 16 '12 at 17:37
  • It is distributed uniformly in the long run. If you want something stronger, look into Mersenne Twister or cryptographic PRNGs. – ashes999 Apr 16 '12 at 17:37
  • Do you require non-uniform distribution? – asawyer Apr 16 '12 at 17:37
  • I don't need a uniform distrubution, I just need the probability of each value to be equal. – Matthew Apr 16 '12 at 17:39
  • 2
    @Matthew: That's what a uniform distribution is. – jason Apr 16 '12 at 17:49
  • HOW equal? For instance, Jason's answer suggests throwing away the numbers 2147483640, 2147483641, 2147483642, 2147483643, 2147483644, 2147483645, 2147483646 (and 2147483647?). Notice that this will only make a difference once every 300,000,000 trials. Eg. if you used this for a weekly lottery draw, it would make a difference every 6 million years. For most applications, I'd ignore that variation. If it DOES matter, I'd give serious thought to the idea that you don't know enough to certify that a standard PRNG is sufficient or not. – Jack V. Apr 23 '12 at 09:12
  • @JackV I'm not sure what you're getting at, but by equal, I mean 100% equal, every number has an equal probability of occuring. As for the idea that I don't know enough about PRNG, yes, that's true, which is why I asked the question, this is one issue I know of that can exist in PRNGs. – Matthew Apr 23 '12 at 17:01

4 Answers4

17

I note that no one actually answered the meaty question in your post:

For example, the range is 10, so 2147483646 is not perfectly divisible by 10, so the values 1-6 have a slightly higher probability of occuring (because 2147483646 % 10 = 6). This is of course assuming that every value within Random.Next() [without overloads] returns a value between 0 and 2147483646 with equal probability.

How would one ensure that every number within a range has an equal probability of occuring?

Right, so you just throw out the values that cause the imbalance. For example, let's say that you had a RNG that could produce a uniform distribution over { 0, 1, 2, 3, 4 }, and you wanted to use it to produce a uniform distribution over { 0, 1 }. The naive implementation is: draw from {0, 1, 2, 3, 4} and then return the value % 2; this, however, would obviously produce a biased sample. This happens because, as you note, 5 (the number of items) is not evenly divisible by 2. So, instead, throw any draws that produce the value 4. Thus, the algorithm would be

 draw from { 0, 1, 2, 3, 4 }
 if the value is 4, throw it out
 otherwise, return the value % 2

You can use this basic idea to solve the general problem.

however does every value between 1 and 10 have an equal probability of occuring?

Yes, it does. From MSDN:

Pseudo-random numbers are chosen with equal probability from a finite set of numbers.

Edit: Apparently the documentation is NOT consistent with the current implementation in .NET. The documentation states the draws are uniform, but the code suggests that it is not. However, that does NOT negate the fact that this is a soluble problem, and my approach is one way to solve it.

Community
  • 1
  • 1
jason
  • 236,483
  • 35
  • 423
  • 525
  • Does this mean that the overloaded `Random.Next` methods that take a range do this "re-roll" scenario for numbers that would introduce bias? – Matthew Apr 16 '12 at 17:50
  • @Matthew: Well, I don't know what their exact implementation is, but the method I outlined is the most common way to do it. – jason Apr 16 '12 at 17:53
  • thank you, I might do some digging to see what microsoft does internally, but to me what you're saying makes sense, and is the only thing I can think of that would work for an arbitrary range. – Matthew Apr 16 '12 at 17:55
  • 1
    @Matthew the `Next(min, max)` method does *not* do a "re-roll". – phoog Apr 16 '12 at 18:01
  • `Random.Next(min, max)` is essentially implemented as `(int) (NextDouble() * (max - min) + min)`. By the way, I think there's a typo in `draw from { 0, 1, 2 }` - should probably include 3 and 4 as well? – Anlo Apr 16 '12 at 18:11
12

The C# built in RNG is, as you expect, a uniformly distributed one. Every number has an equal likelihood of occurring given the range you specify for Next(min, max).

You can test this yourself (I have) by taking, say, 1M samples and storing how many times each number actually appears. You'll get an almost flat-line curve if you graph it.

Also note that, each number having an equal likelihood doesn't mean that each number will occur the same amount of times. If you're looking at random numbers from 1 to 10, in 100 iterations, it won't be an even distribution of 10x occurrence for each number. Some numbers may occur 8 times, and others 12 or 13 times. However, with more iterations, this tends to even out somewhat.

Also, since it's mentioned in the comments, I'll add: if you want something stronger, look up cryptographic PRNGs. Mersenne Twister is particularly good from what I've seen (fast, cheap to compute, huge period) and it has open-source implementations in C#.

ashes999
  • 9,925
  • 16
  • 73
  • 124
  • My question is more of if the PRNG gives out values between 1 and 16 (inclusive) and we assume each number has equal probability, but I need a value of 1 and 10, how do I translate the value of 1-16 to 1-10, where each value of 1-10 has equal probability. The only thing I can think of is if the number is out of range, re-roll until it is. – Matthew Apr 16 '12 at 17:42
  • 2
    @Matthew You can find the lowest common multiple of 16 and 10 (which is 80). This means that you can pick 5 random numbers between 1-16, sum them, and then divide by 8 (and floor the results). You now have a random number between 1 and 10 with equal probabilities of each number. – Servy Apr 16 '12 at 17:49
  • Edit, nvm my last post, that will skew the results towards the middle. – Servy Apr 16 '12 at 17:56
  • @Servy this is an interesting approach to the problem, thank you. – Matthew Apr 16 '12 at 17:56
  • 1
    @ashes999: What do you mean by "something stronger"? And it's emphatically stated in the docs for MT that it's not for cryptography, so why do you mention it as if it is? MT is based on 624 states (look in the C code, you'll see "`static unsigned long mt[624];`" (it's the state vector). Once you know 624 outputs from MT, you can predict the entire series. This is emphatically not suitable for cryptography. So, I have to ask, what do you mean? Your statements in your answer and in your comments suggest that you don't actually understand randomness at all. – jason Apr 16 '12 at 17:57
  • Please see my answer for an explanation why the statement "every number has an equal likelihood of occurring given the range you specify for `Net(min, max)`" is incorrect. – phoog Apr 16 '12 at 17:57
  • @Jason no worries. That's out of scope for this question anyway. I mentioned it because the topic arose in comments. We can discuss that offline, if you're interested in doing so. You interpreted my comment as "use MT for cryptography," for some reason. The question isn't about cryptographic PRNGs. So, we're cool. – ashes999 Apr 16 '12 at 18:54
9

Test program:

var a = new int[10];
var r = new Random();
for (int i = 0; i < 1000000; i++) a[r.Next(1, 11) - 1]++;
for (int i = 0; i < a.Length; i++) Console.WriteLine("{0,2}{1,10}", i + 1, a[i]);

Output:

 1      99924
 2     100199
 3     100568
 4     100406
 5     100114
 6      99418
 7      99759
 8      99573
 9     100121
10      99918

Conclusion:

Each value is returned with an equal probability.

dtb
  • 213,145
  • 36
  • 401
  • 431
  • Adding a the % value will show that the difference between each value is really tiny – BlackBear Apr 16 '12 at 17:39
  • Although the difference is small, they are not equal so your conclusion is wrong. – Ash Burlaczenko Apr 16 '12 at 17:45
  • @ashes999 Not true as long as the seed is sufficiently 'random'. The probability of any given value is equal. You can get just one value and the probability of each one is still the same (10% for the provided range). – Servy Apr 16 '12 at 17:45
  • 1
    @AshBurlaczenko One would need to do a chi-squared analysis to determine how the results compare to the expected output (of 100,000) each, and based on the results one could (potentially) statistically assert that the function reflects uniform distribution. – Servy Apr 16 '12 at 17:46
  • @ashes999: why wouldn't it be true for a smaller number of iterations? Equal probability doesn't mean each number would occur once if you iterate 10 times. – comecme Apr 16 '12 at 18:00
  • Assuming a uniform distribution of `NextDouble()`, some values of `Next(max, min)` may be (very slightly) more or less likely to be returned. The proof is fairly simple; see my answer for an explanation. – phoog Apr 16 '12 at 18:00
  • @Servy the distribution is noticably non-uniform when you only pick a few numbers (say, less integers than exist in the random range). That's what I mean -- not that the probability changes as you request more samples. – ashes999 Apr 16 '12 at 18:10
  • @ashes999 The assertion of this post is that each asked for number has an equal probability of each number in the range. You said that it's not true when it is. If you're referring to the fact that totaling X random numbers from a uniform distribution won't (always, or even usually) result in an equal number of each result then yes, that's correct. That doesn't make the RNG any less uniform. – Servy Apr 16 '12 at 18:26
  • @Servy I think you misunderstood my answer. Where did I say that? I started with an assertion that the range gives you equal probability, and mentioned a fallacy of rising probability for numbers that didn't appear yet. – ashes999 Apr 16 '12 at 18:34
  • @ashes999 You said that the answer was only true when `[...]`. The answer is always true; the function always returns a random value with uniform distribution. The issue is that you left too much implied in your first few comments, this making them technically wrong (and also misleading/confusing) despite the fact that I knew what you meant. – Servy Apr 16 '12 at 19:20
  • @Servy okay, I've deleted my comments. Feel free to down-vote and all that good stuff if you see a problem. – ashes999 Apr 16 '12 at 19:52
3

Ashes and dtb are incorrect: You are right to suspect that some numbers would have a greater chance of occurring than others.

When you call .Next(x, y), there are y - x possible return values. The .NET 4.0 Random class calculates a return value based on the return value of NextDouble() (this is a slightly simplified description).

Obviously, the set of possible double values is finite, and, as you note, it may not be a multiple of the size of the set of possible return values of .Next(x, y). Therefore, assuming that the set of input values is uniformly distributed, some output values will have a slightly greater probability of occurring.

I don't know off hand how many numeric double values there are (i.e., excluding infinity and NaN values), but it is certainly larger than 2^32. In your case, if we assume 2^32 values, for the sake of argument, then we have to map 4294967296 inputs to 10 outputs. Some values would have a 429496730 / 429496729 greater probability of occurring, or 0.00000023283064397913028110629 percent greater. In fact, since the number of input states is greater than 2^32, the difference in probability would be even smaller.

phoog
  • 42,068
  • 6
  • 79
  • 117
  • This is exactly what I'm concerned about. The answer @Jason gave me seems to solve my problem, however. – Matthew Apr 16 '12 at 17:58
  • @phoog: You can just throw out values that cause biasing. – jason Apr 16 '12 at 18:01
  • @Jason but you can't do that with `Next(min, max)` since you don't control its implementation. To do this with `System.Random`, you'd need to know the set of possible return values from `NextDouble()`, or, you could create a derived class and provide your own implementation of `Sample()` (the protected method that is wrapped by `NextDouble()`). – phoog Apr 16 '12 at 18:04
  • @phoog: No, but my point is that `Next(Int32, Int32)` could be doing it. How do you *know* that it's not? – jason Apr 16 '12 at 18:06
  • @phoog: Fair enough. It appears then that the docs aren't consistent with the implementation. That doesn't negate the fact that this is a soluble problem. – jason Apr 16 '12 at 18:17
  • 2
    @Jason of course. I note that your quote from the docs is a general statement about random numbers from the page for the Random class, not a statement about the behavior of the `Next(min, max)` method. The pages for the integer `Next` methods make no mention of probability, one way or the other. – phoog Apr 16 '12 at 18:19
  • @phoog: Right, but given that that statement is not made against any particular method, one can only conclude that it applies to all of the methods on `Random` that are intended to produce random values. – jason Apr 16 '12 at 19:29
  • @Jason I've seen enough instances of incorrect, misleading, or ambiguous documentation on MSDN that I wouldn't make that leap of faith. – phoog Apr 16 '12 at 19:42
  • @phoog: Yes, of course. However, can you give me a more reasonable leap of faith to make here? – jason Apr 16 '12 at 19:47
  • 1
    @Jason No; rather, I would recommend making no leap of faith. That's why I decompiled the IL. – phoog Apr 16 '12 at 19:53