5

I am using Random to generate a sequence of random numbers. I construct the Random object just once and then generate the random values (300 of them) inside a loop. The problem is that once I have all the values and sort them, I notice that some of them are equal and/or sequential, even though I am generating numbers from 0 to 50000.

This is my script:

Random rnd = new Random();
for (int n = 0; n < 300; n++)
{
    int RndNumber = rnd.Next(0, 50000); // upper bound is exclusive, so values are 0..49999
    System.Threading.Thread.Sleep(3);
}

Does anyone have a clue why this is happening, and how can I improve this to make it more random?

c00kiemon5ter
Terrence
  • You need to be specific about how you define "more random" – cordialgerm Feb 01 '11 at 04:12
  • why are you sorting random numbers? doesn't that defeat the purpose of generating random numbers? – helloworld922 Feb 01 '11 at 04:14
  • Could the problem be with sorting? – Marlon Feb 01 '11 at 04:14
  • I'm no mathematician, but it seems to me that it would be highly unlikely that, after sorting a list of 300 random numbers between 0 and 50000, at least a couple of them wouldn't be equal or sequential. – jonmorgan Feb 01 '11 at 04:15
  • Why are you sorting the random values? Is it just to test their distribution? This might have something to do with the [Birthday paradox](http://en.wikipedia.org/wiki/Birthday_problem#Collision_counting) – Cameron Feb 01 '11 at 04:15
  • Do you expect the numbers to be equally spaced across the range - 10, 110, 210, 310, etc? That seems far less random. – Kirk Broadhurst Feb 01 '11 at 04:18
  • you need a much bigger sample set to really evaluate the distribution – BrokenGlass Feb 01 '11 at 04:22
  • Hmmm, by my back of the envelope calculation: the chance that a pair of numbers in that range would be equal is 1 in 50000 and there are almost 45000 pairs in a set of 300 numbers so it's quite likely that there will be some matches. – CurtainDog Feb 01 '11 at 04:23
  • Thanks for all your valuable feedback. My question was solved by Jason and Dan Tao. I sort the numbers so that I can see the 'quality' of the random numbers I got. – Terrence Feb 08 '11 at 04:36

3 Answers

21

So this is the birthday paradox*. When you draw 300 numbers from 50000, the approximate probability that at least two of them are equal is

p(300) = 1 - exp(-300 * 300 / (2 * 50000))
       = 0.59

(I could work out the exact probability, but I'm lazy.†)

So it's more likely than not that you'll have a collision. Sequential values are even more likely (now you don't need a collision; you just need n - 1 and n, or n and n + 1, to both be hit for some n).

Random is fickle.

*: In case you're not familiar with it, it says that if you have twenty-three people in a room, it is more likely than not that at least two people in the room share the same birthday.

†: Okay, I worked it out. It's 0.5953830515549951746819986449....
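
In case you want to check these figures, here's a minimal C# sketch (not part of the original answer) that computes both the approximation above and the exact collision probability for 300 draws from 50000 values:

using System;

class BirthdayParadox
{
    static void Main()
    {
        const int n = 300;
        const double N = 50000;

        // approximation used above: p ≈ 1 - exp(-n * n / (2 * N))
        double approx = 1 - Math.Exp(-(double)n * n / (2 * N));

        // exact: p = 1 - (N/N) * ((N-1)/N) * ... * ((N-n+1)/N)
        double noCollision = 1.0;
        for (int k = 0; k < n; k++)
        {
            noCollision *= (N - k) / N;
        }

        Console.WriteLine($"approximate: {approx:F4}");          // ~0.5934
        Console.WriteLine($"exact:       {1 - noCollision:F4}"); // ~0.5954
    }
}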

jason
13

Research:

If you use the parameterless constructor new Random(), the seed depends on the current system time.

Random(): "Initializes a new instance of the Random class, using a time-dependent default seed value." http://msdn.microsoft.com/en-us/library/system.random.aspx

So, if I try it like this:

for (int i = 0; i < 1000; i++)
{
   // a new Random is created on every iteration, so instances created in
   // quick succession often end up with the same time-based seed
   Random ran = new Random();
   Console.WriteLine(ran.Next(50001));
}

I only get about 3 different numbers, each repeated roughly 300 times, within a thousand calls! Not that random...

Setting the seed in the constructor, new Random(0), returns a fixed series of numbers.

e.g. new Random(0).Next(50) always returns 36. Try it yourself if you don't trust me.
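
To see that determinism for yourself, here's a small sketch (not from the original answer): two instances seeded with the same value produce the identical sequence.

var a = new Random(0);
var b = new Random(0);
for (int i = 0; i < 5; i++)
{
    Console.WriteLine($"{a.Next(50)} {b.Next(50)}"); // the two values on each line are always equal
}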

What we need for "real" random numbers is a changing seed that's independent of time.

I'm using the hash code of a changing value:

e.g. Guid.NewGuid().GetHashCode() or DateTime.Now.GetHashCode()


Result:

// the Random instance is created once, outside the loop, seeded from a GUID's hash code
Random ran = new Random(Guid.NewGuid().GetHashCode());
for (int i = 0; i < 1000; i++)
{
   Console.WriteLine(ran.Next(50001));
}

or (for better performance):

for (int i = 0; i < 1000; i++)
{
     // derive each value directly from a fresh GUID's hash code
     int val = Guid.NewGuid().GetHashCode() % 50001;
     val = val > 0 ? val : -val; // the hash code can be negative, so take the absolute value
     Console.WriteLine(val);
}

PS: The maximum value returned by the Next(max) method is always max - 1;

-> ran.Next(11) can return 0, 1, 2, ..., 9, 10, but never 11!

peterh
Bahamut
  • `DateTime.Now.GetHashCode()` is no better than the default seed. – CodesInChaos Sep 12 '12 at 14:03
  • +1 This was "random" enough for my liking :) No clumps of numbers when I tried to generate 2000 of them. – rtpHarry Oct 30 '12 at 16:05
  • @CodesInChaos You're right. I don't know exactly whether the default constructor of Random uses the DateTime's hash code or milliseconds or whatever, but it ain't random at all – Bahamut Jan 10 '14 at 09:25
  • I love it, a very simple and modular solution – Kelly Larsen Apr 16 '14 at 09:51
  • @Bahamut Do you realize that it wasn't your new seed value that fixed it, it was the fact that you moved the initialization of the `Random` instance out of the loop? Experiments are most effective when only one variable changes between trials. – Jim Buck Jun 26 '15 at 16:38
11

As an explanation of why you're seeing the occasional duplicate, Jason's answer is right on.

If what you want is 300 distinct random numbers, what about something like this?

static IEnumerable<int> GetRandoms(int min, int max)
{
    // yields an endless stream of random values in [min, max)
    var rand = new Random();
    while (true)
    {
        yield return rand.Next(min, max);
    }
}

var distinctRandoms = GetRandoms(0, 50000).Distinct().Take(300);
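
As a minimal, self-contained sketch of how the pieces fit together (the Program wrapper and the ToList() call are just for illustration, not part of the original answer):

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static IEnumerable<int> GetRandoms(int min, int max)
    {
        var rand = new Random();
        while (true)
        {
            yield return rand.Next(min, max);
        }
    }

    static void Main()
    {
        // Distinct() filters duplicates lazily; Take(300) stops once 300 unique values
        // have been seen (this only terminates if the range holds at least 300 values)
        var distinctRandoms = GetRandoms(0, 50000).Distinct().Take(300).ToList();
        Console.WriteLine(distinctRandoms.Count); // 300
    }
}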
Dan Tao
  • Yup, it gives distinct numbers, but there are still adjacent numbers; this is due to the Birthday paradox, I believe. This seems to fulfill what I need to do. Thanks Dan Tao – Terrence Feb 08 '11 at 04:33
  • Keep in mind that by forcing distinction, you are in fact harming the "Randomness" - the distribution is no longer (pseudo) uniform. – bavaza Oct 21 '11 at 07:47
  • @bavaza: Not sure if that comment is aimed at me or the OP, but I think the OP does not actually want true *randomness*; otherwise he wouldn't be asking this question. Of course you're correct that eliminating duplicates compromises the randomness of the data, just as sorting does. – Dan Tao Oct 21 '11 at 14:30
  • @Dan: It was aimed at the OP and other SO junkies out there. I decided to add it because I often found that people tend to abuse statistics. Anyways, thanks for the answer :-). – bavaza Oct 23 '11 at 14:14
  • Perfect though for pseudo-random lists, thanks DanTao – Anonymous Type Jul 14 '12 at 23:47