3

I have found few interesting things (for me at least) while generating random integers and I cannot explain it to myself so I thought I will post it here.

My needs were simple: I was generating random integral (Int32) IDs and was aiming to minimize collisions. The generating time wasn´t an issue.

I have tried these methods for generating random integers:

1.)

   return rnd.Next();

where rnd is class field of type Random with Seed from Method #3.

2.)

   return rnd.Next(Int32.MinValue, Int32.MaxValue);

where rnd is again class field of type Random with Seed from Method #3.

3.)

   var buffer = new byte[sizeof(int)];
   using (var rng = new RNGCryptoServiceProvider())
   {
        rng.GetBytes(buffer);
   }            
   return BitConverter.ToInt32(buffer, 0);

Note: I also tried to have that RNGCryptoServiceProvider as class field initialized once on initialization of the containg class to ease work of GC, but I took same time to generate so I thought this would be "more random".

4.)

   return new Random(Method3()).Next();

5.)

   return new Random(Method3()).Next(Int32.MinValue, Int32.MaxValue);

I know, that creating new Random(int seed) on every call is time consuming, but there is less collision, right?

Well, now the mystery part. I was asuming that most collisions would have the method #1 and #2 where #1 would be slightly faster and more collisionless and least collisions would have the method #4 and #5 where #4 would be slightly faster and more collisionless and method #3 would be some sort of compromise.

So I made a test to prove my asumptions. I generated 10x (to make it average) one million of random numbers using every mentioned method and took the average number of collision and average time it took to generate one million numbers. The results (below) are a bit suprise to me.

The results:duration is hours:minutes:seconds:miliseconds format

Method1: AvgCollisions: 235, AvgDuration: 00:00:00.3561967
Method2: AvgCollisions: 116, AvgDuration: 00:00:00.4042033
Method3: AvgCollisions: 115, AvgDuration: 00:00:04.6037259
Method4: AvgCollisions: 234, AvgDuration: 00:00:09.2195856
Method5: AvgCollisions: 233, AvgDuration: 00:00:09.1788223

I ran the test few times again and it was always circa the same.

Isn't it strange to you, too? The times aren't suprise at all that is what I assumed, but the results mean to me, that method2 is the best to generate random numbers, as it is the most random, the fastest and you can set a minimal and maximal generated number. Don't know how much is the Method2 more predictable over the Method3 as I don't know how would I test it.

Can someone explain me what am I doning wrong or why methods #4 and #5 don't have the least collisions and why is the percentage of collisions always the same? Shouldn't it be random?

EDIT: Here is Visual Studio 2010 Solution of this test I've done: http://bit.ly/nxLODw

AakashM
  • 62,551
  • 17
  • 151
  • 186
MSkuta
  • 1,630
  • 13
  • 19
  • 1
    Are you sure about the number of collisions you see in your first result? Everything else makes sense, in light of [this](http://stackoverflow.com/questions/1654887/random-next-returns-always-the-same-values) (and a zillion other similar discussions here - this is one of the top five programming misconceptions). Can you post your test cases? – Michael Petrotta Oct 14 '11 at 19:33
  • 4
    Your #4/#5 assumption is in error. You should NEVER be calling `new Random()` for each want of a random number. Do this once and only once. – Jesse C. Slicer Oct 14 '11 at 19:34
  • +1 for Michael Petrotta. `and a zillion other similar discussions here` – L.B Oct 14 '11 at 19:44
  • To Michael Petrotta: I posted VS2010 soultion, it is in edit. I know that Random class is defaultly seeded by system clock, thats why I am giving it each time new random seed from RNGCryptoServiceProvider or is it meant that it uses system clock in calculation for every Next() call? – MSkuta Oct 14 '11 at 20:00

2 Answers2

6

The only strange behaviour is with method 5.

In methods 1, 4, you generate a number in the range 0 to int.MaxValue.

In methods 2, 3 and 5, you generate a number in the range int.MinValue to int.MaxValue.

So for methods 2 and 3, you have a range that is about twice as large, and they have about half the collisions compared to methods 1, 4. This seems pretty normal to me.

So why does method 5 produce as many collisions as methods 1 and 4, even though it generates numbers in the larger range? Well, it turns out that the System.Random constructor takes the absolute value of the seed. In other words, it reduces the number of random sequences to half of what it could be. So even though you get numbers from a larger range, you generate them from fewer different sequences.

Jeffrey Sax
  • 10,253
  • 3
  • 29
  • 40
  • You got the point. Don't know how could I miss that 1 and 4 generate only positive numbers.. Now it makes sense to me, thx. – MSkuta Oct 14 '11 at 20:10
  • Now come up a new question, why do all methods have same number of collisions? Shouldnt the method 3 have the least collisions? as it is RNGCrypto... – MSkuta Oct 14 '11 at 20:15
  • Not really. You'll have collisions even in a 'truly' random sequence. Since you're only generating about 1/2000 or 1/4000 of the possible numbers, your denisity is too low to see any difference anyway. – Jeffrey Sax Oct 14 '11 at 20:21
  • Regarding the difference between System.Random and RNGCrypto...: the main difference is in the purpose. System.Random is general-purpose, while RNGCrypto... is specifically designed to defeat reconstruction of the sequence. Many other RNG's you can reconstruct the sequence (and therefore predict the rest of the sequence) from a finite number of values from the sequence. – Jeffrey Sax Oct 14 '11 at 20:23
0

While you're there, you may want to modify #3 as:

        var buffer = new byte[sizeof(int)];

        using (var rng = new RNGCryptoServiceProvider())
        {
            rng.GetBytes(buffer);
        }

        return BitConverter.ToInt32(buffer, 0);

This a) guarantees your array is the size of an int (rather than the magic number 4) and b) properly disposes of the IDisposable that the RNGCryptoServiceProvider is.

Jesse C. Slicer
  • 19,901
  • 3
  • 68
  • 87