20

Consider the following program:

public class Program
{
     private static Random _rnd = new Random();
     private static readonly int ITERATIONS = 5000000;
     private static readonly int RANDOM_MAX = 101;

     public static void Main(string[] args)
     {
          ConcurrentDictionary<int,int> dic = new ConcurrentDictionary<int,int>();

          Parallel.For(0, ITERATIONS, _ => dic.AddOrUpdate(_rnd.Next(1, RANDOM_MAX), 1, (k, v) => v + 1));

          foreach(var kv in dic)
             Console.WriteLine("{0} -> {1:0.00}%", kv.Key, ((double)kv.Value / ITERATIONS) * 100);
     }
}

This will print the following output:

(Note that the output will vary on each execution)

> 1 -> 97,38%
> 2 -> 0,03%
> 3 -> 0,03%
> 4 -> 0,03%
...
> 99 -> 0,03%
> 100 -> 0,03%

Why is the number 1 generated with such frequency?

Matias Cicero
  • 25,439
  • 13
  • 82
  • 154
  • 1
    Is parallel really relevant? Are you sure this doesn't happen without it? – rory.ap Feb 29 '16 at 14:11
  • 12
    `Random` is not thread-safe. – Lee Feb 29 '16 at 14:12
  • @roryap It doesn't happen with a normal for loop. At least on my machine. – Matias Cicero Feb 29 '16 at 14:13
  • `Random` class isn't threadsafe. You shouldn't be using it in multiple threads without synchronization. If you do, you should expect some weirdness. – Sriram Sakthivel Feb 29 '16 at 14:14
  • @MatiasCicero -- ok. you may want to clarify that in your question. – rory.ap Feb 29 '16 at 14:17
  • Have you looked at this: http://blogs.msdn.com/b/pfxteam/archive/2009/02/19/9434171.aspx? Nice article about getting random numbers in thread safe way. – Dovydas Šopa Feb 29 '16 at 14:39
  • Use the .NET Framework [RNGCryptoServiceProvider Class](https://msdn.microsoft.com/en-us/library/system.security.cryptography.rngcryptoserviceprovider(v=vs.110).aspx), it does not need seeding and produces Cryptographic quality random numbers. See the answer by @Jesús López – zaph Feb 29 '16 at 15:50

3 Answers3

21

Random is not thread safe.

Next does nothing special at all to ensure thread safety.

Don't use Random like this. And don't consider using thread local storage duration either, else you will mess up the generator's statistical properties: You must only use one Random instance. One approach will be to use lock(_global) and draw a number in that locked region.

I think what's happening here is that the first thread to reach the generator gets its random numbers generated correctly, and all subsequent threads receive 0 for each drawing. With a "parallelisation" thread pool of 32 threads, the ratios you cite above are approximately attainted; assuming that the results for the 31 threads are placed in the first bucket.

Bathsheba
  • 231,907
  • 34
  • 361
  • 483
  • I see! Should I use a lock or something similar? – Matias Cicero Feb 29 '16 at 14:15
  • I've removed my dv. I think this is half the answer, but it still doesn't answer the base question or help to explain the behavior: why is it producing 1 most of the time? – rory.ap Feb 29 '16 at 14:21
  • Because they're executing on parallel, when a number is requested from the Random generator it gets a number from a precomputed table, if you ask simultaneously for two numbers the generator does not know it should be another and returns the same as the index on the computed table still did not changed. – Gusman Feb 29 '16 at 14:26
  • I would have assumed that one instance of Random would not be the solution either, since two simultaneous calls to next may easily return the same if it is not thread safe as you say. – Neil Feb 29 '16 at 14:36
  • Indeed. You can only call `Next` within the `lock`ed region. – Bathsheba Feb 29 '16 at 14:37
6

Going one step further from the thread local storage solution, and trying to avoid the statistical issue, I suggest to use a random seed generated from RNGCryptoServiceProvider:

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApplication1
{
    class Program
    {

        private static readonly int ITERATIONS = 5000000;
        private static readonly int RANDOM_MAX = 101;

        private static int GetCriptoRandom()
        {
            using (var rng = new System.Security.Cryptography.RNGCryptoServiceProvider())
            {
                byte[] bytes = new byte[4];
                rng.GetBytes(bytes);
                return BitConverter.ToInt32(bytes, 0);
            }
        }

        private static ThreadLocal<Random> m_rnd = new ThreadLocal<Random>(() => new Random(GetCryptoRandom()));

        private static Random _rnd
        {
            get
            {
                return m_rnd.Value;
            }
        }

        static void Main(string[] args)
        {
            ConcurrentDictionary<int, int> dic = new ConcurrentDictionary<int, int>();
            Parallel.For(1, ITERATIONS, _ => dic.AddOrUpdate(_rnd.Next(1, RANDOM_MAX), 1, (k, v) => v + 1));
            foreach (var kv in dic)
                Console.WriteLine("{0} -> {1:0.00}%", kv.Key, ((double)kv.Value / ITERATIONS) * 100);

        }
    }
}

It seems statistically correct, results range from 0.99% to 1.01%.

Jesús López
  • 8,338
  • 7
  • 40
  • 66
  • This looks promising, but the OP (or you) should run this through the diehard tests for statistical randomness. – Bathsheba Feb 29 '16 at 15:13
  • `System.Security.Cryptography.RNGCryptoServiceProvider` returns `IDisposable` so you have to wrap it into `using`; another issue with the generator is that *cryptographic* random is quite slow. – Dmitry Bychenko Feb 29 '16 at 15:31
  • @DmitryBychenko. Thanks for pointing that out. Edited to enclose `RNGCryptServiceProvider` in `using` block. Although very slow, it is called only once per thread, so the impact on performance is minimal. – Jesús López Feb 29 '16 at 15:37
  • @DmitryBychenko Performance isn't necessarily a problem, since it's only used for the seeds of the other `Random`s. It's quite possible that two `Random`s will get the same seed anyway, though. Depending on how those random values are used, that may be a major problem or a triviality. – Luaan Feb 29 '16 at 15:38
  • @Luann. I wouldn't say it's quite possible. It's very rare actually, the probability for two consecutive seeds be the same is 1/2^32 – Jesús López Feb 29 '16 at 15:41
  • @JesúsLópez The problem with the 1/2^31 probability is that it only applies to any two seeds. Once you look at many seeds, the birthday problem applies and the probability becomes high at 50000 seeds. I don't think there is any sane way of seeding `System.Random`, it should be thrown away entirely. – CodesInChaos Feb 29 '16 at 16:43
  • @DmitryBychenko It's not *that* slow, as long as you ask for several kilobytes at once (Short of Monte Carlo simulations it should rarely be the bottleneck). But of of course that complicates the implementation. – CodesInChaos Feb 29 '16 at 16:45
  • @CodesInChaos I know this is not perfect. But it's not so bad. Why would I need 50000 seeds?. If I have a 32 core machine, I just need 32 seeds on my app, and make them different is not so hard. – Jesús López Feb 29 '16 at 17:22
  • @JesúsLópez If the application doesn't run very long (e.g. a typical console application), or the thread pool stops threads if the application has low load and starts them again when the load increases you could get a lot of seeds over time. – CodesInChaos Feb 29 '16 at 17:28
1

Random is not thread-safe - you must not use the same Random instance from multiple threads without synchronization.

Why are you getting 1 in particular? Well, the way Random works (in 4.5.2) is by keeping a seed array as well as two indexers. When you use it from multiple threads concurrently, your seed array is going to get all messed up, and you'll almost always get identical values in multiple slots. The basic operation does something like seed[a] - seed[b], and when those values are the same, you get zero back. Since you asked for 1 as a minimum, this zero is shifted to one - and there's your anomaly. This happens very quickly in a multi-core environment, since there's quite a lot of inter-dependent state that's updated on each Next call.

There's many ways to solve this problem. One is to synchronize access to a common Random instance - it only makes sense if you're doing relatively few randoms, though, in which case you wouldn't use Parallel anyway. If performance is a concern, you'll either need to add some form of pre-fetching (e.g. preparing the random numbers in batches, per-thread or using some concurrent queue), or use some other method.

Another way is to keep a separate Random instance for each thread. This requires you to carefuly select a seed for each of the instances, though, otherwise your random numbers may end up being quite non-random. The approach used in .NET itself (again, using 4.5.2 code for reference) is to use Thread.CurrentThread.ManagedThreadId as the seed, which works quite well. Another common approach is to use a single global (synchronized) Random instance to initialize the seeds of the other Randoms, but depending on your requirements, you may need to ensure no duplicate seeds are produced.

Of course, you could also use some other random number generator. However, pseudo-random generators will usually require the same approaches as Random - they heavily depend on their state; that's what makes them pseudo-random in the first place. A cryptographic generator may work better, but those tend to be very slow, and may fallback to the synchronized approach anyway, especially if there's no hardware support.

In some cases, it makes sense to distribute the generation work according to some reasonable rules. For example, if you use pseudo-random procedural generation for in-game assets, it may make sense to make explicit rules to how different generators are seeded, in a repeatable manner - of course, this also means you can't really use Parallel either, and you'll have to be a bit more explicit.

Luaan
  • 62,244
  • 7
  • 97
  • 116