2

I would like pick set of random values (let's say 2 values) from a generic type array. I managed to implemented it but some reason it sometimes return array with duplicated, for examples: [20.33, 20.33], or [32, 32] or ["Rugby", "Rugby"]. Any idea how to solve this issue?

Note that I'm getting this array from a json object thus using Newtonsoft.Json JArray.

var arr = [ 0.4, 20.33, 76.01, 47.3, 23.78];
//or
var arr = [ 32, 68, 89, 27, 93];
// or 
var arr = [ "Football", "Rugby", "Cricket", "Tennis", "Basketball"];


var count = 2;
var item = new JArray();

foreach (var m in Enumerable.Range(0, count).Select(i => arr[new Random().Next(arr.Count())]))
{
  if (!item.Contains(m))
  {
     item.Add(m);
  }
}
Ali
  • 33
  • 1
  • 9
  • 2
    The "best" approach to this depends on how many values are in the array and how many you want to pick. If it's only a few, then you can just make a copy of the array and shuffle it (or equivalent) and take the first N values from it. If there are very many items in the array and you only want a few of them, there are better ways. – Matthew Watson Feb 06 '20 at 10:22
  • 3
    Also note that you are initializing Random for every iteration. This might increase the chance of having duplicates as Random is initialized using identical seed values based on System Clock. A better approach would be to initialize Random once and reuse the instance – Anu Viswan Feb 06 '20 at 10:34
  • 2
    @AnuViswan That is indeed true for .Net Framework - but note that that particular problem has been fixed for .Net Core 3.x. But even if using .Net Core, I agree that the OP should be creating the Random once outside the loop! – Matthew Watson Feb 06 '20 at 10:35
  • 1
    Side note: In case of `asp.net` we should be aware of *thread safety* (`Random` is *not* thread safe, that's why naive `static Random s_Random = new Random();` is not a way out). https://devblogs.microsoft.com/pfxteam/getting-random-numbers-in-a-thread-safe-way/ – Dmitry Bychenko Feb 06 '20 at 10:48
  • 1
    Aside from thread safety issues, there's a fundamental flaw in the loop. It only iterates `count` times, but if one of the items is rejected because it's already in the output array, then the loop will terminate before the correct number of items has been added to it. – Matthew Watson Feb 06 '20 at 11:20
  • just create the Random object outside of the foreach and reuse it for every call. If you dont then you will get duplicates. Random uses a time based seed which means two can have the same seed. So your best bet is to create a SINGLE instance and call Next on that instance many times rather than creating many instances.. – Jonathan Alfaro Feb 06 '20 at 14:23
  • Please check my answer for a thorough explanation of the problem and how to solve it. I have also include Microsoft's documentation on the issue. – Jonathan Alfaro Feb 09 '20 at 15:28

3 Answers3

0

According to the implementation of the JArray class (see. source code) its method Contains uses reference equality to check if specified item is contained it the array. Therefore this code

if (!item.Contains(m))
{
    item.Add(m);
}

allows duplicates to be added into the item collection. Try to use List<object> instead of JArray. Also (as others menthioned in the comments) you should create Random object outside of the linq expression to generate different numbers. Try to rewrite your code like this:

var count = 2;
var temp = new List<object>();
// Will generate random numbers.
var random = new Random();

foreach (var m in Enumerable.Range(0, count).Select(i => arr[random.Next(arr.Count())]))
{
    // Ensures that duplicates will not be added.
    if (!temp.Contains(m))
    {
        temp.Add(m);
    }
}

// Add generated items into the JArray and then use it.
var item = new JArray(temp);
Iliar Turdushev
  • 4,935
  • 1
  • 10
  • 23
0

There seems to be some confusion in this post about how the Random class works and about the lock statement.

I hope to clear the waters a little with this answer.

First let's replicate the problem:

private void Button_Click(object sender, RoutedEventArgs e)
{
  var result = new StringBuilder(1000);

  for(int i = 0; i < 10; i++)
  {
    //we are creating a new instance 
    //every single iteration
    //this means all these instances
    //have the same 'seed'
    var rand = new Random();

    result.AppendLine(rand.Next().ToString());
  }

  //produces a lot of duplicates
  MessageBox.Show(result.ToString());
}

OUTPUT:

1467722682
1467722682
1467722682
1467722682
1467722682
1467722682
1467722682
1467722682
1467722682
1467722682

The code above produces a lot of duplicates! A lot of the instances have the same "seed".

Now let's test the claims made in some of the other answers about "fixing" the problem by using a "lock" statement:

private void Button_Click(object sender, RoutedEventArgs e)
{
  var result = new StringBuilder(1000);

  for(int i = 0; i < 10; i++)
  {
    //we are creating a new instance 
    //every single iteration
    //this means all these instances
    //have the same 'seed'

    //now using a lock
    lock(this)
    {
      var rand = new Random();
      result.AppendLine(rand.Next().ToString());
    }        
  }

  //produces a lot of duplicates
  //locking did NOTHING to fix the problem.
  MessageBox.Show(result.ToString());
}

OUTPUT:

2013405475
2013405475
2013405475
2013405475
2013405475
2013405475
2013405475
2013405475
2013405475
2013405475

As you can see locking did nothing to solve the problem!

Now let's see how the problem is actually solved:

private void Button_Click(object sender, RoutedEventArgs e)
{
  var result = new StringBuilder(1000);

  //we are creating a new instance 
  //outside of the for loop     
  var rand = new Random();

  for(int i = 0; i < 10; i++)
  {                
    result.AppendLine(rand.Next().ToString());
  }

  //no duplicates
  //the problem is now fixed
  MessageBox.Show(result.ToString());
}

OUTPUT:

855346142
1488935613
141810032
1396703820
703238132
978249590
138102129
2143944359
224694938
1542730216

As you can see now we have no duplicates.

The actual problem is that the Random class uses the system's clock to create a seed and you create multiple instances very fast they all have the same seed.

Let's look at the documentation of the Random class:

The default seed value is derived from the system clock, which has finite resolution

That is exactly the problem but let's see from the same documentation what Microsoft says about it:

Random objects that are created in close succession by a call to the parameterless constructor will have identical default seed values and, therefore, will produce identical sets of random numbers

Yes that is the exact description of what is happening...

So what does Microsoft say about how to solve this problem:

This problem can be avoided by using a single Random object to generate all random numbers

That is exactly what we did when we moved the instantiation outside of the for loop.

One last note about this:

Note that this limitation does not apply to .NET Core.

Well according to Microsoft this problem no longer happens if you use .NET Core. It only affects the .NET Framework!

UPDATE:

According to @Aristos the code would behave differently if it were contained withing a classic ASP.NET application.

I just tried the exact same examples in a ASP.NET Application and got the exact same results.

As I expected this issue has nothing to do with ASP.NET.

Jonathan Alfaro
  • 4,013
  • 3
  • 29
  • 32
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/207512/discussion-on-answer-by-jonathan-alfaro-how-to-pick-set-number-of-random-values). – Bhargav Rao Feb 10 '20 at 02:50
-1

Because of the way of the Random implementation, you must be sure that you call the random with 'lock'. In a multi-user environment its possible to call the random at the same time and the inside static values or initialize values of random to be the same.

For example when the random starts its use the time for initialize - there if you call it at the same time you get the same results. Here is the initialize of the random - notice the TickCount that is used and base on that number the generator gives results. With the same initialize all the sequence of numbers later is the same.

[__DynamicallyInvokable]
public Random() : this(Environment.TickCount)
{
}

How to solve this. Use a static random reference that you create it with the born of a static class, and also use lock each time you get the next value.

So your code will be

var arr = [ 0.4, 20.33, 76.01, 47.3, 23.78];
//or
var arr = [ 32, 68, 89, 27, 93];
// or 
var arr = [ "Football", "Rugby", "Cricket", "Tennis", "Basketball"];


var count = 2;
var item = new JArray();

foreach (var m in Enumerable.Range(0, count).Select(i => arr[GlobalFunctions.RandomInt(arr.Count())]))
{
  if (!item.Contains(m))
  {
     item.Add(m);
  }
}

Base on this object

public static class GlobalFunctions
{
    static Random random = null;
    private static readonly object syncLock = new object();

    static GlobalFunctions()
    {
        lock (syncLock)
        {
            if (random == null)
                random = new Random();
        }
    }   

    public static int RandomInt(int max)
    {
        lock (syncLock)
        {
            return random.Next(max);
        }
    } 
}

Why we need to lock also the Next - because if you see the internal call of the Next you end up using some internals values of the object - so there you also need to synchronize each call to avoid duplicates - notice the this.inext, this.inextp - See the inside of the random.next :

[__DynamicallyInvokable]
public virtual int Next()
{
    return this.InternalSample();
}
private int InternalSample()
{
    int inext = this.inext;
    int inextp = this.inextp;
    if (++inext >= 0x38)
    {
        inext = 1;
    }
    if (++inextp >= 0x38)
    {
        inextp = 1;
    }
    int num = this.SeedArray[inext] - this.SeedArray[inextp];
    if (num == 0x7fffffff)
    {
        num--;
    }
    if (num < 0)
    {
        num += 0x7fffffff;
    }
    this.SeedArray[inext] = num;
    this.inext = inext;
    this.inextp = inextp;
    return num;
}

Bottom line - use the lock for synchronize the Random object.

Prove with sample code

Well its difficult some times to recognize this issue because you must have a lot of users to hit your site, and even then you may never see that they see the same results.

Now, to prove that I create many users simulation with iframe on one page, that call one other page that call the generation. The results speaks for them self - you can also download the sample code to check it by your self.

with the other suggestions - we may have duplicates in the case that two or more uses call at the same time the same page/handler/function Dublicates

now with the lock and the static random we have no duplicates no dublicates

And the code to test it http://planethost.gr/so/RandomTestCode.rar

Aristos
  • 66,005
  • 16
  • 114
  • 150
  • You only need to lock on syncRoot if you are using multiple threads which is NOT the case. As of matter of fact using a "lock" on single threaded code is useless and wrong. Why would you want to lock single threaded code? – Jonathan Alfaro Feb 07 '20 at 02:18
  • The OP is not even using static fields. And also you seem to not understand the use of "lock". The use of lock has NOTHING to do with "static" fields. A lock is required to create a syncronization context for a shared object between threads even if there are no static fields. So a lock statement ONLY makes sense in multithreaded code when there is an object shared accross the different threads.... It has NOTHING to do with "multi user" environment. – Jonathan Alfaro Feb 07 '20 at 02:22
  • @JonathanAlfaro its asp.net (multi user call, multiple threads by default because they can call by many users at the same time) - and the OP's have duplicate - – Aristos Feb 07 '20 at 08:14
  • Users have NOTHING to do with multiple threads.... In ASP.NET the requests are handled by multiple threads regardless of the user.... But his code is NOT multi-threaded because it is not using multiple threads... The ONLY reason you would EVER use lock is if the object being accessed is SHARED accross multiple threads... for that to happen the SCOPE of the object has to RESIDE OUTSIDE of the method... for example at the class level... In this case the instance of the Random class he is using is NOT being shared... You really need to read about lock and threads syncronization. – Jonathan Alfaro Feb 07 '20 at 13:58
  • By the way NOT all ASP.NET websites have "users"... You can create a website with no Membership at and it will still be multi threaded... On the other hand if it is ASP.NET Core it will NOT necessarily be multi-threaded. – Jonathan Alfaro Feb 07 '20 at 13:59
  • I think you are very confused about how lock works.... All that lock does is call Monitor.Enter and Monitor.Exit. This just serializes access to a particular block of code so that calls to that code are sequential in nature. This effectively shields an object from being modified by two threads at once. So if the object has not state or if the object is scoped to the method then in that case lock is not only useless it is actually wrong. – Jonathan Alfaro Feb 07 '20 at 14:01
  • 1
    The OP has duplicates because he is creating a new instance of Random on every interation of the loop this causes that some of the instances have the same Seed. In this line of code that YOU posted "public Random() : this(Environment.TickCount)" TickCount has a resolution of about 16 ms this means that if you create two instances of Random one right after the other one within 16 ms BOTH instances will return the EXACT same sequence because they both have the same seed. This CANNOT be solved by using lock... because the instances are already being created sequentially NOT concurrently. – Jonathan Alfaro Feb 07 '20 at 14:04
  • Dont believe me? Write a for loop and create instance of Random INSIDE the for loop and output the numbers to the console. Even adding a lock wont prevent the duplicates..... Now grab the instantiation of the Random class and move it OUT of the for loop and try again.... you wont get duplicates anymore... Try it your self! – Jonathan Alfaro Feb 07 '20 at 14:20
  • This is not "computer religion" it is computer SCIENCE.... It is very exact. Try it! using lock inside a for loop does NOT work.... You will still get duplicates. On the other hand moving the instantiation outside of the for loop works.... And lock is only syntactical sugar for Monitor.Enter and Monitor.Exit..... just read the documentation – Jonathan Alfaro Feb 08 '20 at 01:22
  • @JonathanAlfaro in the middle time read this answer that disagree with you https://stackoverflow.com/a/768001/159270 – Aristos Feb 08 '20 at 08:37
  • That answer just proves my point.... The reason he uses lock is because he is using a STATIC single instance..... In the case of the OP he is NOT using a static singlenton..... It is very easy... Just try to create a new random inside a for loop and use a lock... you will see it does not workl... I am not bullying you. I understand your code perfectly... The OP does NOT want to use a SINGLETON that would a performance problem for an ASP.NET website... He does NOT need globally random numbers. – Jonathan Alfaro Feb 08 '20 at 16:19
  • Once again lock is ONLY needed when the instance is being shared by multiple threads.... Otherwise is useless and performance bottle neck... The OP is NOT using threads... – Jonathan Alfaro Feb 08 '20 at 16:20
  • Also here is an excerpt from the answer you just showed: "Thus there are two valid approaches: 1-synchronize so that we don't access it at the same time from different threads 2-use different Random instances per thread" – Jonathan Alfaro Feb 08 '20 at 16:21
  • See number 2? use different Random instances per thread....?? Yeah that is the correct answer here because number 1 ONLY works when the instance is being shared...... – Jonathan Alfaro Feb 08 '20 at 16:22
  • You DONT need to syncronize the random initialization because the instance's scope is within the method.... You only need it to be static if you need GLOBAL random numbers.... Which clearly the OP does not need. Look at the other answer from LLiar Turdushev that is the correct answer. Creating a static field just makes things worse to the point where you need to "lock" which also means your code is now SLOW. Locking ins an expensive operation. As long as you create the instance inside the method but outside the foreach you do not need to lock. – Jonathan Alfaro Feb 09 '20 at 14:49
  • @JonathanAlfaro I have add a sample code as you ask for - and prove my point... download to check it by your self... – Aristos Feb 11 '20 at 09:24
  • No duplicates..... Each user has no duplicates... what you call "duplicates" is between users.... but each user on it's own has not duplicates... – Jonathan Alfaro Feb 11 '20 at 14:00
  • You are once again confused. The OP that asked the question does not want globally unique random numbers.. He just wants them inside his method. And for achieving globally randomnes is better to just use cryptographically random numbers which is a totally different topic – Jonathan Alfaro Feb 11 '20 at 14:02