This question may break the APH-record (Answers Per Hour), with 8 answers in the first hour. Usually, this is a bad sign, but surprisingly, I didn't find the question that I would have liked to delcare this one a duplicate of.
Unfortunately, most "simple" answers may have severe drawbacks in practice. Most importantly, the proposed solutions are either inefficient for certain setups, or employ techniques where you can not prove the Total Correctness - namely, the algorithms may not be proven to terminate. This refers to the approaches that boil down to adding random numbers to a Set
and use this Set
to keep track of the number of distinct elements that have been chosen. So, for example, in this code
Set<Integer> set = new HashSet<Integer>();
Random random = new Random(0);
while (set.size() < sampleSize) {
set.add(min + rand.nextInt(max));
}
the loop might never terminate. You simply can not prove that there will ever be 20 distinct numbers chosen. The Random
instance might return 0
in the first call. And it might return 0
in the second call. And in the third call....you can never be sure.
Of course, "in practice", the loop usually will terminate, sooner or later, but this depends on the parameters: When it is requested to pick 20 distinct random numbers between 0 and 10, it will not terminate. The same applies for similar techniques, like
int[] ints = new Random(0).ints(0, 10).distinct().limit(20).toArray();
So the parameters should carefully be checked, to make sure that they are valid in this regard.
The other option that is often suggested in various forms is to use Collections#shuffle
on a list that is pre-filled with the items to choose from. This may be applicable for your case, where this list may have only 100 or 1000 elements. But filling a list, with, say 100000000 elements is too memory consuming, and shuffling this list too time consuming.
There is a rather versatile technique for solving this in general. It is referred to as Reservoir Sampling.
(Note that there are some questions regarding implementations of reservoir sampling, but it did not seem to be proposed as the solution for this very generic task)
Here is an implementation of reservoir sampling in Java. For a given sample size and range, it returns a collection of (random, unique) integers in the desired range, in ascending order:
/**
* Creates a collection with the given size, containing random values
* between the given minimum value (inclusive) and maximum value
* (exclusive). The resulting collection will contain the values
* in ascending order.
*
* @param size The size of the returned collection
* @param min The minimum value (inclusive)
* @param max The maximum value (exclusive)
* @param random The random number generator
* @return The collection
* @throws IllegalArgumentException If the requested size is larger than
* the difference between the maximum value and the minimum value
*/
public static Collection<Integer> randomSample(
int size, int min, int max, Random random)
{
if (size > max - min)
{
throw new IllegalArgumentException(
"Can not create a sample of size "+size+
" with values between "+min+" and "+max);
}
Set<Integer> set = new LinkedHashSet<Integer>(size);
int n = size;
for (int i = 0; i < max - min; i++)
{
double d = (double) size / ((max - min) - i);
if (random.nextDouble() < d)
{
set.add(i + min);
n--;
}
if (n <= 0)
{
break;
}
}
return set;
}
(If there are any flaws or shortcomings with this implementation, drop me a note in the comments).
This can serve as a building block for similar tasks. For example, in your case, you could create a random sample of indices, and use this to pick the desired elements:
int persons[] = new int[1000];
int sample[] = new int[20];
Collection<Integer> indices = randomSample(20, 0, 1000);
int n = 0;
for (Integer index : indices)
{
sample[n] = index;
n++;
}
For other cases, you might want to create a list from the returned indices and shuffle this list. But in this case, only the (small) sample would have to be shuffled, and not the (potentially large) list containing all possible inputs.