4

My problem is the following: I need to shuffle an array and then get just the first N elements.

I am currently shuffling the whole array, that has 50+ elements, but this gives me performance problems, since the shuffling routine is called 1e+9 times.

I am currently implementing the Fisher-Yates algorithm to shuffle:

public static void shuffle(int[] array) {
    Random gen = new Random();
    for (int i = array.length - 1; i > 0; i--) {
        int index = gen.nextInt(i + 1);
        int a = array[index];
        array[index] = array[i];
        array[i] = a;
    }
}

Then I select just the first N elements.

I have also tried using Reservoir sampling, but it just saved me 1 second. That's not enough, since my program runs for 30 secs. Also, I might have implemented it incorrectly, because I am not getting the same results when compared to the Fisher-Yates algorithm. This is my implementation:

public static int[] shuffle(int[] array, int N) {
    int[] ret = new int[N];
    for (int i = 0; i < N; i++) {
        ret[i] = array[i];
    }
    Random gen = new Random();
    int j;
    for (int i = N; i < array.length; i++) {
        j = gen.nextInt(i+1);
        if (j <= N - 1)
            ret[j] = array[i];
    }
    return ret;
}

To conclude, what I need is a a shuffling algorithm that would pick N random elements using a search of length N, instead 50+. If not possible, something better then Fisher-Yates and Reservoir sampling.

Note-1: Altering the original "int[] array" is not a problem.

Note-2: N is usually around 10.

Vamshi
  • 510
  • 2
  • 10
  • 26
jcmonteiro
  • 293
  • 2
  • 16
  • Possible duplicate of http://stackoverflow.com/questions/1519736/random-shuffling-of-an-array – Jayasagar Jan 30 '14 at 15:39
  • I don't think your question is actually about shuffling at all - it looks like what you want is a fast method to pick N items at random from a set of M items where N < M. Shuffling and picking the first N is just one way to accomplish this. – Jackson Jan 30 '14 at 16:01
  • @Jayasagar I think the question is not a duplicate, since my point is not exactly shuffling. Instead, as Jackson pointed out, what I need is to pick N random elements from an array of length M > N. – jcmonteiro Jan 31 '14 at 00:30
  • see [here](http://stackoverflow.com/questions/32035566/efficiently-pick-n-random-elements-from-php-array-without-shuffle/32035986#32035986) for a strictly `O(N)` (both in time and space) algorithm – Nikos M. Aug 26 '15 at 14:19

3 Answers3

3

A simple way to get N shuffled elements from the array is as follows:

  1. Pick a random element r.
  2. Add element r to the output.
  3. Move the last element of the array to the position of r and shrink the array size by 1.
  4. Repeat N times.

In code:

public static int[] shuffle(int[] array, int N) {
    int[] result = new int[N];
    int length = array.length;

    Random gen = new Random();

    for (int i = 0; i < N; i++) {
        int r = gen.nextInt(length);

        result[i] = array[r];

        array[r] = array[length-1];
        length--;
    }

    return result;
}

This algorithm has the advantage over FY that it only computes the first N elements of the shuffled array, rather than shuffling the whole array.

Your optimized algorithm is not optimal for two reasons:

  1. The first N elements are never shuffled. For instance, element 0 can never appear at position 1 in the shuffled array.
  2. You're still doing a lot of work. If N=10 and the total array length is 1000000, you're still computing about 1000000 random values, while you only need 10.
Vincent van der Weele
  • 12,927
  • 1
  • 33
  • 61
  • While I agree this approach would work, I think a good answer to this question should present an explanation of why this is a better approach that the ones suggested by the OP. – Duncan Jones Jan 30 '14 at 15:13
  • 1
    @Duncan I agree. I had some hard time understanding the OP's optimized algorithm. I verified it's correct (except for copying the first `N` elements without shuffling), but it's not efficient. – Vincent van der Weele Jan 30 '14 at 15:25
  • Why are you reinventing the wheel? – hd1 Jan 30 '14 at 15:32
  • @hd1 The question is not to shuffle the array, but to get a random sublist of length `N`. `Collections.shuffle` does not do that. – Vincent van der Weele Jan 30 '14 at 15:58
  • @Heuster Your solution worked perfectly. The idea of placing the element at the end of the array is really good. @hd1 Reinforcing what Heuster said, this is not reinventing the wheel since `Collections.shuffle` does not work, and neither do the usual shuffle methods. – jcmonteiro Jan 31 '14 at 00:26
1

You can modify FY by only doing N iterations, and then taking the last N elements of the array. Alternatively, build the shuffled area at the start of the array, rather than the end, and take the first N elements. However, that will only save iterations of the actual shuffle, and the gain there will be a factor of N/50. That may not be enough.

Depending on the situation, you may be able to generate e.g. a million shuffled decks, and then pick one of them at random for each of the 1e9 outer iterations. That would do one random number generation per outer iteration, plus one thousandth of the current shuffle calls.

Patricia Shanahan
  • 25,849
  • 4
  • 38
  • 75
  • Your solutions seems nice. Although I have no tried neither of them, I think they would work. I have not chosen your answer because the one presented by Heuster fitted nicely in my code. – jcmonteiro Jan 31 '14 at 00:22
-1

Do try not to reinvent the wheel, when Collection.shuffle exists:

public static int[] shuffle(int[] array, int N) { 
    List<Integer> list = new ArrayList<Integer>();
    for (int r : array) {
       list.add(r);
    }
    Collections.shuffle(list);
    Integer[] ret = list.toArray(); // Integer is int, except autoboxed 
    return ret;
}
hd1
  • 33,938
  • 5
  • 80
  • 91