1

I've got a List of Objects, this list could potentially contain thousands of elements.

I want to obtain 10, 20, 34, 56 (whatever size portion the user chooses) subset of those, this subset must be randomly selected and I can't have duplicates.

Would Collections.shuffle() be sufficient for large Lists of POJOs? Or is there a more efficient/safer way of doing this?

Taking my example here, if myStrings had 50,000 Strings in it, would calling Collections.shuffle() be a good idea if you only wanted 5 items?

public class ShuffleMe
{

    public static void main(String[] args)
    {
        int NUM_OF_ELEMENTS_TO_PICK = 3;
        List<String> myStrings = new ArrayList<String>();

        myStrings.add("A");
        myStrings.add("B");
        myStrings.add("C");
        myStrings.add("D");
        myStrings.add("E");
        myStrings.add("F");

        Collections.shuffle(myStrings);

        for (int i = 0; i < NUM_OF_ELEMENTS_TO_PICK; i++)
        {
            System.out.println(myStrings.get(i));
        }
    }
}
Jimmy
  • 16,123
  • 39
  • 133
  • 213
  • 3
    possible duplicate of [best way to pick a random subset from a collection?](http://stackoverflow.com/questions/136474/best-way-to-pick-a-random-subset-from-a-collection) – Marcelo Aug 05 '11 at 20:33
  • Can you clarify, is the user picking the size of a single subset, or the number of subsets to generate? If the latter, what size should the subsets be? – Kevin K Aug 05 '11 at 20:34
  • The reason why I don't want to use `Collections.sort()` is because I don't want to have to retrieve all items from a database call if I'll only actually use a few of them, does this alter any answers? – Jimmy Aug 05 '11 at 20:48

3 Answers3

4

Shuffling the whole list would be a bit of a waste of resources if what you want is significantly smaller. I'd personally just pick n unique random numbers between 0..size and use the objects at those indexes for the randomised subset.

If you're talking about picking a random subset very close to the size of the whole collection, then chances are you're better of just calling Collections.shuffle() and choosing the first n entries. But if we're talking about ~5 / 50,000, definitely use the above approach.

Michael Berry
  • 70,193
  • 21
  • 157
  • 216
4

If the number of items you want is much less than the size of the collection, just draw them at random:

Set<Integer> randSubSet = new HashSet<Integer>();
while(randSubSet.size() < NUM_OF_ELEMENTS_TO_PICK) {
    randSubSet.add((int)(Math.random()*myStrings.size()));
}
for (int i : randSubSet) {
    System.out.println(myStrings.get(i));
}
Atreys
  • 3,741
  • 1
  • 17
  • 27
1

Use a Fisher-Yates shuffle, but only run it far enough to pick the number of elements you need.

rossum
  • 15,344
  • 1
  • 24
  • 38