0

I need to pick N random elements from an List<MyObject> objects. MyObject is a complex class object. I could easily Collections.shuffle(objects), and get a sublist of size N.

But then a thought came to me. Will it improve performance if I create a List<Integer> indexes = [0, N-1], shuffle this index array instead, get a sublist, and use it to get elements from the original list?

My question boils down to: is there meaningful performance difference in shuffling a list of heavy vs light weight objects in Java?


ps: If it matters, I need to return the random selection in a new list while also removing them from the original list.

F.S.
  • 1,175
  • 2
  • 14
  • 34
  • 2
    Try it and see for yourself, esp.since the answer may depend on the details of your particular heavy object. – Scott Hunter Aug 20 '23 at 20:15
  • Is it OK if the random list has duplicates? – Bohemian Aug 20 '23 at 20:30
  • @Bohemian I'm curious about shuffling behavior in general. For my particular use case, I don't want duplicates in random selection – F.S. Aug 20 '23 at 20:45
  • If duplicates are not allowed, then shuffle + subList is the simplest and in the general case most efficient. If the number of elements is small compared to the original list, direct random selection with retry on duplicate will be probabilistically faster. – Bohemian Aug 20 '23 at 22:34
  • @F.S there is no need to shuffle the entire list. Regarding duplicates, what if there are duplicates in the source list? Note that the time involved should be dependent only on the number items removed, not the size of the source list. For details see my answer below. If I misunderstood your requirement, please let me know. – WJS Aug 21 '23 at 13:22

4 Answers4

9

Type of object irrelevant to shuffling

No difference.

A List is a collection of references (pointers, essentially), not objects. Each element of the list is a reference, think of it as an address in memory where the object lives. That reference implicitly takes you to the object's contents. So we conveniently think of the list as containing objects despite that not quite being accurate.

So shuffling a List<MyObject> and shuffling a List<Integer> do the very same kind of work. Both shuffle a collection of object references, not objects. The objects never moved, they live at the same memory address they always did. References to MyObject objects, and references to Integer objects, are the same, both references.

A reference to a MyObject object is exactly the same kind and size as a reference to Integer object. Their exact nature is an implementation detail of your particular JVM, but you can think of the references as both being a long integer containing the memory address where the object lives.

shuffling a list of heavy vs light weight objects

The size and complexity of the MyObject class and Integer class are irrelevant to shuffling a List.

Here is a diagram depicting a list of object references, before and after shuffling. The size/complexity of the objects on the right, the 3-dimensional boxes, is irrelevant to the moving of the references contained in the list on the left.

diagram of list of references before and after shuffle

Producing a new list

You said:

I need to return the random selection in a new list while also removing them from the original list.

First, producing a new shuffled list.

List < String > names = new ArrayList <> ( List.of ( "Alice" , "Bob" , "Carol" , "Davis" , "Edith" , "Frank" , "Georgette" ) );
Collections.shuffle ( names );

List < String > fewNames = List.copyOf ( names.subList ( 0 , 3 ) );

names = [Bob, Edith, Georgette, Davis, Carol, Alice, Frank]

fewNames = [Bob, Edith, Georgette]

Secondly, effectively remove the excluded names in the original list by merely replacing the reference.

names = fewNames ;  // Effectively removes the excluded elements.

If the first, longer list is not held by any other references, that list object becomes a candidate for garbage collection, and eventually will be removed from memory.

If you really want to eliminate elements from the first longer list itself, retain the subset of elements.

boolean listChanged = names.retainAll( fewNames ) ;

See also the many existing Questions and Answers.

Basil Bourque
  • 303,325
  • 100
  • 852
  • 1,154
  • Thank you for the excellent illustration. For my learning purpose, how do one check whether an operation is by reference or by value? I tried to look up the javadoc which doesn't help https://docs.oracle.com/javase/8/docs/api/java/util/Collections.html#shuffle-java.util.List- – F.S. Aug 20 '23 at 21:50
  • 2
    You will need to learn that everything in Java, without exception, passes references by value. That isn't pass by reference. – Louis Wasserman Aug 20 '23 at 21:56
  • Besides, to clarify, I need to get the new list `fewNames`, while also keeping the original list (minus the removed elements). – F.S. Aug 20 '23 at 21:58
  • @F.S. [*Is Java "pass-by-reference" or "pass-by-value"?*](https://stackoverflow.com/q/40480/642706). Hint: the value of a reference variable is that pseudo memory address I spoke of in this Answer. We never get to see or access this actual value in Java, for security reasons, and to avoid the risky memory-math tricks cleverly preformed in C programming. In this code: `String name = "Alice" ;`, the value of the reference variable `name` is the pseudo memory address where that `String` object lives ; the value of that object is the five characters A-l-i-c-e. – Basil Bourque Aug 21 '23 at 02:49
0

Assuming N is very small compared to the size of the list, but at most the same size, you can combine the generation of random indexes to remove and the removal of objects from the original list in one step without creating an additional intermediate list.

I think this is better than shuffling the original list especially if it has a lot of entries and in comparison N is significantly smaller. Example using Strings:

int N = 3;
List<String> list = new ArrayList<>(List.of("foo", "bar", "baz", "doo", "xyz"));

System.out.println("original list:\n" + list);

Random random = new Random();
List<String> removed = random.ints(0, list.size())
                             .distinct()
                             .limit(N)
                             .boxed()
                             .sorted(Comparator.reverseOrder())
                             .mapToInt(Integer::intValue)
                             .mapToObj(list::remove)
                             .toList();

System.out.println("removed list:\n" + removed);
System.out.println("list after removing N random elements:\n" + list);
Eritrean
  • 15,851
  • 3
  • 22
  • 28
  • Yeah you are right, I think this would work with a large list (Size M). In my case, M and N are both ~100. – F.S. Aug 20 '23 at 22:06
0

First, as so eloquently answered by Basil Bourque, shuffling doesn't depend on the complexity of an object or even a primitive. Objects are stored as references which are in fact ints which "point" or "refer" to their location in memory. Since shuffling is simply moving objects around, you are simply moving their references.

But here is a relatively efficient way to do what you want.

This works equally well for all list sizes and types, is very efficient, and requires no shuffling of the entire list. The overall efficiency is based on the number of items to be removed. Duplicates will never appear unless there are duplicates in the source list.

It has the advantages of using what I refer to as a "partial shuffle" to reposition the last element to the index of the one just selected and then removes the last element (per your stated requirement).

Generate the source list

List<Integer> listInts = new ArrayList<>(IntStream
                .rangeClosed(1, 30).boxed().toList());

Now the algorithm

  • iterate thru the removal count
  • generate a random number within the list's range.
  • add the selected item to the removed list
  • copy the last item in the list to the selected item's spot.
  • then remove the last item (this is very fast since no internal repositioning of the list is required).
public static <T>  List<T> getRandomSelections(List<T> sourceList, int sizeOfSublist) {
    List<T> removed = new ArrayList<>();
    Random r = new Random();
    int size = sourceList.size();
    for (int i = 0; i < sizeOfSublist; i++) {
        int selection = r.nextInt(0, size);
        removed.add(sourceList.get(selection));
        sourceList.set(selection, sourceList.get(--size));
        sourceList.remove(size);
    }
    return removed;
}

Demo

List<Integer> removedInts = getRandomSelections(listInts, 10);
System.out.println("removedInts:   " + removed);
System.out.println("remainingInts: "+ list);

Now the removed list contains the desired items and the original list contains the remaining items.

Here is one using words.

List<String> listWords = new ArrayList<>(List.of("to", "be", "or", "not",
        "to", "be", "that", "is", "the", "question"));
List<String> removedWords = getRandomSelections(listWords, 5);

System.out.println("removedWords:   " +removedWords);
System.out.println("remainingWords: " + listWords);

prints something like

removedInts:   [19, 16, 26, 1, 12, 15, 5, 13, 21, 9]
remainingInts: [27, 2, 3, 4, 24, 6, 7, 8, 22, 10, 11, 28, 23, 14, 25, 29, 17, 18, 30, 20]
removedWords:   [question, or, is, to, to]
remainingWords: [that, be, the, not, be]

WJS
  • 36,363
  • 4
  • 24
  • 39
0

Generating a list of indices will be much faster than first loading a list of objects.

When you are creating the list of objects, you can skip over the indices that weren't randomly selected.

Here is an example with 10 values, from a size of 100.
The while loop is used to generate non-duplicate values.

Random random = new Random();
List<Integer> list = new ArrayList<>();
for (int count = 0, value; count < 10; count++) {
    while (list.contains(value = random.nextInt(100)));
    list.add(value);
}

Output

[26, 22, 46, 51, 64, 78, 8, 83, 24, 23]

"... is there meaningful performance difference in shuffling a list of heavy vs light weight objects in Java? ..."

For shuffling, it just changes the indices of the objects, so that wouldn't impact the performance.
Here is the OpenJDK source for Collections#swap, which is used by Collections#shuffle.

public static void swap(List<?> list, int i, int j) {
    // instead of using a raw type here, it's possible to capture
    // the wildcard but it will require a call to a supplementary
    // private method
    final List l = list;
    l.set(i, l.set(j, l.get(i)));
}
Reilas
  • 3,297
  • 2
  • 4
  • 17