I was benchmarking streams that collect into an ArrayList with a predefined capacity against streams that collect into one with the default capacity.
First I add a number of random integers to a master list, then I stream that list and collect it into an ArrayList. The first stream collects into an ArrayList without a predefined capacity (so it starts at the default capacity of 10). The second collects into an ArrayList presized to the master list's size.
Code:
    int capacity = 8000;
    var master = new ArrayList<Integer>(capacity);
    for (int i = 0; i < capacity; i++) {
        master.add(rnd.nextInt());
    }

    print("Adding elements to list with default capacity...");
    var sw = Stopwatch.startNew();
    List<Integer> defaultList = master
            .stream()
            .collect(Collectors.toList());
    sw.stop();
    String elapsedDefault = sw.toString();

    print("Adding elements to list with predefined capacity...");
    sw.restart();
    List<Integer> predefined = master
            .stream()
            .collect(Collectors.toCollection(() -> new ArrayList<>(master.size())));
    sw.stop();
    String elapsedPredefined = sw.toString();

    System.out.println("Time taken with default size: " + elapsedDefault);
    System.out.println("Time taken with predefined size: " + elapsedPredefined);
This outputs:
Adding elements to list with default capacity...
Adding elements to list with predefined capacity...
Time taken with default size: 12.63ms
Time taken with predefined size: 3.66ms
However, if I swap the order of the two streams in the code, I get something like this:
Adding elements to list with predefined capacity...
Adding elements to list with default capacity...
Time taken with predefined size: 13.17ms
Time taken with default size: 3.23ms
Does this have anything to do with compiler optimization or CPU operation? I can't think of another reason why this would happen.
(Note: collecting into a predefined-capacity list only reliably gives better performance when the number of elements is very large, on the order of a few hundred million. Otherwise the only thing that seems to matter is which statement runs first in the program.)
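One way to check whether the difference is a first-run cost rather than a capacity effect is to time both collectors repeatedly in the same process and compare the first round with the later ones. The following is only a minimal sketch under some assumptions: it uses System.nanoTime() in place of the Stopwatch class above, and the class name WarmupCheck, the helper methods, the seed and the round count are all hypothetical names and values chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

// Hypothetical class name; not part of the original benchmark.
public class WarmupCheck {

    // Collects into whatever list Collectors.toList() creates.
    static List<Integer> collectDefault(List<Integer> master) {
        return master.stream().collect(Collectors.toList());
    }

    // Collects into an ArrayList presized to the number of source elements.
    static List<Integer> collectPresized(List<Integer> master) {
        return master.stream()
                .collect(Collectors.toCollection(
                        () -> new ArrayList<Integer>(master.size())));
    }

    // Builds a list of n random integers (seeded so runs are repeatable).
    static List<Integer> randomList(int n, long seed) {
        Random rnd = new Random(seed);
        List<Integer> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add(rnd.nextInt());
        }
        return list;
    }

    public static void main(String[] args) {
        List<Integer> master = randomList(8000, 42L);
        int rounds = 10; // assumed value, chosen only for illustration

        // Time both variants in every round, always in the same order.
        for (int r = 1; r <= rounds; r++) {
            long t0 = System.nanoTime();
            List<Integer> a = collectDefault(master);
            long t1 = System.nanoTime();
            List<Integer> b = collectPresized(master);
            long t2 = System.nanoTime();

            // Printing the sizes keeps both results in use.
            System.out.printf(
                    "round %d: default %.3f ms, presized %.3f ms (sizes %d, %d)%n",
                    r, (t1 - t0) / 1e6, (t2 - t1) / 1e6, a.size(), b.size());
        }
    }
}
```

If only the very first measurement is slow and the later rounds look alike for both variants, that would point to a one-time startup cost rather than to the list capacity. Hand-rolled timing like this is still rough. A benchmark harness such as JMH would give more trustworthy numbers.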