Optimization of Java Stream API functional interfaces for highly loaded system

Question

We have methods using the Java Stream API that are invoked very frequently, e.g. 10,000 to 20,000 times per second (a data streaming system). Consider the following simple test method (intentionally simplified; it does nothing useful):

public void test() {
    Stream.of(1, 2, 3, 4, 5)
            .map(i -> i * i)
            .filter(new SuperPredicate())
            .sorted(Comparator.comparing(i -> -i + 1, Comparator.nullsFirst(Comparator.naturalOrder())))
            .forEach(System.out::println);
}

class SuperPredicate implements Predicate<Integer> {
    public SuperPredicate() {
        System.out.println("SuperPredicate constructor");
    }
    @Override
    public boolean test(Integer i) {
        return i % 3 != 0;
    }
}

On each invocation of the test method, new instances of the functional interfaces are created (in our example, SuperPredicate and the comparator returned by Comparator.nullsFirst()). So with frequent invocations, thousands of excess objects are created. I understand that creating an object takes only a few nanoseconds in Java, but under high load it might still increase GC pressure and, as a result, affect performance.

As far as I can see, since these functional objects are stateless, we could move their creation into private static final fields of the same class, which would slightly decrease the load on the system. It's a kind of micro-optimization. Do we need to do this? Does the Java compiler or the JIT compiler somehow optimize such cases? Or does the compiler have options or optimization flags to improve them?
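A minimal sketch of the refactoring described above, for illustration only. The class name HoistedFunctions, the field names, and the run() helper (which collects into a list instead of printing) are hypothetical and not part of the original code.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class HoistedFunctions {
    // Same logic as the SuperPredicate in the question, minus the constructor logging.
    static final class SuperPredicate implements Predicate<Integer> {
        @Override
        public boolean test(Integer i) {
            return i % 3 != 0;
        }
    }

    // Both objects are stateless, so one shared instance can be reused by every call.
    private static final Predicate<Integer> NOT_DIVISIBLE_BY_3 = new SuperPredicate();
    private static final Comparator<Integer> ORDER =
            Comparator.comparing(i -> -i + 1, Comparator.nullsFirst(Comparator.naturalOrder()));

    // Hypothetical helper: same pipeline as test(), but returns the elements.
    static List<Integer> run() {
        return Stream.of(1, 2, 3, 4, 5)
                .map(i -> i * i)
                .filter(NOT_DIVISIBLE_BY_3)
                .sorted(ORDER)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run()); // no predicate or comparator is allocated per call
    }
}
```

Whether this hoisting makes a measurable difference is exactly the open question here; the sketch only shows what the change would look like.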

You filter and sort, but are concerned about this? Time to learn to use a profiler! — Thorbjørn Ravn Andersen, May 30 '20 at 18:03
If you have a good performance testing environment, enabling JIT compilation logs might be illuminating (-XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation). — Erik, May 30 '20 at 18:08
It's probably better to save your SuperPredicate object rather than call the constructor each time, but it's such a small optimization that it probably doesn't matter. As Thorbjørn Ravn Andersen said, sorting and filtering will eclipse these tiny costs — user, May 30 '20 at 18:08
If performance really is that important, I would look into going entirely primitive, using arrays and ints — Erik, May 30 '20 at 18:18
I'm just trying to figure out whether I need to care about such optimizations or not. We have multiple such methods in our flow. As I understood from @ThorbjørnRavnAndersen, sorting will be a much more heavyweight operation — Vasyl Sarzhynskyi, May 30 '20 at 18:22
@VasiliySarzhynskyi Don't trust what I say - measure for yourself. — Thorbjørn Ravn Andersen, May 30 '20 at 18:32
I copied this code and ran it locally. Only one instance of the SuperPredicate class was created. The output was: "SuperPredicate constructor 25 16 4 1". I don't understand what your problem is — janith1024, Jun 01 '20 at 03:53
@janith1024 on each invocation of the `test` method, new instances of `SuperPredicate` and `Comparators.NullComparator` are created in this example. The concern is for the case where we invoke `test` thousands of times per second. — Vasyl Sarzhynskyi, Jun 01 '20 at 05:04
By your test method you don't mean SuperPredicate.test(); that's true. If you want to avoid that instance creation, you can create a private field in the class where the test method is defined (private SuperPredicate sp = new SuperPredicate()) and use that field in your stream, like .filter(sp). Then the instance creation you mention is avoided. My question is whether that is worth doing. — janith1024, Jun 01 '20 at 05:22

score 2 · Accepted Answer · answered Jun 03 '20 at 13:22

You can only store objects in static final fields for reuse when they don’t depend on variables of the surrounding context, let alone on potentially changing state.

In that case, there is no reason to create a class like SuperPredicate at all. You can simply use i -> i % 3 != 0 and get the behavior of remembering the first created instance for free. As explained in "Does a lambda expression create an object on the heap every time it's executed?", in the reference implementation, the instances created for non-capturing lambda expressions are remembered and reused.
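The difference between non-capturing and capturing lambda expressions can be observed with a small sketch like the following. The class and method names are made up for illustration. Object identity of lambda instances is an implementation detail that the language specification leaves open, so the identity checks only reflect what the reference implementation is described as doing above.

```java
import java.util.function.Predicate;

final class LambdaReuseDemo {
    // Non-capturing: uses nothing from the enclosing scope.
    static Predicate<Integer> nonCapturing() {
        return i -> i % 3 != 0;
    }

    // Capturing: closes over the 'divisor' parameter, so it carries per-call state.
    static Predicate<Integer> capturing(int divisor) {
        return i -> i % divisor != 0;
    }

    public static void main(String[] args) {
        // Identity comparison; the outcome depends on the JVM implementation.
        System.out.println("non-capturing reused: " + (nonCapturing() == nonCapturing()));
        System.out.println("capturing reused:     " + (capturing(3) == capturing(3)));
    }
}
```

On the reference implementation, the first check is expected to report reuse and the second is not, but code should never rely on either.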

There is no need for a new comparator either. Leaving potential overflows aside, the function i -> -i + 1 just reverses the order due to the negation, whereas the + 1 has no effect on the order. Since the result of the expression -i + 1 can never be null, there is no need for Comparator.nullsFirst(Comparator.naturalOrder()). So you can replace the entire comparator with Comparator.reverseOrder(), with the same result but without any object instantiation, as reverseOrder() returns a shared singleton.
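As a quick check of this equivalence, the following sketch sorts the same values with both comparators and compares the results. The class and method names are hypothetical, and the input deliberately stays away from values where -i + 1 could overflow.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class ReverseOrderDemo {
    // The comparator from the question, built anew on every call.
    static Comparator<Integer> original() {
        return Comparator.comparing(i -> -i + 1, Comparator.nullsFirst(Comparator.naturalOrder()));
    }

    static List<Integer> sortWith(Comparator<Integer> order) {
        return Arrays.asList(1, 4, 16, 25).stream()
                .sorted(order)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Comparator<Integer> a = Comparator.reverseOrder();
        Comparator<Integer> b = Comparator.reverseOrder();
        System.out.println(sortWith(original()));  // [25, 16, 4, 1]
        System.out.println(sortWith(a));           // [25, 16, 4, 1]
        System.out.println(a == b);                // identity check on the shared instance
    }
}
```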

As explained in "What is the equivalent lambda expression for System.out::println", the method reference System.out::println captures the current value of System.out. So the reference implementation does not reuse the instance, because it references a specific PrintStream instance. If we change it to i -> System.out.println(i), it becomes a non-capturing lambda expression that re-reads System.out on each function evaluation.
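The semantic difference between the two forms becomes visible when System.out is replaced after the function object has been created. The sketch below assumes nothing beyond the standard library; the class name CaptureDemo and the run() helper are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.function.Consumer;

final class CaptureDemo {
    // Returns what was written to the first and to the second stream, in that order.
    static String[] run() {
        PrintStream saved = System.out;
        ByteArrayOutputStream first = new ByteArrayOutputStream();
        ByteArrayOutputStream second = new ByteArrayOutputStream();
        try {
            System.setOut(new PrintStream(first, true));
            Consumer<Object> methodRef = System.out::println;      // System.out is evaluated here, once
            Consumer<Object> lambda = o -> System.out.println(o);  // System.out is read on every call

            System.setOut(new PrintStream(second, true));
            methodRef.accept("via method reference"); // still goes to the first stream
            lambda.accept("via lambda");              // goes to the second stream
        } finally {
            System.setOut(saved); // always restore the real System.out
        }
        return new String[] { first.toString().trim(), second.toString().trim() };
    }

    public static void main(String[] args) {
        String[] out = run();
        System.out.println("first stream:  " + out[0]);
        System.out.println("second stream: " + out[1]);
    }
}
```

So the two forms are interchangeable only as long as System.out is not reassigned between creating and invoking the function.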

So when we use

Stream.of(1, 2, 3, 4, 5)
    .map(i -> i * i)
    .filter(i -> i % 3 != 0)
    .sorted(Comparator.reverseOrder())
    .forEach(i -> System.out.println(i));

instead of your example code, we get the same result, but save four object instantiations, for the predicate, the consumer, the nullsFirst(…) comparator and the comparing(…) comparator.

To estimate the impact of this saving, consider what else gets allocated. Stream.of(…) is a varargs method, so a temporary array will be created for the arguments; it then returns an object representing the stream pipeline. Each of the three intermediate operations creates another temporary object representing the changed state of the stream pipeline. Internally, a Spliterator implementation instance will be used. This makes a total of six temporary objects, just for describing the operation.

When the terminal operation starts, a new object representing the operation will be created. Each intermediate operation will be represented by a Consumer implementation having a reference to the next consumer, so the composed consumer can be passed to the Spliterator’s forEachRemaining method. Since sorted is a stateful operation, it will first store all elements in an intermediate ArrayList (which makes two objects), so it can sort them before passing them to the next consumer.

This makes a total of twelve objects as the fixed overhead of the stream pipeline. The operation System.out.println(i) will convert each Integer object to a String object, which consists of two objects, as each String object is a wrapper around an array object. This gives eight additional objects for this specific example (four elements pass the filter), but more importantly, two objects per element, so using the same stream pipeline for a larger dataset will increase the number of objects created during the operation.

I think the actual number of temporary objects created, both up front and behind the scenes, renders the saving of four objects irrelevant. If allocation and garbage collection performance ever becomes relevant for your operation, you usually have to focus on the per-element costs rather than the fixed costs of the stream pipeline.
