
While developing, I keep rewriting the same lambda expressions over and over again, which is quite redundant, and in most cases the code formatting policy imposed by my company does not help. So I moved these common lambdas to a utility class as static methods and use them as method references. The best example I have is the throwing merger used in conjunction with java.util.stream.Collectors.toMap(Function, Function, BinaryOperator, Supplier). Always having to write (a, b) -> { throw new IllegalArgumentException("Some message"); } just because I want to use a custom map implementation is a lot of hassle.

// First form

public static <E> E throwingMerger(E k1, E k2) {
    throw new IllegalArgumentException("Duplicate key " + k1 + " not allowed!");
}

// Given a list of Car objects with proper getters
Map<String, Car> numberPlateToCar = cars.stream()
    .collect(toMap(Car::getNumberPlate, identity(), StreamUtils::throwingMerger, LinkedHashMap::new));

// Second form

public static <E> BinaryOperator<E> throwingMerger() {
    return (k1, k2) -> {
        throw new IllegalArgumentException("Duplicate key " + k1 + " not allowed!");
    };
}

Map<String, Car> numberPlateToCar = cars.stream()
    .collect(toMap(Car::getNumberPlate, identity(), StreamUtils.throwingMerger(), LinkedHashMap::new));
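For anyone who wants to try both forms side by side, here is a self-contained sketch. The Car class, its sample plates, and the FirstForm/SecondForm holder classes are hypothetical stand-ins (the question's StreamUtils would hold just one of the two variants); the mergers themselves are copied from the question.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

import static java.util.function.Function.identity;
import static java.util.stream.Collectors.toMap;

public class MergerForms {

    // Hypothetical stand-in for the question's Car class.
    static final class Car {
        private final String numberPlate;
        private final String model;

        Car(String numberPlate, String model) {
            this.numberPlate = numberPlate;
            this.model = model;
        }

        String getNumberPlate() { return numberPlate; }

        @Override
        public String toString() { return model + " (" + numberPlate + ")"; }
    }

    // First form: a plain generic method, used as a method reference.
    static final class FirstForm {
        static <E> E throwingMerger(E k1, E k2) {
            throw new IllegalArgumentException("Duplicate key " + k1 + " not allowed!");
        }
    }

    // Second form: a factory method returning the merger as a BinaryOperator.
    static final class SecondForm {
        static <E> BinaryOperator<E> throwingMerger() {
            return (k1, k2) -> {
                throw new IllegalArgumentException("Duplicate key " + k1 + " not allowed!");
            };
        }
    }

    static Map<String, Car> indexFirstForm(List<Car> cars) {
        return cars.stream()
            .collect(toMap(Car::getNumberPlate, identity(), FirstForm::throwingMerger, LinkedHashMap::new));
    }

    static Map<String, Car> indexSecondForm(List<Car> cars) {
        return cars.stream()
            .collect(toMap(Car::getNumberPlate, identity(), SecondForm.throwingMerger(), LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<Car> cars = Arrays.asList(new Car("B-1", "Fiat"), new Car("A-2", "Audi"));
        // LinkedHashMap keeps encounter order
        System.out.println(indexFirstForm(cars).keySet());  // [B-1, A-2]
        System.out.println(indexSecondForm(cars).keySet()); // [B-1, A-2]

        List<Car> dupes = Arrays.asList(new Car("B-1", "Fiat"), new Car("B-1", "Audi"));
        try {
            indexFirstForm(dupes);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Duplicate key Fiat (B-1) not allowed!
        }
    }
}
```

Both variants behave identically from the caller's point of view; the only visible difference is whether the call site reads StreamUtils::throwingMerger or StreamUtils.throwingMerger().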

My questions are the following:

  • Which of the above is the correct approach and why?

  • Does either one of them offer a performance advantage or compromise performance?

2 Answers


Neither variant is more correct than the other.

Further, there is no significant performance difference, as the relevant bytecode is even identical. In either case, there will be a method holding a throw statement in your class and an instance of a runtime-generated class that will invoke that method.

Note that you can find both patterns within the JDK itself.

  • Function.identity() and Map.Entry.comparingByKey() are examples of factory methods containing a lambda expression
  • Double::sum, Objects::isNull, or Objects::nonNull are examples of method references to target methods solely existing for the purpose of being referenced that way
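To make the two JDK patterns concrete, a small sketch using only standard library calls. The word list is made-up sample data; the point is just where each kind of function object comes from.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

public class JdkPatterns {

    // Objects::nonNull: a method reference to a method that mostly exists to be referenced
    static List<String> nonNullWords(List<String> words) {
        return words.stream().filter(Objects::nonNull).collect(Collectors.toList());
    }

    // Function.identity() and Map.Entry.comparingByKey(): factory methods handing out a function object
    static List<String> lengthsSortedByWord(List<String> words) {
        Map<String, Integer> lengths = nonNullWords(words).stream()
            .collect(Collectors.toMap(Function.identity(), String::length));
        return lengths.entrySet().stream()
            .sorted(Map.Entry.comparingByKey())
            .map(e -> e.getKey() + "=" + e.getValue())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", null, "fig");
        System.out.println(nonNullWords(words));        // [pear, fig]
        System.out.println(lengthsSortedByWord(words)); // [fig=3, pear=4]
    }
}
```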

Generally, if there are also use cases for invoking the methods directly, it’s preferable to provide them as API methods, which may also be referenced by method references, e.g. Integer::compare, Objects::requireNonNull, or Math::max.
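A quick sketch of such dual-use API methods: Math.max and Objects.requireNonNull are called directly, while Math::max and Integer::compare are passed as functions. The helper names largest and sortedCopy are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class DirectAndReferenced {

    // Math::max, bound to BinaryOperator<Integer> by Stream.reduce
    static int largest(List<Integer> values) {
        Objects.requireNonNull(values, "values"); // direct call
        return values.stream().reduce(Integer.MIN_VALUE, Math::max);
    }

    // Integer::compare, bound to Comparator<Integer>
    static List<Integer> sortedCopy(List<Integer> values) {
        Integer[] copy = values.toArray(new Integer[0]);
        Arrays.sort(copy, Integer::compare);
        return Arrays.asList(copy);
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(4, 9, 1);
        System.out.println(Math.max(4, 9));     // direct call: 9
        System.out.println(largest(values));    // via Math::max: 9
        System.out.println(sortedCopy(values)); // [1, 4, 9]
    }
}
```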

On the other hand, providing a factory method makes the method reference an implementation detail that you can change when there is a reason to do so. E.g., did you know that Comparator.naturalOrder() is not implemented as T::compareTo? Most of the time, you don’t need to know.
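A small sketch of that point: a comparator obtained from the factory and one written as the method reference String::compareTo sort the same way, even though the caller cannot tell from the first one how it is implemented. The sortWith helper and the sample plates are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NaturalOrderDemo {

    static List<String> sortWith(Comparator<String> cmp, List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(cmp);
        return copy;
    }

    public static void main(String[] args) {
        List<String> plates = Arrays.asList("C-3", "A-1", "B-2");
        // Factory method: how the comparator is built is the JDK's implementation detail
        Comparator<String> viaFactory = Comparator.naturalOrder();
        // Method reference: this call site is tied to String::compareTo
        Comparator<String> viaReference = String::compareTo;
        System.out.println(sortWith(viaFactory, plates));   // [A-1, B-2, C-3]
        System.out.println(sortWith(viaReference, plates)); // [A-1, B-2, C-3]
    }
}
```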

Of course, factory methods taking additional parameters can’t be replaced by method references at all; sometimes, you want the parameterless methods of a class to be symmetric to those taking parameters.
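As an illustration of that symmetry, a hypothetical pair of overloads: the parameterized one returns a capturing lambda and therefore has no method-reference equivalent, and the parameterless one simply delegates to it. The class name Mergers and the message texts are made up for this sketch.

```java
import java.util.function.BinaryOperator;

public class Mergers {

    // Hypothetical parameterless factory, kept symmetric with the overload below.
    static <E> BinaryOperator<E> throwingMerger() {
        return throwingMerger("Duplicate value");
    }

    // Takes a parameter, so it cannot be replaced by a plain method reference;
    // the returned lambda captures 'message'.
    static <E> BinaryOperator<E> throwingMerger(String message) {
        return (v1, v2) -> {
            throw new IllegalArgumentException(message + ": " + v1 + " / " + v2);
        };
    }

    public static void main(String[] args) {
        BinaryOperator<String> strict = throwingMerger("Plate already registered");
        try {
            strict.apply("Fiat", "Audi");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Plate already registered: Fiat / Audi
        }
    }
}
```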


There is only a tiny difference in memory consumption. Given the current implementation, every occurrence of, e.g., Objects::isNull will cause the creation of a runtime class and an instance, which will then be reused for that particular code location. In contrast, the implementation within Function.identity() has only one code location, hence only one runtime class and instance. See also this answer.
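This per-code-location behaviour can be observed with reference comparisons. The expected values in the comments reflect current OpenJDK builds only; the Java Language Specification deliberately leaves the identity of lambda and method-reference objects unspecified, so treat this as an observation of one implementation, not a guarantee. The methods siteOne and siteTwo are hypothetical.

```java
import java.util.Objects;
import java.util.function.Function;
import java.util.function.Predicate;

public class CallSiteIdentity {

    // Two separate code locations, each containing the same method reference.
    static Predicate<Object> siteOne() { return Objects::isNull; }
    static Predicate<Object> siteTwo() { return Objects::isNull; }

    public static void main(String[] args) {
        // Distinct code locations: distinct runtime classes and instances (current OpenJDK)
        System.out.println(siteOne() == siteTwo());                     // false
        // Same code location, non-capturing: the instance is reused (current OpenJDK)
        System.out.println(siteOne() == siteOne());                     // true
        // Function.identity() funnels all callers through one code location
        System.out.println(Function.identity() == Function.identity()); // true
    }
}
```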

But it must be emphasized that this is specific to a particular implementation, as the strategy is implemented by the JRE. Further, we’re talking about a finite, rather small number of code locations and, hence, objects.


By the way, these approaches are not contradicting. You could even have both:

// for calling directly
public static <E> E alwaysThrow(E k1, E k2) {
    // by the way, k1 is not the key, see https://stackoverflow.com/a/45210944/2711488
    throw new IllegalArgumentException("Duplicate key " + k1 + " not allowed!");
}
// when needing a shared BinaryOperator
public static <E> BinaryOperator<E> throwingMerger() {
    return ContainingClass::alwaysThrow;
}
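The combined snippet, fleshed out into something runnable. ContainingClass and both methods come from the snippet; byInitial and the sample words are hypothetical. The parameters are renamed v1/v2 because, as the comment in the snippet notes, toMap hands the merge function the two colliding values, not the key.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ContainingClass {

    // for calling directly
    public static <E> E alwaysThrow(E v1, E v2) {
        // v1 and v2 are the two colliding values, not the key
        throw new IllegalArgumentException("Duplicate key " + v1 + " not allowed!");
    }

    // when needing a shared BinaryOperator
    public static <E> BinaryOperator<E> throwingMerger() {
        return ContainingClass::alwaysThrow;
    }

    // Hypothetical use: index words by their first character.
    static Map<Character, String> byInitial(List<String> words) {
        return words.stream().collect(Collectors.toMap(
            w -> w.charAt(0), Function.identity(), throwingMerger(), LinkedHashMap::new));
    }

    public static void main(String[] args) {
        System.out.println(byInitial(Arrays.asList("apple", "banana"))); // {a=apple, b=banana}
        try {
            byInitial(Arrays.asList("apple", "avocado"));
        } catch (IllegalArgumentException e) {
            // the message names the existing value "apple", not the key 'a'
            System.out.println(e.getMessage()); // Duplicate key apple not allowed!
        }
        try {
            alwaysThrow("x", "y"); // direct call
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Duplicate key x not allowed!
        }
    }
}
```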

Note that there’s another point to consider: the factory method always returns a materialized instance of a particular interface, i.e., BinaryOperator. For methods that need to be bound to different interfaces depending on the context, you need method references at these places anyway. That’s why you can write

DoubleBinaryOperator sum1 = Double::sum;
BinaryOperator<Double> sum2 = Double::sum;
BiFunction<Integer,Integer,Double> sum3 = Double::sum;

which would not be possible if there was only a factory method returning a DoubleBinaryOperator.
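A runnable version of those three bindings, plus what happens with a hypothetical factory (sumOperator) that only hands out a DoubleBinaryOperator: getting a BinaryOperator<Double> out of it requires an extra adapter lambda around the returned object.

```java
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.function.DoubleBinaryOperator;

public class SumBindings {

    // One target method, three functional interfaces chosen by context.
    static final DoubleBinaryOperator SUM1 = Double::sum;
    static final BinaryOperator<Double> SUM2 = Double::sum;
    // Integer arguments are unboxed and widened to double; the result is boxed to Double.
    static final BiFunction<Integer, Integer, Double> SUM3 = Double::sum;

    // Hypothetical factory that can only hand out a DoubleBinaryOperator.
    static DoubleBinaryOperator sumOperator() {
        return Double::sum;
    }

    // Reusing the factory's result under another interface needs an adapter.
    static BinaryOperator<Double> adaptedFromFactory() {
        DoubleBinaryOperator op = sumOperator();
        return (x, y) -> op.applyAsDouble(x, y);
    }

    public static void main(String[] args) {
        System.out.println(SUM1.applyAsDouble(1.5, 2.5));         // 4.0
        System.out.println(SUM2.apply(1.5, 2.5));                 // 4.0
        System.out.println(SUM3.apply(1, 2));                     // 3.0
        System.out.println(adaptedFromFactory().apply(1.5, 2.5)); // 4.0
    }
}
```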

Holger

EDIT: Ignore my remarks about avoiding unnecessary allocations; see Holger's answer as to why.

There won't be a noticeable performance difference between the two; the first variant avoids unnecessary allocations, though. I would prefer the method reference, as the function does not capture any value and thus does not need a lambda in this context. Compared to creating the IllegalArgumentException, which has to fill in its stack trace before being thrown (which is quite expensive), the performance difference is totally negligible.

Remember: this is more about readability and communicating what your code does than about performance. If you ever hit a performance wall because of this kind of code, lambdas and streams just aren't the way to go, as they are a rather elaborate abstraction with many indirections.
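For comparison, a hypothetical plain-loop version of a duplicate-rejecting index, with no stream pipeline or merge function involved. This is only a sketch of the alternative, not a claim that it is measurably faster for this case; one side effect is that the loop has the key at hand and can report it in the message.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LoopIndex {

    // Plain-loop counterpart of toMap(keyFn, identity(), throwingMerger, LinkedHashMap::new)
    static Map<String, String> byInitial(List<String> words) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String word : words) {
            String key = word.substring(0, 1);
            // putIfAbsent returns the previous value, or null if the key was new
            String previous = result.putIfAbsent(key, word);
            if (previous != null) {
                throw new IllegalArgumentException("Duplicate key " + key + " not allowed!");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(byInitial(Arrays.asList("apple", "banana"))); // {a=apple, b=banana}
        try {
            byInitial(Arrays.asList("apple", "avocado"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Duplicate key a not allowed!
        }
    }
}
```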

roookeee
  • If method references are better, then why does the Java source code use the second form, e.g. java.util.function.Function.identity() or java.util.stream.Collectors.throwingMerger()? – Mate Szilard May 27 '19 at 19:52
  • I did some digging and did not find anything on that topic. Great comment, will look for an answer. Everything I have found implies that method references are treated as a special form of lambda that does not need a synthesized method, and that method references don't pollute the stack trace. Will get back to you if I find anything else. – roookeee May 27 '19 at 21:45
  • Why do you think the first variant avoids unnecessary allocations? Which allocations do you think the second variant performs that the first could avoid? – Holger May 28 '19 at 08:32
  • I'll have to second Holger here, _exactly_ what allocations are you talking about? – Eugene May 28 '19 at 08:43
  • I stand corrected, as outlined in Holger's answer. I did not think the JVM would instantiate a class for each method reference. Even if it didn't, both variants would allocate exactly one instance, which voids my argument that the method reference allocates less. – roookeee May 28 '19 at 08:56