Making Java identity function composition more efficient

Question

Java has java.util.function.Function.identity(), which returns a function equivalent to the lambda expression t -> t (in fact, that is the precise implementation in OpenJDK 17, which I'm looking at at the moment). This means that the returned function is guaranteed to do nothing at all to the input and merely pass it back, as if there were no function to begin with.

So let's say I compose that function with another one using Function.andThen():

Function<String, String> fn = Function.identity();
Function<String, String> composedFn = fn.andThen(String::toUpperCase);
String foo = composedFn.apply("foo");  //yields "FOO"

This is equivalent to the following, and I would assume (which gets me in trouble sometimes, which is why I'm asking this question) that all of the following would occur in the bytecode—the compiler would not optimize any of the lambda invocations away. (I don't know what the JRE would do at runtime after compiling, though—could it eliminate the Function.identity() altogether?)

String foo = Function.identity().apply("foo").andThen(String::toUpperCase);

Or the equivalent procedural code:

String foo = Function.<String>identity().apply("foo").toUpperCase();

What I don't understand is why we need Function.identity() at all after composition. In other words, what if Function.identity() were implemented like this (somewhat pseudocode, ignoring irrelevant syntax details):

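static <T> Function<T, T> identity() {
  return new Function<T, T>() {

    @Override
    public T apply(T t) {
      return t;
    }

    @Override
    @SuppressWarnings("unchecked")
    public <V> Function<T, V> andThen(Function<? super T, ? extends V> after) {
      // Unchecked cast: after accepts any T and returns some V, so it can stand in for Function<T, V>.
      return (Function<T, V>) after;
    }

  };
}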

The point is that if the identity function is guaranteed to act as if it did not exist in the chain, can't the function composition method andThen() simply return the after function itself, taking the identity function out of the composition chain altogether?

The original code would then be equivalent to the following (pseudocode):

String foo = ((Function<String, String>) String::toUpperCase).apply("foo");

Or the equivalent procedural code:

String foo = "foo".toUpperCase();

Wouldn't this be more efficient? Perhaps the gained efficiency would be minuscule, but if it would provide an efficiency gain with no downsides, couldn't we improve Function.identity() in this way?

Please let me know if I'm missing some reason why this won't work, or if the Function.identity() would be 100% optimized away somehow. Otherwise it seems like something I should submit a ticket for to improve the JDK.
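
To make the difference concrete, here is a minimal runnable sketch. The names IdentityCompositionDemo and shortCircuitIdentity() are hypothetical and not part of the JDK. The reference comparisons assume the JDK's default andThen(), which wraps both functions in a new lambda.

```java
import java.util.Objects;
import java.util.function.Function;

class IdentityCompositionDemo {

    // Hypothetical replacement for Function.identity(); not JDK API.
    static <T> Function<T, T> shortCircuitIdentity() {
        return new Function<T, T>() {
            @Override
            public T apply(T t) {
                return t;
            }

            @Override
            @SuppressWarnings("unchecked")
            public <V> Function<T, V> andThen(Function<? super T, ? extends V> after) {
                Objects.requireNonNull(after);
                // Unchecked cast: after accepts any T and produces some V.
                return (Function<T, V>) after;
            }
        };
    }

    public static void main(String[] args) {
        Function<String, String> upper = String::toUpperCase;

        Function<String, String> jdkComposed =
                Function.<String>identity().andThen(upper);
        Function<String, String> customComposed =
                IdentityCompositionDemo.<String>shortCircuitIdentity().andThen(upper);

        System.out.println(jdkComposed.apply("foo"));    // FOO
        System.out.println(customComposed.apply("foo")); // FOO

        // The default andThen() returns a new wrapper, so the identity stays in the chain.
        System.out.println(jdkComposed == upper);    // false
        // The overridden andThen() hands back the very same function object.
        System.out.println(customComposed == upper); // true
    }
}
```

This only shows what object comes back from andThen(). It says nothing about what the JIT does with the extra call at runtime.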

Here is why this is useful: there are many situations in which I may want optional transformations to be configurable. By default I could simply set the transformation to null, and then do null checking whenever I wanted to add a transformation. But I'd prefer to avoid nulls altogether. It would be better to default to Function.identity() and allow transformations to be added using andThen() without needing to check for null. If the identity function were improved as I suggest, it seems that I would lose zero efficiency by defaulting to Function.identity() rather than defaulting to null and checking for null every time I add a function composition. Without this improvement, it seems I would be stuck with t -> t in the chain. It's not clear to me whether this is optimized away 100%.
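
A minimal sketch of that pattern, assuming a hypothetical TransformPipeline class (the class and its method names are made up for illustration):

```java
import java.util.function.Function;

class TransformPipeline {

    // Seeded with the identity function instead of null.
    private Function<String, String> transform = Function.identity();

    // Appends a step; no null check is needed because the seed is never null.
    TransformPipeline add(Function<String, String> step) {
        transform = transform.andThen(step);
        return this;
    }

    String run(String input) {
        return transform.apply(input);
    }

    public static void main(String[] args) {
        TransformPipeline pipeline = new TransformPipeline()
                .add(String::trim)
                .add(String::toUpperCase);

        System.out.println(pipeline.run("  foo "));             // FOO
        System.out.println(new TransformPipeline().run("bar")); // bar (no steps configured)
    }
}
```

With the JDK's current implementation, every pipeline built this way keeps the identity function as the first link of the chain.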

I would be shocked if this actually resulted in any performance improvement in the end. I would expect the JIT to fully optimize this away in actual code. — Louis Wasserman, Jul 10 '23 at 21:01
"I would expect the JIT to fully optimize this away in actual code." @LouisWasserman I guess that is what this question comes down to: is there any remnant of `Function.identity()` in the bytecode (or I suppose more importantly at runtime after the JRE kicks in) after a call to `andThen()`. If something remains of `Function.identity()` that isn't 100% optimized away, it seems useful to me to improve the code, even if it doesn't result in any significant performance improvement. Maybe I've been reading too much about Rust and all the free lunches it provides at the compiler level. — Garret Wilson, Jul 10 '23 at 21:47
"This is equivalent to the following: ... `String foo = Function.identity().apply("foo").andThen(String::toUpperCase);`": no it isn't. `apply()` returns a `String` in this case, and `String` has no `andThen()` function. You are correct that you don't need the initial `Function.identity()` call in your example ... so why did you write it? — user207421, Jul 10 '23 at 23:32
(sigh) @user207421 you are missing the point. By "equivalent" here I merely meant that there remains an invocation to the `Function.identity()` in the chain; it is equivalent to two separate operations, not one as in my modified `andThen()`. And I already explained that I'm creating a chain of composed functions. I can start with a "seed" of `null` or a "seed" of `Function.identity()`. The latter will prevent a `null` check for the first "link" in the chain if I use `andThen()`; hence this question. — Garret Wilson, Jul 11 '23 at 00:35
It is highly unlikely it will be optimized away at the bytecode level. After JIT compilation? Maybe. It is liable to be version dependent. But you can get the JVM to show you the JIT-compiled native code and check for yourself. — Stephen C, Jul 11 '23 at 02:15
You haven't answered my objection or my question. My objection: the equivalence you stated is false. My question: if you don't want `Function.identity()`, why did you write it? — user207421, Jul 11 '23 at 09:45
@StephenC that's pretty much what I thought. At the end of the day I think I'll need to take a few hours and decompile what `javac` puts out and see what winds up in the bytecode. I just was curious if someone had done that before and already knew the answer. Looks like that is not the case, and I'll just have to dig into this myself. I'll report back my findings if I get the time to do that. I can also file a JDK ticket. — Garret Wilson, Jul 11 '23 at 17:29

Irremediable · Answer 1 · 2023-07-11T09:40:42.093

Function.identity() is useful when a method expects to receive a mapper Function object but no transformation is needed in your use case.

public class Person {
    private String name;

    public Person(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }
}

....

public Map<String, Person> getPersonsMap() {
    List<Person> persons = repo.findAll();

    return persons.stream()
        .collect(Collectors.toMap(Person::getName, Function.identity()));
}

Due to the nature of lambda functions in Java, each occurrence of item -> item will lead to the creation of an implementation class, whereas Function.identity() will not. See more here
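
As a self-contained sketch, the two mappers can be compared side by side. The IdentityMapperDemo class and its in-memory list are hypothetical stand-ins for the repository call above. Both variants build the same map; they differ only in how the mapper is implemented.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class IdentityMapperDemo {

    // Same shape as the Person class above.
    static final class Person {
        private final String name;

        Person(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // Value mapper supplied by the JDK.
    static Map<String, Person> indexWithIdentity(List<Person> persons) {
        return persons.stream()
                .collect(Collectors.toMap(Person::getName, Function.identity()));
    }

    // Value mapper written as a lambda at this call site.
    static Map<String, Person> indexWithLambda(List<Person> persons) {
        return persons.stream()
                .collect(Collectors.toMap(Person::getName, person -> person));
    }

    public static void main(String[] args) {
        List<Person> persons = List.of(new Person("Ada"), new Person("Grace"));

        // Both maps hold the same keys and the same Person instances.
        System.out.println(indexWithIdentity(persons).equals(indexWithLambda(persons))); // true
    }
}
```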

Transformations on optional objects are available through the Optional class, which has methods pretty close to the Stream API:

Optional.ofNullable(person.getName())
         .map(name -> name.toUpperCase())
         .orElseThrow(() -> new RuntimeException("Name is null!"));

EDIT: The JLS doesn't guarantee that any kind of optimization will be performed. Interesting mentions:

8.4.3.2 - An instance method is always invoked with respect to an object, which becomes the current object to which the keywords this and super refer during execution of the method body. (andThen is non-static)

8.4.3.3 - exception checks should be performed before the code is optimized. (andThen can throw NullPointerException)

13.4.22 - the final method modifier doesn't prove that a method can be optimized at runtime

So it seems that the JIT will try to optimize Function.identity().andThen(String::toUpperCase) to execute like String::toUpperCase, but it is not "optimized away" in terms of CPU ticks, since the lambda's method lookup still has to be performed, together with additional checks of its body.

This is not about `Optional`, and it's not about `item -> item`. The question is whether (embellishing code from your example) in `persons.stream().collect(Collectors.toMap(Person::getName, Function.identity().andThen(String::toUpperCase)))` the `Function.identity()` is 100% optimized away to `persons.stream().collect(Collectors.toMap(Person::getName, String::toUpperCase))` or not. Do you know the answer to that question? — Garret Wilson, Jul 10 '23 at 21:50
