I have a question regarding the usage of the Function.identity() method.

Imagine the following code:

Arrays.asList("a", "b", "c")
          .stream()
          .map(Function.identity()) // <- This,
          .map(str -> str)          // <- is the same as this.
          .collect(Collectors.toMap(
                       Function.identity(), // <-- And this,
                       str -> str));        // <-- is the same as this.

Is there any reason why you should use Function.identity() instead of str->str (or vice versa)? I think that the second option is more readable (a matter of taste, of course). But is there any "real" reason why one should be preferred?

  • Ultimately, no, this won't make a difference. – fge Jan 19 '15 at 20:17
  • Either is fine. Go with whichever you think is more readable. (Don't worry, be happy.) – Brian Goetz Jan 19 '15 at 22:29
  • I would prefer `t -> t` simply because it's more succinct. – David Conrad Jan 20 '15 at 16:33
  • Slightly unrelated question, but does anyone know why the language designers made identity() return an instance of Function instead of having it take a parameter of type T and return it, so that the method could be used with method references? – Kirill Rakhman Jan 22 '15 at 19:47
  • I would argue there's a use to being conversant with the word "identity," since it has an important meaning in other areas of functional programming. – orbfish Jun 14 '15 at 01:14
  • As a non-native English speaker, I kind of think that the name of the function should have been "identical", or am I wrong? If I read `Function.identity` I would guess that I get some value from an attribute which actually "identifies" the object (like a hashCode, for example). But to me it seems that the output is just "identical" to the input. Does this make sense? – the hand of NOD Apr 28 '20 at 11:03
  • The _identity function_ is a well-known mathematical term; we chose to lean on this existing understanding. – Brian Goetz Mar 17 '21 at 19:46
  • @thehandofNOD `hashCode` does *not* identify an object! Very commonly made mistake! The only requirement for a hash code function is that objects that are the same yield the same hash code, *not* that objects that are different yield different values. As such comparing hash codes can never be used as a reliable alternative to comparing objects. – Frans Aug 31 '21 at 08:46
  • @Frans: yes, you are absolutely right regarding the `hashCode` method in Java, but I meant "hashcode" in a more abstract way and not Java-specific. Still, thanks for pointing it out. – the hand of NOD Sep 01 '21 at 14:33
  • @KirillRakhman You mean something like `static T identity(T t) { return t; }`? – MC Emperor Oct 28 '21 at 09:02
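
The helper suggested in the last two comments can be sketched as follows. `IdentityMethodRef` and `id` are hypothetical names (nothing like this exists in the JDK); the sketch only shows that a plain generic method can be adapted by a method reference to whatever functional interface the call site expects, including `ToIntFunction`, which `Function.identity()` cannot satisfy.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class IdentityMethodRef {
    // Hypothetical helper, not part of the JDK: returns its argument unchanged.
    static <T> T id(T t) {
        return t;
    }

    // Here IdentityMethodRef::id is adapted to a Function<String, String>.
    static List<String> mapWithId(List<String> input) {
        return input.stream()
                .map(IdentityMethodRef::id)
                .collect(Collectors.toList());
    }

    // Here the same method is adapted to a ToIntFunction<Integer>:
    // T is inferred as Integer and the Integer result is auto-unboxed to int.
    static int[] toIntArrayWithId(List<Integer> input) {
        return input.stream().mapToInt(IdentityMethodRef::id).toArray();
    }

    public static void main(String[] args) {
        System.out.println(mapWithId(Arrays.asList("a", "b", "c")));
        System.out.println(Arrays.toString(toIntArrayWithId(Arrays.asList(1, 2))));
    }
}
```

Whether this would have been a better design for the JDK is a separate question; the sketch just illustrates what such a method would allow at the call site.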

3 Answers

As of the current JRE implementation, Function.identity() will always return the same instance while each occurrence of identifier -> identifier will not only create its own instance but even have a distinct implementation class. For more details, see here.

The reason is that the compiler generates a synthetic method holding the trivial body of that lambda expression (in the case of x->x, equivalent to return x;) and tells the runtime to create an implementation of the functional interface calling this method. So the runtime sees only different target methods, and the current implementation does not analyze the methods to find out whether certain methods are equivalent.
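
To make that mechanism a bit more concrete, here is a rough, hand-written sketch of the linking step, assuming a JDK 8+ runtime. It calls `java.lang.invoke.LambdaMetafactory.metafactory` directly; the method `identityBody` stands in for the synthetic method javac would generate (javac's real method has a generated name such as `lambda$main$0`), and the class name `DesugarSketch` is made up. Real compiled code reaches the metafactory through an `invokedynamic` bootstrap rather than through an explicit call like this one.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Function;

class DesugarSketch {
    // Stand-in for the synthetic method holding the body of x -> x.
    // The name identityBody is invented for this sketch.
    private static Object identityBody(Object x) {
        return x;
    }

    // Roughly what the bootstrap does when the lambda is first evaluated:
    // ask LambdaMetafactory for a call site whose target produces a Function
    // that forwards Function.apply to identityBody.
    @SuppressWarnings("unchecked")
    static Function<Object, Object> link() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle impl = lookup.findStatic(DesugarSketch.class, "identityBody",
                    MethodType.methodType(Object.class, Object.class));
            CallSite site = LambdaMetafactory.metafactory(
                    lookup,
                    "apply",                                            // interface method to implement
                    MethodType.methodType(Function.class),              // no captured values, produces a Function
                    MethodType.methodType(Object.class, Object.class),  // erased signature of Function.apply
                    impl,                                               // method holding the lambda body
                    MethodType.methodType(Object.class, Object.class)); // signature enforced at invocation
            return (Function<Object, Object>) site.getTarget().invoke();
        } catch (Throwable t) {
            throw new IllegalStateException("linking failed", t);
        }
    }

    public static void main(String[] args) {
        Function<Object, Object> f = link();
        System.out.println(f.apply("a"));
    }
}
```

As far as I know, each metafactory call on current OpenJDK spins its own implementation class, which mirrors why every textual occurrence of `x -> x` ends up with a distinct class.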

So using Function.identity() instead of x -> x might save some memory but that shouldn’t drive your decision if you really think that x -> x is more readable than Function.identity().
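
The instance sharing described above can be observed with reference comparisons. This is a sketch that relies on current OpenJDK behavior, not on anything the specification guarantees: the object identity of lambda results is unspecified, so real code should never depend on these comparisons. The class and method names are made up for illustration.

```java
import java.util.function.Function;

class IdentitySharing {
    // Every call returns the object produced by the single lambda inside Function.identity().
    static boolean identityIsShared() {
        return Function.identity() == Function.identity();
    }

    // Two separate occurrences of t -> t in the source are linked independently,
    // so (on current OpenJDK) they yield different objects of different classes.
    static boolean inlineLambdasAreDistinct() {
        Function<String, String> first = t -> t;
        Function<String, String> second = t -> t;
        return first != second && first.getClass() != second.getClass();
    }

    public static void main(String[] args) {
        System.out.println(identityIsShared());
        System.out.println(inlineLambdasAreDistinct());
    }
}
```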

You may also consider that when compiling with debug information enabled, the synthetic method will have a line debug attribute pointing to the source code line(s) holding the lambda expression, therefore you have a chance of finding the source of a particular Function instance while debugging. In contrast, when encountering the instance returned by Function.identity() during debugging an operation, you won’t know who has called that method and passed the instance to the operation.

Community
Holger
  • Nice answer. I have some doubts about debugging. How can it be useful? It's very unlikely to get an exception stack trace involving the `x -> x` frame. Do you suggest setting a breakpoint in this lambda? Usually it's not so easy to put a breakpoint into a single-expression lambda (at least in Eclipse)... – Tagir Valeev Aug 19 '15 at 01:57
  • @Tagir Valeev: you may debug code which receives an arbitrary function and step into the apply method of that function. Then you may end up at the source code of a lambda expression. In the case of an explicit lambda expression you’ll know where the function comes from and have a chance to recognize at which place the decision to pass through an identity function was made. When using `Function.identity()` that information is lost. Then, the call chain may help in simple cases, but think of, e.g., multi-threaded evaluation where the original initiator is not in the stack trace… – Holger Aug 19 '15 at 08:35
  • Interesting in this context: http://blog.codefx.org/java/instances-non-capturing-lambdas/ – Wim Deblauwe Oct 26 '15 at 13:38
  • @Wim Deblauwe: Interesting, but I would always see it the other way round: if a factory method doesn’t explicitly state in its documentation that it will return a new instance on every invocation, you can’t assume that it will. So it shouldn’t be surprising if it doesn’t. After all, that’s one big reason for using factory methods instead of `new`. `new Foo(…)` is guaranteed to create a new instance of the exact type `Foo`, whereas `Foo.getInstance(…)` may return an existing instance of (a subtype of) `Foo`… – Holger Oct 26 '15 at 14:15
  • Took a look into the implementation of `Function.identity()`, which is: `return t -> t;` – dfreis Oct 19 '21 at 08:11
  • @dfreis [you are not the first one](https://stackoverflow.com/a/28033143/2711488) – Holger Oct 19 '21 at 08:14

In your example there is no big difference between str -> str and Function.identity() since internally it is simply t->t.

But sometimes we can't use Function.identity because we can't use a Function. Take a look here:

List<Integer> list = new ArrayList<>();
list.add(1);
list.add(2);

This will compile fine:

int[] arrayOK = list.stream().mapToInt(i -> i).toArray();

But if you try to compile

int[] arrayProblem = list.stream().mapToInt(Function.identity()).toArray();

you will get a compilation error, since mapToInt expects a ToIntFunction, which is not related to Function. Also, ToIntFunction doesn't have an identity() method.
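
A self-contained version of the example above, assuming Java 8 or later. Besides `i -> i`, it shows the `Integer::intValue` method reference mentioned in the comments, and one way to reuse a `Function` you already have by adapting it with a method reference (`MapToIntDemo` and `viaExistingFunction` are made-up names for illustration). The failing `Function.identity()` line is kept as a comment so the class still compiles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.ToIntFunction;

class MapToIntDemo {
    // i -> i is typed as ToIntFunction<Integer>; the Integer is auto-unboxed to int.
    static int[] viaLambda(List<Integer> list) {
        return list.stream().mapToInt(i -> i).toArray();
    }

    // Same result with an explicit unboxing method reference.
    static int[] viaMethodReference(List<Integer> list) {
        return list.stream().mapToInt(Integer::intValue).toArray();
    }

    // Does NOT compile: a Function<Integer, Integer> is not a ToIntFunction<Integer>.
    // static int[] broken(List<Integer> list) {
    //     return list.stream().mapToInt(Function.identity()).toArray();
    // }

    // An existing Function can still be adapted: f::apply returns Integer,
    // which is unboxed to satisfy ToIntFunction's int result.
    static int[] viaExistingFunction(List<Integer> list, Function<Integer, Integer> f) {
        ToIntFunction<Integer> adapted = f::apply;
        return list.stream().mapToInt(adapted).toArray();
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        list.add(1);
        list.add(2);
        System.out.println(Arrays.toString(viaLambda(list)));
        System.out.println(Arrays.toString(viaMethodReference(list)));
        System.out.println(Arrays.toString(viaExistingFunction(list, Function.identity())));
    }
}
```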

JJ Brown
Pshemo
  • See http://stackoverflow.com/q/38034982/14731 for another example where replacing `i -> i` with `Function.identity()` will result in a compiler error. – Gili Jun 26 '16 at 03:35
  • I prefer `mapToInt(Integer::intValue)`. – shmosel May 01 '17 at 04:52
  • @shmosel that is OK, but it is worth mentioning that both solutions will work similarly, since `mapToInt(i -> i)` is a simplification of `mapToInt( (Integer i) -> i.intValue())`. Use whichever version you think is clearer; for me `mapToInt(i -> i)` better shows the intention of this code. – Pshemo May 01 '17 at 16:38
  • I think there can be performance benefits in using method references, but it's mostly just a personal preference. I find it more descriptive, because `i -> i` looks like an identity function, which it isn't in this case. – shmosel May 01 '17 at 18:57
  • @shmosel I can't say much about the performance difference, so you may be right. But if performance is not an issue I will stay with `i -> i`, since my goal is to map Integer to int (which `mapToInt` suggests quite nicely), not to explicitly call the `intValue()` method. *How* this mapping is achieved is not really that important. So let's just agree to disagree, but thanks for pointing out the possible performance difference; I will need to take a closer look at that someday. – Pshemo May 01 '17 at 21:45
  • Hypothetically `Function.identity()` should be slightly faster due to JIT compilation: The body of the lambda will be compiled sooner due to being called more often (since it's used elsewhere), leading to speedup compared to interpreted mode. But the difference is trivial here. – Vitruvie Jun 01 '18 at 19:17

From the JDK source:

static <T> Function<T, T> identity() {
    return t -> t;
}

So, no, there is no difference, as long as it is syntactically correct.
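
A quick way to convince yourself of this is to run the question's `toMap` pipeline three ways: with `Function.identity()`, with `str -> str`, and with a locally written factory that mirrors the JDK source above. `IdentityEquivalence` and `myIdentity` are hypothetical names used only for this sketch; all three variants should produce equal maps.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class IdentityEquivalence {
    // Same shape as the JDK source quoted above; named myIdentity
    // so it is not confused with the real Function.identity().
    static <T> Function<T, T> myIdentity() {
        return t -> t;
    }

    static Map<String, String> withFunctionIdentity() {
        return Arrays.asList("a", "b", "c").stream()
                .collect(Collectors.toMap(Function.identity(), Function.identity()));
    }

    static Map<String, String> withLambdas() {
        return Arrays.asList("a", "b", "c").stream()
                .collect(Collectors.toMap(str -> str, str -> str));
    }

    static Map<String, String> withOwnFactory() {
        return Arrays.asList("a", "b", "c").stream()
                .collect(Collectors.toMap(myIdentity(), myIdentity()));
    }

    public static void main(String[] args) {
        System.out.println(withFunctionIdentity());
        System.out.println(withLambdas());
        System.out.println(withOwnFactory());
    }
}
```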

asgs
JasonN
  • I wonder if this invalidates the answer above relating to a lambda creating an object - or if this is a particular implementation. – orbfish Jun 14 '15 at 01:12
  • @orbfish: that’s perfectly in line. Every occurrence of `t->t` in source code may create one object and the implementation of `Function.identity()` is *one* occurrence. So all call sites invoking `identity()` will share that one object while all sites explicitly using the lambda expression `t->t` will create their own object. The method `Function.identity()` is not special in any way, whenever you create a factory method encapsulating a commonly used lambda expression and call that method instead of repeating the lambda expression, you may save some memory, *given the current implementation*. – Holger Jun 15 '15 at 08:09
  • I'm guessing that this is because the compiler optimizes away the creation of a new `t->t` object each time the method is called and recycles the same one whenever the method is called? – Daniel Gray Aug 08 '17 at 10:37
  • @DanielGray the decision is made at runtime. The compiler inserts an `invokedynamic` instruction which gets linked on its first execution by executing a so-called bootstrap method, which in the case of lambda expressions is located in the [`LambdaMetafactory`](https://docs.oracle.com/javase/8/docs/api/java/lang/invoke/LambdaMetafactory.html). This implementation decides to return a handle to a constructor, a factory method, or code always returning the same object. It may also decide to return a link to an already existing handle (which currently doesn’t happen). – Holger Aug 30 '19 at 08:16
  • @Holger Are you sure this call to identity wouldn't be inlined and then potentially monomorphized (and inlined again)? – JasonN Sep 01 '19 at 18:56
  • @JasonN the method might get inlined, but the semantics do not change. But keep in mind that whether a new instance is created or not, is considered an implementation detail. So we’re discussing a particular implementation here. This particular implementation will re-use the created object, even if the method gets inlined. However, after inlining and applying all sorts of optimizations, the resulting code might not use that object at all. – Holger Sep 02 '19 at 07:51
  • @Holger my point was that if `identity` isn't inlined and the call site is shared, there would probably be a few different object types coming through there and it couldn't be inlined at all. If it were inlined, then the call site would be lifted to the caller instead of shared, and that might only have a single type going through it and could then be monomorphized and inlined again. Inlining `identity` would have an enormous performance impact because of the doors it would open to other optimizations. On these I only really care about HotSpot or Graal. – JasonN Sep 03 '19 at 04:54
  • @JasonN optimizations are not allowed to change the semantics of the code. So it doesn’t matter whether the method gets inlined. There are two views here. The low-level view sees an `invokedynamic` instruction that is not allowed to get linked more than once and copying it to multiple call-sites is not allowed to change that, so all of them must be linked to the same code that invariably returns the single instance that was created during the bootstrapping. The high-level view knows that the object identity of lambda expressions is unspecified, however, it doesn’t need inlining to know that. – Holger Mar 18 '21 at 13:13