24

I'm studying the factory methods for Immutable collections. I see the Set.of() method has 10 varargs overloading (same for Map.of()). I really can't understand why there are so many. In the end the function ImmutableCollections.SetN<>(elements) gets called anyway.

In the documentation I found this:

While this introduces some clutter in the API, it avoids array allocation, initialization, and garbage collection overhead that is incurred by varargs calls.

Is the clutter indeed worth the performance gain? If yes, would that ideally do create a separate method for any N elements?

Naman
  • 27,789
  • 26
  • 218
  • 353
Schidu Luca
  • 3,897
  • 1
  • 12
  • 27
  • 1
    I guess they're expecting some really heavy usage for those methods. Whoa. – Kayaman Aug 02 '17 at 07:37
  • You can't really create a method for "any `N` elements", as `N` might be a very large number :) – C-Otto Aug 02 '17 at 07:44
  • @C-Otto, of course I'm aware of it, I just asked if that would help with performance :) – Schidu Luca Aug 02 '17 at 07:46
  • 1
    Yes, and you already cited the reasons. – C-Otto Aug 02 '17 at 07:47
  • @GhostCat That's because Sotirios was on-line at the time, and he's quick on the dupe-hammer. – Stuart Marks Aug 10 '17 at 07:02
  • Not much to add to the answers already here. But these collections are heavily used in JDK 9 during startup, much of which runs in interpretive mode, so saving cycles is important. The current implementation does some redundant copying, but that can be optimized in future releases without affecting compatibility. – Stuart Marks Aug 10 '17 at 07:04

3 Answers3

11

At the moment that method is called anyway - this could change. For example it could be that it creates a Set with only three elements, 4 and so on.

Also not all of them delegate to SetN - the ones that have zero, one and two elements have actual classes of ImmutableCollections.Set0, ImmutableCollections.Set1 and ImmutableCollections.Set2

Or you can read the actual question regarding this ... here Read the comments from Stuart Marks in that question -as he is the person that created these Collections.

Eugene
  • 117,005
  • 15
  • 201
  • 306
  • 3
    Also, it should be noted that the varargs method needs to create a defensive copy of the array to guaranty immutability, so even if one of the 3-10 arity overloads ends up in using `SetN` with an array, it’s construction could be cheaper. And having the array creation code at a central place instead of each caller reduces the overall code size. – Holger Aug 02 '17 at 08:26
  • 1
    There’s both `List0` and `Set0`, but yes, `Set0` fits better into the list of `Set1` and `Set2`… – Holger Aug 02 '17 at 08:45
  • @Holger about the second argument - are you saying that it potentially could be cheaper to create an array copy than to populate an Object of 3 and more fields? – Eugene Aug 02 '17 at 08:52
  • 4
    No, I’m saying if application code calls the varargs method, the application creates the array and the jre code has to create a defensive copy for `ListN` (for `SetN`, a new array will be populated anyway). But if the application calls a ten arg method, it doesn’t create an array and *only* the jre code creates an array, which doesn’t need a copy. So even if there is no `Set10`, calling the ten arg method could enable a faster creation of the `ListN`/`SetN` than the varargs method. That opportunity is not used yet. – Holger Aug 02 '17 at 08:57
10

Some aspects of this may be a form of future proofing.

If you evolve an API, you need to pay attention to how the method signature will change, so if we have

public class API {
  public static final <T> Set<T> of(T... elements) { ... }
}

We could say that the varargs is good enough... except that the varargs forces the allocation of an object array, which - while reasonably cheap - does actually impact the performance. See for example this microbenchmark which shows a 50% loss of throughput for a no-op logging (i.e. the log level is lower than loggable) when switching to the varargs form.

Ok, so we do some analysis and say the most common case is the singleton, so we decide to refactor...

public class API {
  public static final <T> Set<T> of(T first) { ... }
  public static final <T> Set<T> of(T first, T... others) { ... }
}

Ooops... that's not binary compatible... it's source compatible, but not binary compatible... to retain binary compatibility we need to keep the previous signature, e.g.

public class API {
  public static final <T> Set<T> of(T first) { ... }
  @Deprecated public static final <T> Set<T> of(T... elements) { ... }
  public static final <T> Set<T> of(T first, T... others) { ... }
}

Ugh... IDE code complete is a mess now... plus how do I create a set of arrays? (probably more relevant if I was using a list) API.of(new Object[0]) is ambiguous... if only we had not added the vararg at the start...

So I think what they did was add enough explicit args to reach the point where the extra stack size meets the cost of vararg creation, which is probably about 10 arguments (at least based on the measurements that Log4J2 did when adding their varargs to the version 2 API)... but you are doing this for evidence based future-proofing...

In other words, we can cheat for all the cases that we do not have evidence requiring a specialized implementation and just fall-through to the vararg variant:

public class API {
  private static final <T> Set<T> internalOf(T... elements) { ... }
  public static final <T> Set<T> of(T first) { return internalOf(first); }
  public static final <T> Set<T> of(T first, T second) { return internalOf(first, second); }
  ...
  public static final <T> Set<T> of(T t1, T t2, T t3, T t4, T t5, T... rest) { ... }
}

Then we can profile and look at real world usage patterns and if we then see significant usage up to the 4 arg form and benchmarks showing there to be a reasonable perf gain, then at that point, behind the scenes, we change the method impl and everyone gets a win... no recompilation required

Stephen Connolly
  • 13,872
  • 6
  • 41
  • 63
  • 2
    Not to forget, adding `of(T)` after the fact doesn’t change the already compiled code. Worse, it doesn’t even improve future code if still compiled against an older version of the JDK. – Holger Aug 02 '17 at 08:23
  • 1
    Oh, strange world of Stackoverflow. Just last week the **same** [question](https://stackoverflow.com/questions/45339340/is-there-a-drawback-to-the-varargs-operator) was asked and got closed out as duplicate in 1 minute. And this version gets 20 upvotes and well received answers. Lucky you. – GhostCat Aug 03 '17 at 07:12
2

I guess it depends on the scope of the API you are working with. When talking about those Immutable classes you are talking about stuff included as part of the jdk; so scope is very broad.

So you have:

  1. in one side that these Immutable classes might be used by applications where every bit counts (and every nanosecond wasted in allocation/deallocation).
  2. in other side, that applications without those needs are not negatively impacted
  3. the only 'negative' side is for the implementors of that API that will have more clutter to deal with, so it affects maintanability (but not a big thing in this case).

If you were implementing your own stuff i would not care so much about that (but be cautious with varargs arguments) unless you really need to worry about those extra bits (and extra performance etcetc).

albert_nil
  • 1,648
  • 7
  • 9