15

I came accross the following code:

public static <T> Set<T> distinct(
        Collection<? extends T> list,
        Comparator<? super T> comparator) {

    Set<T> set = new TreeSet<>(comparator);
    set.addAll(list);
    return set;
}

This code just uses an intermediate TreeSet to remove duplicates, where equality among elements is defined as per the provided comparator.

Let's give local type inference an opportunity, I (naively) thought... So I changed the above code to:

public static <T> Set<T> distinct(
        Collection<? extends T> list,
        Comparator<? super T> comparator) {

    var set = new TreeSet<>(comparator);
    set.addAll(list);
    return set;
}

This made sense to me, because the type of set can be inferred from the type of comparator, or so I thought. However, the modified code doesn't compile and generates the following error:

java: incompatible types: java.util.TreeSet<capture#1 of ? super T> cannot be converted to java.util.Set<T>

Now, I understand why the error occurs and I admit that the type of the comparator is actually Comparator<? super T>, so the type inferred by var is TreeSet<? super T>.

However, I wonder why var isn't able to infer the generic type of TreeSet as just T instead of ? super T. After all, according to the docs, a TreeSet<E> has a constructor that accepts an argument of type Comparator<? super E>. So invoking this constructor should create a TreeSet<E>, not a TreeSet<? super E>. (This is what the first snippet shows). I expected var to follow this same logic.

Note 1: One way to make the code compile would be to change the return type to Set<? super T>. However, that would be a hardly usable set...

Note 2: Another way would be to not use contravariance in the comparator, but I don't want this, because I wouldn't be able to use a Comparator that compares ancestors of T.

Note 3: I know that the first snippet works, so it seems obvious that I should stick to not using var and declare the set explicitly as Set<T>. However, my question is not whether I should discard my second snippet or how to fix it. Instead, I'd like to know why var is not inferring TreeSet<T> as the type of the set local variable in my 2nd snippet.


EDIT 1: In this comment, user @nullpointer correctly points out that I should make the following subtle change to make the 2nd snippet compile:

var set = new TreeSet<T>(comparator); // T brings in the magic!

Now the generic type parameter T is explicit for TreeSet, so var correctly infers the type of the set local variable as TreeSet<T>. Still, I'd like to know why I must specify T explicitly.


EDIT 2: In this other comment, user @Holger cleverly mentions that the following is forbidden in the language:

var set = new TreeSet<? super T>(comparator);

The code above fails to compile with the following error:

java: unexpected type
  required: class or interface without bounds
  found:    ? super T

So now the question becomes more evident: if I cannot explicitly specify the bounded generic type ? super T in the instantiation expression new TreeSet<? super T>(comparator), why is the compiler infering TreeSet<? super T> as the type of the set local variable?

fps
  • 33,623
  • 8
  • 55
  • 110
  • 2
    Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/168213/discussion-on-question-by-federico-peralta-schaffner-local-type-inference-and-co). – Samuel Liew Apr 04 '18 at 12:03
  • 1
    I think it's due to more than two persons involved, so the automatic link doesn't appear and mods have to manually perform the action. If you really want to chat in this scenario, you can refer to this https://meta.stackexchange.com/questions/106467 – Samuel Liew Apr 04 '18 at 12:13
  • 1
    Well, I guess the comments on the question have been flagged by either moderators or users. You might want to update the question, the links are broken. – Naman Apr 08 '18 at 18:54

2 Answers2

9

According to Brian Goetz' answer on my question, he states:

Local variable type inference says: the types I need are probably already present on the right hand side, why repeat them on the left.

Regarding the code in your question, the only type available to infer (by using the Comparator provided) is TreeSet<? super T>. Us humans are smart enough to see that distinct returns set and expects a Set<T>. However, the compiler either may not be smart enough to resolve it (I'm sure it can), but it's more likely the fact that var infers the most specific type using the information provided on the RHS, and the architects didn't want to break that.

Now, as nullpointer stated in their comment, you can explicitly define your TreeSet to be of type T rather than the inferred capture type ? super T with the following:

var set = new TreeSet<T>(comparator);

I'd assume that explicit generic types override inferred types passed to the constructor, which would make sense.

JLS §14.4.1: Local Variable Declarators and Types seems to back up my claim, stating the following:

enter image description here

Note: "upward projection of T", which may just infer to the type (TreeSet instead of Set), but could possibly include generic type as well.

I believe it's the same reason why list in var list = List.<Number>of(1, 2, 3); is a List<Number> rather than a List<Integer>, which works just as well without var.

Jacob G.
  • 28,856
  • 5
  • 62
  • 116
  • Hi Jacob, thanks for your good answer. Yep, I agree that the types that local type inference needs are on the RHS, and that in my example, the generic type on the RHS is in fact `? super T`. However, I wonder why `var` is mechanically inferring `TreeSet super T>`. Who wants a bounded generic type in a local variable? Bounded generic type params are mostly useful when used in method arguments. Can't `var` be intelligent enough as to infer `T`? I guess that this is the heart of my question... – fps Apr 02 '18 at 19:22
  • 1
    I'm pretty sure type inference won't search for hints beyond the current statement. If the statement is a `return`, it may consult the current method's signature, and if the statement contains method invocations, it may probe the signatures of the invoked methods. It will not, however, look ahead at future uses of the assignment target. It won't inspect the return value because the return statement occurs after the current statement. Thus, the type of `var set` will be inferred _without_ consulting the signature of the declaring method. – Mike Strobel Apr 02 '18 at 19:24
  • 2
    I believe it *can* be intelligent enough, but, as Mike just stated above, there's no reason it should look ahead, since it only uses the information on the RHS of the same line to infer the type. If that were to be a future modification, I would definitely welcome it! – Jacob G. Apr 02 '18 at 19:27
  • 1
    @MikeStrobel yes, I understand that type inference doesn't look ahead, and I'm not claiming that it should do that in this case. I understand that `var` is designed to look for the type at the RHS only, and that's OK. I wonder why the type for `set` is inferred as `TreeSet super T>` instead of `TreeSet`. My point is: couldn't type inference be improved in this case? I see no point in having a local variable with a bounded generic type, especially if the generic type is represented by a type variable and not by an actual type. – fps Apr 02 '18 at 22:45
  • 3
    @FedericoPeraltaSchaffner every time we, devs, find such a case, tend to think why not here too? I've heard this many times already as an argument, I guess this could have been done, but its either not that simple as it looks, or will bring more sophisticated code along, for a not that often case. I guess... – Eugene Apr 03 '18 at 04:37
3

Using the local variable in your second snippet requires you to explicitly specify the bound of the TreeSet as in :

public static <T> Set<T> distinct(Collection<? extends T> list, Comparator<? super T> comparator) {
    var set = new TreeSet<T>(comparator);
    set.addAll(list);
    return set;
}

the reason being that the inferred var otherwise makes use of the most obvious bound used with the comparator and gets inferred as TreeSet<? super T> and fails with the stated compilation error due to the incompatibility of conversion.

why I must specify T explicitly

Thinking the other way round as pointed by Jacob as well and formulating the code as

  private static Set<Integer> distincts(Collection<? extends Integer> list, Comparator<Number> comparator) {
        var set = new TreeSet<>(comparator);
        // if you don't specify the bound, you get a compiler error on the return statement 
        // since the inferred type would be `Number`
        set.addAll(list);
        return set;
    }

To simply question back, what type would you want to be inferred here by default for TreeSet? Integer, Byte which one from the list of subclasses of Number and how(based on the return type which might just be inferred far later)?

Edit:- I do second the thought that given the constructor TreeSet(Comparator<? super E> comparator) constructs a TreeSet<E>, hence invocation of such constructor should be inferred as TreeSet<E> instead of TreeSet<? super E>. Also, as Brian commented, not everything could be inferred and for any such specific types, one could ask for it(assuming on the jdk mailing list).

Naman
  • 27,789
  • 26
  • 218
  • 353
  • 1
    Thanks for your answer. I think it's not so obvious that the bound used by the comparator is the one to be machanically inferred by `var`, that's the point of my question. Regarding your example next to the end of your answer, I think it's a different situation than my question's. You should receive a `Comparator super Integer>` to make it anaogous, then you could invoke the method with some `Comparator` or `Comparator`. Now you'll get an incompatible types compilation error that mimics mine's, though with the specific type `Integer` instead of with just `T`. – fps Apr 02 '18 at 21:11