
Let's compile the following code with the ECJ compiler from the Eclipse Mars.2 bundle:

import java.util.stream.*;

public class Test {
    String test(Stream<?> s) {
        return s.collect(Collector.of(() -> "", (a, t) -> {}, (a1, a2) -> a1));
    }
}

The compilation command is the following:

$ java -jar org.eclipse.jdt.core_3.11.2.v20160128-0629.jar -8 -g Test.java

After the successful compilation, let's check the resulting class file with javap -v -p Test.class. The most interesting part is the synthetic method generated for the (a, t) -> {} lambda:

  private static void lambda$1(java.lang.String, java.lang.Object);
    descriptor: (Ljava/lang/String;Ljava/lang/Object;)V
    flags: ACC_PRIVATE, ACC_STATIC, ACC_SYNTHETIC
    Code:
      stack=0, locals=2, args_size=2
         0: return
      LineNumberTable:
        line 5: 0
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       1     0     a   Ljava/lang/String;
            0       1     1     t   Ljava/lang/Object;
      LocalVariableTypeTable:
        Start  Length  Slot  Name   Signature
            0       1     1     t   !*

I was quite surprised to see this !* entry in the LocalVariableTypeTable. The JVM specification covers the LocalVariableTypeTable attribute and says:

The constant_pool entry at that index must contain a CONSTANT_Utf8_info structure (§4.4.7) representing a field signature which encodes the type of a local variable in the source program (§4.7.9.1).

§4.7.9.1 defines a grammar for field signatures which, if I understand correctly, does not cover anything similar to !*.
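
To make the grammar argument concrete, here is a minimal sketch of a recursive-descent check for the field-signature grammar in §4.7.9.1. The class name `FieldSignatureCheck` and its method names are made up for illustration, and the identifier rule is deliberately lenient (it only rejects the characters `. ; [ / < > :`), so treat it as an approximation rather than a conformance tool.

```java
// Hypothetical helper, not part of any JDK or Eclipse API.
final class FieldSignatureCheck {
    private final String s;
    private int pos;

    private FieldSignatureCheck(String s) { this.s = s; }

    /** A field signature is a ReferenceTypeSignature that consumes the whole string. */
    static boolean isValid(String signature) {
        FieldSignatureCheck p = new FieldSignatureCheck(signature);
        return p.referenceType() && p.pos == signature.length();
    }

    private boolean eat(char c) {
        if (pos < s.length() && s.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    // Lenient assumption: any non-empty run of characters other than . ; [ / < > :
    private boolean identifier() {
        int start = pos;
        while (pos < s.length() && ".;[/<>:".indexOf(s.charAt(pos)) < 0) pos++;
        return pos > start;
    }

    // ReferenceTypeSignature: ClassTypeSignature | TypeVariableSignature | ArrayTypeSignature
    private boolean referenceType() {
        if (eat('L')) return classTypeRest();
        if (eat('T')) return identifier() && eat(';');
        if (eat('[')) return javaType();
        return false; // anything else, including '!', is not covered by the grammar
    }

    // JavaTypeSignature: BaseType | ReferenceTypeSignature
    private boolean javaType() {
        if (pos < s.length() && "BCDFIJSZ".indexOf(s.charAt(pos)) >= 0) { pos++; return true; }
        return referenceType();
    }

    // Everything after the leading 'L' of a ClassTypeSignature.
    private boolean classTypeRest() {
        do {
            if (!identifier()) return false;   // package segments, then the class name
        } while (eat('/'));
        if (!optionalTypeArguments()) return false;
        while (eat('.')) {                     // inner class suffixes
            if (!identifier() || !optionalTypeArguments()) return false;
        }
        return eat(';');
    }

    private boolean optionalTypeArguments() {
        if (!eat('<')) return true;
        do {
            if (!typeArgument()) return false;
        } while (pos < s.length() && s.charAt(pos) != '>');
        return eat('>');
    }

    // TypeArgument: '*' | [+|-] ReferenceTypeSignature
    private boolean typeArgument() {
        if (eat('*')) return true;
        if (!eat('+')) eat('-');
        return referenceType();
    }

    public static void main(String[] args) {
        for (String sig : new String[] {"Ljava/lang/String;", "Ljava/util/List<*>;", "TT;", "!*"}) {
            System.out.println(sig + " -> " + isValid(sig));
        }
    }
}
```

Under these assumptions the first three strings are accepted and `!*` is rejected, because no production of a reference type signature starts with `!`.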

It should also be noted that neither the javac compiler nor older ECJ 3.10.x versions generate this LocalVariableTypeTable entry. Is !* some non-standard Eclipse extension, or am I missing something in the JVM spec? Does this mean that ECJ does not conform to the JVM spec? What does !* actually mean, and are there any other similar strings that could appear in the LocalVariableTypeTable attribute?

Lii
Tagir Valeev
  • It might be linked to those bugs [#429264](https://bugs.eclipse.org/bugs/show_bug.cgi?id=429264) and [#425183](https://bugs.eclipse.org/bugs/show_bug.cgi?id=425183). – SubOptimal May 17 '16 at 07:58

1 Answer


The token ! is used by ecj to encode a capture type in generic signatures. Hence !* signifies a capture of an unbounded wildcard.

Internally, ecj uses two flavours of CaptureBinding: one to implement what JLS 18.4 calls "fresh type variables", the other to implement captures à la JLS 5.1.10 (which uses the same lingo of "fresh type variables"). Both produce a signature using !. On closer inspection, in this example we have an "old-style" capture: t has type capture#1-of ?, capturing the <T> in Stream<T>.
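
A small variation of the code from the question makes the capture visible at the source level. This is only a sketch: the class name `CaptureDemo` is made up, and the quoted diagnostic is the javac message reported in the comments to this answer, not something I re-verified on every compiler version.

```java
import java.util.stream.*;

class CaptureDemo {
    static String test(Stream<?> s) {
        return s.collect(Collector.of(() -> "", (a, t) -> {
            // Reading t as Object is fine: every capture is a subtype of Object.
            Object o = t;
            // Writing to t is not: uncommenting the next line is reported to make
            // javac complain "Object cannot be converted to CAP#1".
            // t = new Object();
        }, (a1, a2) -> a1));
    }

    public static void main(String[] args) {
        // The supplier yields "" and the accumulator does nothing,
        // so the result is the empty string whatever the stream contains.
        System.out.println("[" + test(Stream.of(1, 2, 3)) + "]");
    }
}
```

So the type of t is not plain Object for the purposes of type checking, even though Object is all that remains of it in the method descriptor.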

The problem is: JVMS 4.7.9.1 doesn't seem to define an encoding for such fresh type variables (which, among other properties, have no correspondence in source code and hence no name).

I couldn't get javac to emit any LocalVariableTypeTable for the lambda, so they might simply avoid answering this question.

Given that both compilers agree on inferring t to a capture, why does one compiler generate an LVTT where the other does not? JVMS 4.7.14 has this to say:

This difference is only significant for variables whose type uses a type variable or parameterized type.

According to JLS, captures are fresh type variables, so an LVTT entry is significant, and it is an omission in JVMS not to specify a format for this type.

Consequences

The above only describes and explains the status quo, demonstrating that no specification tells a compiler to behave differently from the current status. Obviously, this is not an entirely desirable situation.

  1. Someone may want to contact Oracle, mentioning that Java 8 introduces a situation that is not covered by parts of the JVMS. This situation may become even more relevant once local variables also become subject to type inference.
  2. Anybody observing a negative impact of the current situation is invited to chime in on RFE 494198 (ecj), which otherwise has low priority.

Update: Meanwhile, someone has reported an example where a regular Signature attribute (which cannot be opportunistically omitted) is required to encode a type that cannot be encoded according to the JVMS. In that case javac, too, creates unspecified bytecode. According to a follow-up, no variable should ever have such a type, but I don't think that this discussion is over yet (and admittedly the JLS doesn't yet ensure this goal).

Update 2: After receiving advice from a spec author I see three parts to the ultimate solution:

(1) Every type signature in any bytecode attribute must adhere to the grammar in JVMS 4.7.9.1. Neither ecj's ! nor javac's <captured wildcard> is legal.

(2) Compilers should approximate type signatures where no legal encoding exists, e.g., by using the erasure instead of a capture. For an LVTT entry, such approximation should be considered as legitimate.

(3) JLS must ensure that only types encodable using JVMS 4.7.9.1 appear in positions where generating a Signature attribute is mandatory.

For future versions of ecj items (1) and (2) have been resolved. I cannot speak about schedules when javac and JLS will be fixed accordingly.
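
Item (2) can be sketched as follows. This is a hypothetical illustration of the idea, not ecj's actual fix: the class and method names are invented, and detecting a capture by looking for the `!` marker is an assumption that only makes sense for the ecj output discussed here.

```java
import java.util.Optional;

// Hypothetical helper illustrating "approximate by erasure"; not an ecj or javac API.
final class LvttApproximation {
    /** ecj's non-standard capture marker, as seen in the question. */
    private static final char CAPTURE_MARKER = '!';

    /**
     * Decides what, if anything, to record in the LocalVariableTypeTable.
     *
     * @param descriptor       the erased type already stored in the LocalVariableTable
     * @param genericSignature the signature the compiler computed internally
     * @return the signature to emit, or empty if the erasure is all that can be said
     */
    static Optional<String> lvttSignature(String descriptor, String genericSignature) {
        if (genericSignature.indexOf(CAPTURE_MARKER) >= 0) {
            // No legal encoding exists: fall back to the erasure, which the
            // descriptor already expresses, so no LVTT entry is needed.
            return Optional.empty();
        }
        if (genericSignature.equals(descriptor)) {
            // Nothing generic about this variable.
            return Optional.empty();
        }
        return Optional.of(genericSignature);
    }

    public static void main(String[] args) {
        System.out.println(lvttSignature("Ljava/lang/Object;", "!*"));
        System.out.println(lvttSignature("Ljava/util/List;", "Ljava/util/List<Ljava/lang/String;>;"));
    }
}
```

With this policy the variable t from the question would simply get no LVTT entry, which is also what javac is observed to do for this lambda.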

Stephan Herrmann
  • Thank you for the answer. Why does ecj create a `LocalVariableTypeTable` for lambdas? Probably it would be better to skip this, as javac does. Is it useful for something? – Tagir Valeev May 20 '16 at 03:57
  • All we need to get lambda with `LocalVariableTypeTable` is to use parametrized type in lambda's body and, of course, compile with `javac -g:vars`. Here is an example: https://gist.github.com/Maccimo/c881bb71f1e9d14853de3a0e8a5ab077 – user882813 May 20 '16 at 05:06
  • @TagirValeev, ecj creates a `LocalVariableTypeTable` for every method in the byte code that contains at least one local variable with a "generic" type (parameterized or type variable). At the bytecode level the lambda method is a regular (synthetic) method, no reason to exclude this from the spec'd behaviour. Without the LVTT argument `t` appears to be of type `Object`, which isn't the full answer, anybody reading the bytecode who is aware of generics needs the LVTT for the full answer. – Stephan Herrmann May 20 '16 at 08:35
  • These “fresh type variables” help to verify the correctness of the generic invocation of the `Collector.of` method, but I don’t see any reason why they should appear in a local variable table. The parameter `t` is not generic, it’s just `Object` as that’s the *only* type that is valid in that context. It’s worth noting that `t` is a *parameter* of the synthetic method, so if `t` was something other than plain `Object`, ECJ had to provide a Generic method signature describing the Generic parameter, but it didn’t. – Holger May 20 '16 at 10:46
  • On closer inspection, we have a regular capture here, not a fresh type variable from 18.4., but the observable effect is the same. How to represent captures in LVTT is not specified in JVMS. – Stephan Herrmann May 20 '16 at 12:41
  • Capture enters the picture right at the start because the expression `s` has type `Stream<?>`. This capture percolates all the way into inference of the middle lambda, to finally become the type of `t`. @Holger, can you show why this inference result would be wrong, and why `Object` should be inferred instead? To me the difference between compilers sounds very much like https://bugs.openjdk.java.net/browse/JDK-8016207 – Stephan Herrmann May 20 '16 at 12:56
  • Interestingly, during type checking even javac knows the correct type of `t`: add `t = new Object();` into the middle lambda, and javac will correctly complain: "Object cannot be converted to CAP#1". – Stephan Herrmann May 20 '16 at 13:14
  • See [JLS§15.27.3](https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.27.3): “*If T is a wildcard-parameterized functional interface type and the lambda expression is implicitly typed, then the ground target type is the non-wildcard parameterization (§9.9) of T*” and at the end of §9.9: “*Sometimes, it is possible to know from the context, such as the parameter types of a lambda expression, which function type is intended (§15.27.3). Other times, it is necessary to pick one; in these circumstances, the bounds are used.*” – Holger May 20 '16 at 13:33
  • In general, variables have a real type, not something like `CAP#1` that exists only within the compiler. So it’s not surprising that the JVMS has no way to encode such non-type things. There is no other purpose of the local variable tables than debugging anyway, so what would be the point of telling a debugger that a variable’s type is `CAP#1` rather than `?` or just `Object`? – Holger May 20 '16 at 13:51
  • @Holger, sure ecj implements and applies JLS 15.27.3 where appropriate. However, when resolving the lambda, the target type `BiConsumer` is parameterized with a capture not a wildcard. Please see that also javac resolves `t` to `CAP#1` as shown in another comment. – Stephan Herrmann May 20 '16 at 13:59
  • @Holger: your view about real-typed local variables only holds for explicitly typed variables. Lambda arguments (and in the future inferred local variables) can have any type that is a possible result of inference. – Stephan Herrmann May 20 '16 at 14:02
  • Since discussing the formal specification really exceeds the scope of SO, let’s end this by focusing on the one practical question: since the `LocalVariableTypeTable` merely exists for debugging purposes: what will the Eclipse debugger show, when it encounters a `!*` in said table? Will it be in any way more useful than what it will show when encountering an equivalent `javac` compiled lambda expression not even having that table? – Holger May 20 '16 at 14:32
  • I cannot answer the "why" of this LVTT entry, this decision was basically made 11 years ago (and only incidentally surfaces now via lambdas). NB: While JVMS only mentions debugging, nowadays there are plenty more tools that read byte code. I can only answer the original question: "Does this mean that ECJ does not conform to JVM spec?" by saying: I don't see any violation of JLS nor JVMS. – Stephan Herrmann May 20 '16 at 15:00
  • @StephanHerrmann The ECJ's `!*` does not comply with JVMS 4.7.9.1. Isn't it? – user882813 May 20 '16 at 15:52
  • @user882813, the `!` token fills a gap in the spec. So it neither fulfils nor violates the spec. I could even argue that the compiler would still be correct if it crashes in this situation. Answering `!` is done to avoid that crash, though. I could only see one reason for changing this behavior: if it breaks downstream tools that consume the LVTT. Does it? – Stephan Herrmann May 20 '16 at 16:12
  • @StephanHerrmann The only way to fill gap in spec is to issue revised version of spec. JVMS 4.7.9.1 define a grammar signature should comply to. And `!*` violate such a grammar. So ECJ violates JVMS by emitting `!*` in LVTT. BTW, there is other examples when ECJ behavior differ from javac in emitting of LVTT. Try to compile source from the gist I posted above. Javac will generate LVTT and ECJ will not at all. – user882813 May 20 '16 at 17:03
  • When conforming to the spec, ecj would use the rule `TypeVariableSignature` to emit `T Identifier ;` and crash because Identifier is null. Better? Tell me: is the current behavior causing any harm? – Stephan Herrmann May 20 '16 at 17:08
  • @StephanHerrmann If ECJ is unable to figure out what to emit then it should not emit anything since LVTT is not a mandatory attribute. And, as I stated before, ECJ is in fact didn't emit LVTT entry for local variable of type `List` while JAVAC does. So it's not a problem for ECJ to avoid generating meaningless garbage. – user882813 May 21 '16 at 03:58
  • @user882813 thanks for the additional test case. Here it turns out the LVTT entry is just "optimized" out, since the local variable is unused. If the variable is used, the LVTT entry is correctly generated. You can follow progress on this issue via https://bugs.eclipse.org/494225 – Stephan Herrmann May 21 '16 at 10:22
  • @StephanHerrmann well it causes harm for me: procyon compiler tools library dies when tries to parse such entry. I might convince its author to ignore this particular string, but it would be better if I could refer to some specification or whatever... – Tagir Valeev May 22 '16 at 10:03
  • I also see the syntax `!+`. This currently chokes bcel. – MeBigFatGuy Jul 03 '16 at 02:33