I figure most of you know that goto is a reserved keyword in the Java language but is not actually used. And you probably also know that goto is a Java Virtual Machine (JVM) opcode. I reckon all the sophisticated control flow structures of Java, Scala and Kotlin are, at the JVM level, implemented using some combination of goto and ifeq, ifle, iflt, etc.
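As a small illustration of that point, here is an ordinary Java for loop. The listing in the comment is roughly what javac emits for it (as shown by javap -c); exact offsets may differ between compiler versions, and the class and method names are made up for the example. The loop test is a conditional branch (if_icmpge, one of the "etc." comparisons) and the loop body ends with a backward goto.

```java
/** Hypothetical example class: a loop whose bytecode is just a compare-and-branch plus a goto. */
final class LoopLowering {
    // Typical javac output for sum(int) (javap -c); offsets may vary by compiler version:
    //
    //    0: iconst_0
    //    1: istore_1          // s = 0
    //    2: iconst_0
    //    3: istore_2          // i = 0
    //    4: iload_2
    //    5: iload_0
    //    6: if_icmpge 19      // leave the loop when i >= n
    //    9: iload_1
    //   10: iload_2
    //   11: iadd
    //   12: istore_1          // s += i
    //   13: iinc 2, 1         // i++
    //   16: goto 4            // jump back to the loop test
    //   19: iload_1
    //   20: ireturn
    static int sum(int n) {
        int s = 0;
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sum(5)); // prints 10
    }
}
```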
Looking at the JVM spec (https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.goto_w), I see there's also a goto_w opcode. Whereas goto takes a signed 2-byte branch offset, goto_w takes a signed 4-byte branch offset. The spec states that
Although the goto_w instruction takes a 4-byte branch offset, other factors limit the size of a method to 65535 bytes (§4.11). This limit may be raised in a future release of the Java Virtual Machine.
It sounds to me like goto_w is future-proofing, like some of the other *_w opcodes. But it also occurs to me that maybe goto_w could be used with the two more significant bytes zeroed out (for a forward branch) and the two less significant bytes the same as for goto, with adjustments as needed.
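The two encodings can be compared with a short sketch. Per the JVM spec, goto is opcode 0xa7 followed by a signed 16-bit offset, and goto_w is opcode 0xc8 followed by a signed 32-bit offset; both operands are big-endian and relative to the address of the branch instruction itself, not absolute targets. The helper names (fitsInGoto, encodeGoto, encodeGotoW, hex) are hypothetical. The sketch also shows one caveat to the "zeroed out" idea: it holds for forward branches, but a backward (negative) offset must be sign-extended, so its upper two bytes are FF rather than 00.

```java
/** Hypothetical helpers that encode goto / goto_w exactly as the class-file format lays them out. */
final class BranchEncoding {
    static final int GOTO = 0xa7;   // goto branchbyte1 branchbyte2
    static final int GOTO_W = 0xc8; // goto_w branchbyte1 branchbyte2 branchbyte3 branchbyte4

    /** goto can only reach targets within -32768..+32767 bytes of the goto itself. */
    static boolean fitsInGoto(int offset) {
        return offset >= Short.MIN_VALUE && offset <= Short.MAX_VALUE;
    }

    /** Encodes goto with a signed 16-bit offset, relative to the goto's own address. */
    static byte[] encodeGoto(int offset) {
        if (!fitsInGoto(offset)) {
            throw new IllegalArgumentException("offset does not fit in 16 bits: " + offset);
        }
        return new byte[] {(byte) GOTO, (byte) (offset >> 8), (byte) offset};
    }

    /** Encodes goto_w with a signed 32-bit offset; the arithmetic shift sign-extends negatives. */
    static byte[] encodeGotoW(int offset) {
        return new byte[] {
            (byte) GOTO_W,
            (byte) (offset >> 24), (byte) (offset >> 16),
            (byte) (offset >> 8), (byte) offset
        };
    }

    /** Formats bytes as space-separated uppercase hex, e.g. "A7 00 20". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Forward branch: the upper two goto_w operand bytes are 00.
        System.out.println(hex(encodeGoto(32)));   // A7 00 20
        System.out.println(hex(encodeGotoW(32)));  // C8 00 00 00 20
        // Backward branch: sign extension makes the upper bytes FF, not 00.
        System.out.println(hex(encodeGoto(-12)));  // A7 FF F4
        System.out.println(hex(encodeGotoW(-12))); // C8 FF FF FF F4
    }
}
```

For instance, the goto_w at 59 that targets 91 in the rewritten listing would carry the relative offset 91 - 59 = 32, i.e. the operand bytes 00 00 00 20.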
For example, given this Java switch on strings (or Scala match), as disassembled by javap:
12: lookupswitch {
  112785: 48 // case "red"
  3027034: 76 // case "blue"
  98619139: 62 // case "green"
  default: 87
}
48: aload_2
49: ldc #17 // String red
51: invokevirtual #18 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
54: ifeq 87
57: iconst_0
58: istore_3
59: goto 87
62: aload_2
63: ldc #19 // String green
65: invokevirtual #18 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
68: ifeq 87
71: iconst_1
72: istore_3
73: goto 87
76: aload_2
77: ldc #20 // String blue
79: invokevirtual #18
// etc.
we could rewrite it with goto_w in place of goto as
12: lookupswitch {
  112785: 48
  3027034: 80
  98619139: 64
  default: 91
}
48: aload_2
49: ldc #17 // String red
51: invokevirtual #18 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
54: ifeq 91 // relative offset +37: 00 25
57: iconst_0
58: istore_3
59: goto_w 91 // relative offset +32: 00 00 00 20
64: aload_2
65: ldc #19 // String green
67: invokevirtual #18 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
70: ifeq 91
73: iconst_1
74: istore_3
75: goto_w 91
80: aload_2
81: ldc #20 // String blue
83: invokevirtual #18
// etc.
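The offset bookkeeping for this rewrite can be checked mechanically. Below is a minimal, hypothetical model of the two listings above (Java 16+ for the record; not a real class-file writer): each instruction carries its size in bytes and, for branches, a target label, and offsets are assigned by summing sizes from 48. The tail of the "blue" block (ifeq, iconst_2, istore_3 ending just before 87) is an assumption inferred from the default target 87, since the listing elides it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the case blocks, used to recompute offsets after widening goto to goto_w. */
final class GotoWidening {
    /** label: label defined at this instruction (or null); target: label it branches to (or null). */
    record Insn(String label, String mnemonic, int size, String target) {}

    static Insn plain(String mnemonic, int size) { return new Insn(null, mnemonic, size, null); }
    static Insn labeled(String label, String mnemonic, int size) { return new Insn(label, mnemonic, size, null); }
    static Insn branch(String mnemonic, int size, String target) { return new Insn(null, mnemonic, size, target); }

    /** The code from offset 48 on; wide = true replaces each 3-byte goto with a 5-byte goto_w. */
    static List<Insn> caseBlocks(boolean wide) {
        String jump = wide ? "goto_w" : "goto";
        int jumpSize = wide ? 5 : 3; // opcode + 4-byte offset vs. opcode + 2-byte offset
        return List.of(
            labeled("red", "aload_2", 1), plain("ldc #17", 2), plain("invokevirtual #18", 3),
            branch("ifeq", 3, "dflt"), plain("iconst_0", 1), plain("istore_3", 1),
            branch(jump, jumpSize, "dflt"),
            labeled("green", "aload_2", 1), plain("ldc #19", 2), plain("invokevirtual #18", 3),
            branch("ifeq", 3, "dflt"), plain("iconst_1", 1), plain("istore_3", 1),
            branch(jump, jumpSize, "dflt"),
            labeled("blue", "aload_2", 1), plain("ldc #20", 2), plain("invokevirtual #18", 3),
            branch("ifeq", 3, "dflt"), plain("iconst_2", 1), plain("istore_3", 1), // assumed tail
            labeled("dflt", "...", 0)); // start of the default branch (second switch)
    }

    /** Lays the code out from the given start offset and returns label -> absolute offset. */
    static Map<String, Integer> labels(List<Insn> code, int start) {
        Map<String, Integer> offsets = new HashMap<>();
        int pc = start;
        for (Insn in : code) {
            if (in.label() != null) {
                offsets.put(in.label(), pc);
            }
            pc += in.size();
        }
        return offsets;
    }

    /** javap-style lines; branches also show the encoded operand, which is relative to the branch. */
    static List<String> listing(List<Insn> code, int start) {
        Map<String, Integer> offsets = labels(code, start);
        List<String> lines = new ArrayList<>();
        int pc = start;
        for (Insn in : code) {
            String line = pc + ": " + in.mnemonic();
            if (in.target() != null) {
                int target = offsets.get(in.target());
                line += " " + target + " (offset " + (target - pc) + ")";
            }
            lines.add(line);
            pc += in.size();
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(labels(caseBlocks(false), 48)); // HashMap iteration order is unspecified
        System.out.println(labels(caseBlocks(true), 48));
        listing(caseBlocks(true), 48).forEach(System.out::println);
    }
}
```

With wide = false the model reproduces the original layout (green at 62, blue at 76, default at 87); with wide = true it places green at 64, blue at 80 and the default at 91, and the branches at 54 and 59 encode the relative offsets +37 and +32.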
I haven't actually tried this, and I may well have made a mistake adjusting the "line numbers" (bytecode offsets) to accommodate the goto_w instructions. But since goto_w is in the spec, it should be possible to do it.
My question is: is there any reason a compiler or other bytecode generator might use goto_w under the current 65535-byte limit, other than to show that it can be done?