Why does the JVM have both `invokespecial` and `invokestatic` opcodes?

Question

Both instructions use static rather than dynamic dispatch. It seems like the only substantial difference is that invokespecial will always have, as its first argument, an object that is an instance of the class that the dispatched method belongs to. However, invokespecial does not actually put the object there; the compiler is the one responsible for making that happen by emitting the appropriate sequence of stack operations before emitting invokespecial. So replacing invokespecial with invokestatic should not affect the way the runtime stack / heap gets manipulated -- though I expect that it will cause a VerifyError for violating the spec.

I'm curious about the possible reasons behind making two distinct instructions that do essentially the same thing. I took a look at the source of the OpenJDK interpreter, and it seems like invokespecial and invokestatic are handled almost identically. Does having two separate instructions help the JIT compiler better optimize code, or does it help the classfile verifier prove some safety properties more efficiently? Or is this just a quirk in the JVM's design?

My suspicion is that it helps to make the verifier more efficient (by not having to check if the instance is properly set for static methods). — Thilo, Dec 20 '12 at 01:56

score 3 · Answer 1 · edited Dec 25 '15 at 23:26

Disclaimer: It is hard to tell for sure since I never read an explicit Oracle statement about this, but I pretty much think this is the reason:

When you look at Java byte code, you could ask the same question about other instructions. Why would the verifier stop you when pushing two ints on the stack and treating them as a single long right after? (Try it, it will stop you.) You could argue that by allowing this, you could express the same logic with a smaller instruction set. (To go further with this argument, a byte cannot express too many instructions, the Java byte code set should therefore cut down wherever possible.)

Of course, in theory you would not need a byte code instruction for pushing ints and longs to the stack and you are right about the fact that you would not need two instructions for INVOKESPECIAL and INVOKESTATIC in order to express method invocations. A method is uniquely identified by its method descriptor (name and raw argument types) and you could not define both a static and a non-static method with an identical description within the same class. And in order to validate the byte code, the Java compiler must check whether the target method is static nevertheless.

Remark: This contradicts the answer of v6ak. However, a methods descriptor of a non-static method is not altered to include a reference to this.getClass(). The Java runtime could therefore always infer the appropriate method binding from the method descriptor for a hypothetical INVOKESMART instruction. See JVMS §4.3.3.

So much for the theory. However, the intentions that are expressed by both invocation types are quite different. And remember that Java byte code is supposed to be used by other tools than javac to create JVM applications, as well. With byte code, these tools produce something that is more similar to machine code than your Java source code. But it is still rather high level. For example, byte code still is verified and the byte code is automatically optimized when compiled to machine code. However, the byte code is an abstraction that intentionally contains some redundancy in order to make the meaning of the byte code more explicit. And just like the Java language uses different names for similar things to make the language more readable, the byte code instruction set contains some redundancy as well. And as another benefit, verification and byte code interpretation/compilation can speed up since a method's invocation type does not always need to be inferred but is explicitly stated in the byte code. This is desirable because verification, interpretation and compilation are done at runtime.

As a final anecdote, I should mention that a class's static initializer <clinit> was not flagged static before Java 5. In this context, the static invocation could also be inferred by the method's name but this would cause even more run time overhead.

I agree I've missed that I can't use the same method signature for static and non-static method. I've tried it and it generates java.lang.ClassFormatError: Duplicate method name&signature in class file Foo. I will correct that. — v6ak, Dec 07 '13 at 11:05

v6ak · Answer 2 · 2013-12-07T11:14:35.497

There are the definitions:

There are significant differences. Say we want to design an invokesmart instruction, which would choose smartly between inkovestatic and invokespecial:

First, it would not be a problem to distinguish between static and virtual calls, since we can't have two methods with same name, same parameter types and same return type, even if one is static and second is virtual. JVM does not allow that (for a strange reason). Thanks raphw for noticing that.

~~First, what would invokesmart foo/Bar.baz(I)I mean? It may mean:~~

~~A static method call foo.Bar.baz that consumes int from operand stack and adds another int. // (int) -> (int)~~
~~An instance method call foo.Bar.baz that consumes foo.Bar and int from operand stack and adds int. // (foo.Bar, int) -> (int)~~

~~How would you choose from them? There may exist both methods.~~

~~We may try to solve it by requiring foo/Bar.baz(Lfoo/Bar;I) for the static call. However, we may have both public static int baz(Bar, int) and public int baz(int).~~

We may say that it does not matter and possibly disable such situation. (I don't think that it is a good idea, but just to imagine.) What would it mean?

If the method is static, there are probably no additional restrictions. On the other hand, if the method is not static, there are some restrictions: "Finally, if the resolved method is protected (§4.6), and it is either a member of the current class or a member of a superclass of the current class, then the class of objectref must be either the current class or a subclass of the current class."
There are some further differences, see the note about ACC_SUPER.
It would mean that all the referenced classes must be loaded before bytecode verification. I hope this is not necessary now, but I am not 100% sure.

So, it would mean very inconsistent behavior.

Why does the JVM have both `invokespecial` and `invokestatic` opcodes?

2 Answers2

Linked