
I've recently been looking at the Java Virtual Machine Specification (JVMS) to better understand what makes my programs work, but I've found a section that I'm not quite getting...

Section 4.7.4 describes the StackMapTable attribute, and in that section the document goes into detail about stack map frames. The issue is that it's a little wordy, and I learn best by example, not by reading.

I understand that the first stack map frame is derived from the method descriptor, but I don't understand how (which is supposedly explained here.) Also, I don't entirely understand what the stack map frames do. I would assume they're similar to blocks in Java, but it appears as though you can't have stack map frames inside each other.

Anyway, I have two specific questions:

  • What do the stack map frames do?
  • How is the first stack map frame created?

and one general question:

  • Can someone provide an explanation less wordy and easier to understand than the one given in the JVMS?
Raedwald
Steven
  • @EJP It's something I'm working on. That's one of the main reasons I decided to read the JVMS in the first place. – Steven Aug 03 '14 at 23:48
  • @EJP I've also been reading the JVM spec, and believe me, it is not just a matter of reading the spec. For instance, to understand how type verification works (related to this question) you need a basic knowledge of Prolog programming... so I think a question/answer on this is worth having on Stack Overflow – morgano Aug 03 '14 at 23:56
  • @morgano, I find that it's much more helpful to ignore the Prolog stuff and focus on the classic inference verifier. The new verifier is very similar; they just decided to specify it in 200 pages of Prolog instead of using a vague English description like the old one – Antimony Aug 04 '14 at 01:09
  • @Antimony exactly, that is what this question is about, to translate to plain English the formal specification. – morgano Aug 04 '14 at 01:11
  • I have tried to explain them comprehensively here http://www.volatileinterface.com/understanding-the-java-class-file-format-stack-map-tables/ – Markovian8261 Sep 12 '15 at 18:16
  • @popgalop since this link is dead, is it available somewhere else may be? – Eugene Feb 26 '18 at 21:33
  • https://web.archive.org/web/20170327123525/http://www.volatileinterface.com/understanding-the-java-class-file-format-stack-map-tables/ – Markovian8261 Jun 23 '18 at 23:17
  • Unfortunately programming language semantics & reasoning about programming & programs is extremely poorly taught. Learning about a logical machine used to implement programs (eg the JVM) is not generally a good way to learn about those. What is appropriate is an abstract machine designed not for implementing programs but for describing the semantics (as with C & C++). Program design & debugging involves at the highest level state changes as that abstract machine is initialized & executes. Key to semantics are contracts & loop & module invariants. – philipxy Apr 11 '23 at 08:31

1 Answer

Java requires every loaded class to be verified, in order to maintain the security of the sandbox and to ensure that the code is safe to optimize. Note that this is done at the bytecode level, so verification does not check the invariants of the Java language; it merely checks that the bytecode makes sense according to the rules for bytecode.

Among other things, bytecode verification makes sure that instructions are well formed, that all the jumps are to valid instructions within the method, and that all instructions operate on values of the correct type. The last one is where the stack map comes in.

The thing is that bytecode by itself contains no explicit type information. Types are determined implicitly through dataflow analysis. For example, an iconst instruction pushes an int value. If you store it in slot 1, that slot now holds an int. If control flow merges with code that stores a float there instead, the slot is considered to have an invalid type, meaning that you can't do anything more with that value until you overwrite it.
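
To make the merge rule concrete, here is a minimal sketch in Java. The `VType` enum and the `merge` method are hypothetical names invented for illustration; the real verifier tracks many more types (references, long, double, uninitialized objects and so on) and merges reference types to a common supertype rather than giving up.

```java
final class MergeDemo {
    // Hypothetical, heavily simplified set of verification types.
    enum VType { INT, FLOAT, TOP }

    // At a control-flow join, a slot keeps its type only if every incoming
    // path agrees on it. Otherwise it becomes TOP, which means "unusable
    // until overwritten".
    static VType merge(VType a, VType b) {
        return a == b ? a : VType.TOP;
    }

    public static void main(String[] args) {
        System.out.println(merge(VType.INT, VType.INT));   // prints INT
        System.out.println(merge(VType.INT, VType.FLOAT)); // prints TOP
    }
}
```

The second call is the int/float situation described above: neither path is wrong on its own, but after the join the slot can no longer be read.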

Historically, the bytecode verifier inferred all the types using these dataflow rules. Unfortunately, it is impossible to infer all the types in a single linear pass through the bytecode because a backwards jump might invalidate already inferred types. The classic verifier solved this by iterating through the code until everything stopped changing, potentially requiring multiple passes.
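
As a rough illustration of why one pass is not enough, the sketch below runs a worklist-style inference over a toy program with a single local slot. Everything here (the `stores` and `succ` encoding, the `UNSET` marker, the `infer` method) is an assumption made up for this example and not how HotSpot actually represents code.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

final class FixpointDemo {
    // UNSET means "no path has reached this instruction yet".
    enum VType { UNSET, INT, FLOAT, TOP }

    static VType merge(VType a, VType b) {
        if (a == VType.UNSET) return b;
        if (b == VType.UNSET) return a;
        return a == b ? a : VType.TOP;
    }

    // stores[i]: type instruction i writes to the slot, or null if it leaves it alone.
    // succ[i]:   indices of the instructions that may execute right after i.
    // Returns the inferred type of the slot on entry to each instruction.
    static VType[] infer(VType[] stores, int[][] succ, VType entry) {
        VType[] in = new VType[stores.length];
        Arrays.fill(in, VType.UNSET);
        in[0] = entry;
        Deque<Integer> work = new ArrayDeque<>();
        work.add(0);
        while (!work.isEmpty()) {
            int i = work.poll();
            VType out = stores[i] != null ? stores[i] : in[i];
            for (int s : succ[i]) {
                VType merged = merge(in[s], out);
                if (merged != in[s]) {   // something changed: revisit s
                    in[s] = merged;
                    work.add(s);
                }
            }
        }
        return in;
    }

    public static void main(String[] args) {
        // 0: store int   1: loop head   2: store float, may jump back to 1   3: return
        VType[] stores = { VType.INT, null, VType.FLOAT, null };
        int[][] succ = { {1}, {2}, {1, 3}, {} };
        System.out.println(Arrays.toString(infer(stores, succ, VType.TOP)));
        // prints [TOP, TOP, TOP, FLOAT]
    }
}
```

Instruction 1 is first inferred to see an int, and only after the backward jump from instruction 2 is processed does it get downgraded to TOP, which in turn forces instruction 2 to be revisited.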

However, verification makes class loading slow in Java. Sun (and later Oracle) decided to solve this by adding a new, faster verifier that can verify bytecode in a single pass. To do this, they required all new classes starting in Java 7 (with Java 6 in a transitional state) to carry metadata about their types. Since the bytecode format itself can't be changed, this type information is stored separately in an attribute called StackMapTable.

Simply storing the type of every value at every point in the code would take up a lot of space and be very wasteful. To make the metadata smaller and more efficient, they decided to list the types only at positions that are targets of jumps. If you think about it, these are the only places where you need the extra information to do single-pass verification. In between jump targets, all control flow is linear, so you can infer the types at the positions in between using the old inference rules.
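
Here is a sketch of how declared frames make a single linear pass sufficient, again on a toy program with one local slot. The encoding (`stores`, `jumpTo`, a `Map` standing in for the stack map) and the method names are assumptions for illustration only. The toy treats every jump as conditional and ignores the operand stack and exception handlers.

```java
import java.util.HashMap;
import java.util.Map;

final class SinglePassDemo {
    enum VType { INT, FLOAT, TOP }

    // A value of type 'from' may flow into a location declared as 'to' if the
    // types match, or if the declaration is TOP (which promises nothing).
    static boolean assignable(VType from, VType to) {
        return from == to || to == VType.TOP;
    }

    // stores[i]: type instruction i writes to the slot, or null.
    // jumpTo[i]: branch target of instruction i, or -1 if it does not branch.
    // frames:    declared slot type at each jump target.
    static boolean verify(VType[] stores, int[] jumpTo,
                          Map<Integer, VType> frames, VType entry) {
        VType cur = entry;
        for (int i = 0; i < stores.length; i++) {
            VType declared = frames.get(i);
            if (declared != null) {
                if (!assignable(cur, declared)) return false;
                cur = declared;            // from here on, trust the frame
            }
            if (stores[i] != null) cur = stores[i];
            if (jumpTo[i] >= 0) {
                VType target = frames.get(jumpTo[i]);
                if (target == null || !assignable(cur, target)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 0: store int   1: loop head   2: store float, may jump back to 1   3: return
        VType[] stores = { VType.INT, null, VType.FLOAT, null };
        int[] jumpTo = { -1, -1, 1, -1 };

        Map<Integer, VType> honest = new HashMap<>();
        honest.put(1, VType.TOP);
        System.out.println(verify(stores, jumpTo, honest, VType.TOP));  // prints true

        Map<Integer, VType> wrong = new HashMap<>();
        wrong.put(1, VType.INT);
        System.out.println(verify(stores, jumpTo, wrong, VType.TOP));   // prints false
    }
}
```

The key design point is that a backward jump never forces a revisit: the verifier only has to check that the current types fit the frame already declared at the target.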

Each position where types are explicitly listed is known as a stack map frame. The StackMapTable attribute contains a list of frames in order, though they are usually expressed as a difference from the previous frame in order to reduce data size. If there are no frames in the method, which occurs when control flow never joins (i.e. the CFG is a tree), then the StackMapTable attribute can be omitted entirely.

So this is the basic idea of how StackMapTable works and why it was added. The last question is how the implicit initial frame is created. The answer, of course, is that at the beginning of the method the operand stack is empty and the local variable slots have the types of the method parameters, which are determined from the method descriptor.

If you're used to Java, there are a few minor differences in how method parameter types work at the bytecode level. First, instance (non-static) methods have an implicit this as their first parameter. Second, boolean, byte, char, and short do not exist as separate types for local variables and stack values at the bytecode level. Instead, they are all treated as ints behind the scenes.
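
As a sketch of how the initial frame's locals fall out of the descriptor, here is a small parser. The class and method names are hypothetical, types are written as plain strings for readability, and constructors (where this starts out as an uninitialized object) are deliberately ignored. Following the point raised in the comments, long and double take two slots; the second slot is shown as "top" here, which is one common way to display it.

```java
import java.util.ArrayList;
import java.util.List;

final class InitialFrameDemo {
    // Returns the type of each local variable slot on entry to the method.
    // owner is the internal name of the declaring class, e.g. "pkg/Foo".
    static List<String> initialLocals(String owner, boolean isStatic, String descriptor) {
        List<String> locals = new ArrayList<>();
        if (!isStatic) locals.add(owner);              // implicit 'this' in slot 0
        int i = 1;                                      // skip the opening '('
        while (descriptor.charAt(i) != ')') {
            int start = i;
            while (descriptor.charAt(i) == '[') i++;    // array dimensions
            if (descriptor.charAt(i) == 'L') i = descriptor.indexOf(';', i);
            i++;                                        // step past the type's last char
            String t = descriptor.substring(start, i);
            switch (t) {
                case "Z": case "B": case "C": case "S": case "I":
                    locals.add("int");                  // small types are all int
                    break;
                case "F":
                    locals.add("float");
                    break;
                case "J":
                    locals.add("long"); locals.add("top");    // two slots
                    break;
                case "D":
                    locals.add("double"); locals.add("top");  // two slots
                    break;
                default:
                    // "Lpkg/Name;" becomes "pkg/Name"; arrays keep descriptor form
                    locals.add(t.charAt(0) == 'L' ? t.substring(1, t.length() - 1) : t);
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        System.out.println(initialLocals("Foo", false, "(IJLjava/lang/String;[DZ)V"));
        // prints [Foo, int, long, top, java/lang/String, [D, int]
    }
}
```

For a static method with the same descriptor, the list would simply start at int instead of Foo. The operand stack part of the initial frame needs no computation, since it is always empty.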

Raedwald
Antimony
  • As an amendment to your last paragraph, `long` and `double` parameters will, like for all local variables, consume *two* local variables in the stack frame. – Holger Aug 04 '14 at 11:01
  • Great explanation of an obscure topic. – Mike Strobel Aug 04 '14 at 13:51
  • I'm pretty new to this bytecode stuff, but if I'm writing an app which has a fixed list of variables, is defining Frames for jumps strictly necessary? My ASM Eclipse plugin inserts frames, but it seems the code works just fine without them - and the program is using both If and do-while. – ThomasRS Aug 28 '18 at 22:41
  • New indeed, I was not aware that ASM can be configured to automatically insert the frames! – ThomasRS Sep 12 '18 at 12:40
  • @ThomasRS yes, ASM can automatically compute frames (although it's twice as expensive as manually computing them in many cases - but from a friend's experience, calculating frames manually is a pain) – arviman May 12 '20 at 05:35
  • I just don't know why they named it stack map frame. It makes me think that it's related to frames in the JVM stack! – sify Oct 24 '22 at 09:57
  • OK, maybe it means a map of operand stack variables and their types. An operand stack is in a frame, a frame is in a JVM stack, and each entry in a stack map is called a stack map frame! – sify Oct 24 '22 at 10:11