Java bytecode: local variables table vs on-stack calculation

Question

Assume that we have the following class:

final class Impl implements Gateway3 {
    private final Sensor sensor1;
    private final Sensor sensor2;
    private final Sensor sensor3;

    private final Alarm alarm;

    public Impl(Sensor sensor1, Sensor sensor2, Sensor sensor3, Alarm alarm) {
        this.sensor1 = sensor1;
        this.sensor2 = sensor2;
        this.sensor3 = sensor3;
        this.alarm = alarm;
    }

    @Override
    public Temperature averageTemp() {
        final Temperature temp1 = sensor1.temperature();
        final Temperature temp2 = sensor2.temperature();
        final Temperature temp3 = sensor3.temperature();

        final Average tempAvg = new Average.Impl(temp1, temp2, temp3);
        final Temperature result = tempAvg.result();
        return result;
    }

    @Override
    public void poll() {
        final Temperature avgTemp = this.averageTemp();
        this.alarm.trigger(avgTemp);
    }
}

This class makes heavy use of local variables, and all of them are final.

If we look at the bytecode generated for, say, the averageTemp method, we see the following:

   0: aload_0
   1: getfield      #2                  // Field sensor1:Lru/mera/avral/script/bytecode/demo/Sensor;
   4: invokeinterface #6,  1            // InterfaceMethod ru/mera/avral/script/bytecode/demo/Sensor.temperature:()Lru/mera/avral/script/bytecode/demo/Temperature;
   9: astore_1
  10: aload_0
  11: getfield      #3                  // Field sensor2:Lru/mera/avral/script/bytecode/demo/Sensor;
  14: invokeinterface #6,  1            // InterfaceMethod ru/mera/avral/script/bytecode/demo/Sensor.temperature:()Lru/mera/avral/script/bytecode/demo/Temperature;
  19: astore_2
  20: aload_0
  21: getfield      #4                  // Field sensor3:Lru/mera/avral/script/bytecode/demo/Sensor;
  24: invokeinterface #6,  1            // InterfaceMethod ru/mera/avral/script/bytecode/demo/Sensor.temperature:()Lru/mera/avral/script/bytecode/demo/Temperature;
  29: astore_3
  30: new           #7                  // class ru/mera/avral/script/bytecode/demo/Average$Impl
  33: dup
  34: aload_1
  35: aload_2
  36: aload_3
  37: invokespecial #8                  // Method ru/mera/avral/script/bytecode/demo/Average$Impl."<init>":(Lru/mera/avral/script/bytecode/demo/Temperature;Lru/mera/avral/script/bytecode/demo/Temperature;Lru/mera/avral/script/bytecode/demo/Temperature;)V
  40: astore        4
  42: aload         4
  44: invokeinterface #9,  1            // InterfaceMethod ru/mera/avral/script/bytecode/demo/Average.result:()Lru/mera/avral/script/bytecode/demo/Temperature;
  49: astore        5
  51: aload         5
  53: areturn

There are plenty of astore opcodes.

Now assume that, using a bytecode generation library, I generated the following bytecode for the same method:

   0: new           #18                 // class ru/mera/avral/script/bytecode/demo/Average$Impl
   3: dup
   4: aload_0
   5: getfield      #20                 // Field sensor1:Lru/mera/avral/script/bytecode/demo/Sensor;
   8: invokeinterface #25,  1           // InterfaceMethod ru/mera/avral/script/bytecode/demo/Sensor.temperature:()Lru/mera/avral/script/bytecode/demo/Temperature;
  13: aload_0
  14: getfield      #27                 // Field sensor2:Lru/mera/avral/script/bytecode/demo/Sensor;
  17: invokeinterface #25,  1           // InterfaceMethod ru/mera/avral/script/bytecode/demo/Sensor.temperature:()Lru/mera/avral/script/bytecode/demo/Temperature;
  22: aload_0
  23: getfield      #29                 // Field sensor3:Lru/mera/avral/script/bytecode/demo/Sensor;
  26: invokeinterface #25,  1           // InterfaceMethod ru/mera/avral/script/bytecode/demo/Sensor.temperature:()Lru/mera/avral/script/bytecode/demo/Temperature;
  31: invokespecial #33                 // Method ru/mera/avral/script/bytecode/demo/Average$Impl."<init>":(Lru/mera/avral/script/bytecode/demo/Temperature;Lru/mera/avral/script/bytecode/demo/Temperature;Lru/mera/avral/script/bytecode/demo/Temperature;)V
  34: invokevirtual #36                 // Method ru/mera/avral/script/bytecode/demo/Average$Impl.result:()Lru/mera/avral/script/bytecode/demo/Temperature;
  37: areturn

Semantically, this new implementation is equivalent to the old one: it still takes the temperature from three sensors, averages the values, and returns the result. But instead of storing intermediate values in variables, it does all the calculations on the stack. I can rewrite it this way because all my local variables and fields are final.

Now the question: if I am doing some bytecode-generation-related magic and follow this "all calculations on the stack" approach everywhere (assuming that all my variables and fields are final), what potential pitfalls might I face?

NOTE: I have no intention to rewrite bytecode for existing Java classes in the way I described. The example class is given here just to show the method semantics I want to achieve in my bytecode.

Isn't your bytecode essentially `return new Average.Impl(sensor1.temperature(), sensor2.temperature(), sensor3.temperature()).result();`? — biziclop, Apr 05 '17 at 21:18
> Isn't your bytecode essentially return new Average.Impl(sensor1.temperature(), sensor2.temperature(), sensor3.temperature()).result(); < Let it be. But that is not what the question is about. In the first case, the number of values on the stack is rather predictable and small. In the second case, it depends on the method logic and can grow quite large. Would that cause me trouble in some cases? — skapral, Apr 05 '17 at 21:22
@skapral You must be able to tell the maximum depth of your stack in your method header, so that's one pitfall. But both your operand stack and your local variable table are likely to be allocated in the same stack frame, so the overall size will always be the same. — biziclop, Apr 05 '17 at 21:23
@biziclop Today's bytecode generation tools can calculate and set it for me, can't they? So, OK, I know the maximum stack size, and it is rather large. What next? — skapral, Apr 05 '17 at 21:28
@Holger In the context of the task I am solving, I can definitely prove and rely on that statement. In fact, I am generating bytecode from a formal description, where instead of variables there are bindings, like 'let' in FP. This formal description has nothing in common with Java and its bytecode; it's fully declarative. What I am doing is just generating an implementation that follows this description. The class I provided here is just an example. — skapral, Apr 06 '17 at 10:20
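
To make the "maximum stack depth" point from the comments concrete, here is a minimal sketch that replays the net operand-stack effect of each instruction in the two listings from the question. The class name `MaxStackSketch` and the hand-written delta arrays are illustrative assumptions, not output of any real tool; a bytecode library would normally compute this value for you.

```java
// Hypothetical helper: tracks operand-stack depth for straight-line bytecode.
final class MaxStackSketch {

    // Net stack effect per instruction of the javac listing (with locals).
    static final int[] WITH_LOCALS = {
        +1, 0, 0, -1,   // aload_0, getfield, invokeinterface temperature, astore_1
        +1, 0, 0, -1,   // same for sensor2 / astore_2
        +1, 0, 0, -1,   // same for sensor3 / astore_3
        +1, +1,         // new, dup
        +1, +1, +1,     // aload_1, aload_2, aload_3
        -4,             // invokespecial <init>: pops objectref + 3 arguments
        -1, +1,         // astore 4, aload 4
        0,              // invokeinterface result: pops receiver, pushes result
        -1, +1,         // astore 5, aload 5
        -1              // areturn
    };

    // Net stack effect per instruction of the generated listing (on-stack).
    static final int[] ON_STACK = {
        +1, +1,         // new, dup
        +1, 0, 0,       // aload_0, getfield sensor1, invokeinterface temperature
        +1, 0, 0,       // same for sensor2
        +1, 0, 0,       // same for sensor3
        -4,             // invokespecial <init>
        0,              // invokevirtual result
        -1              // areturn
    };

    // Returns the highest depth reached between instructions.
    static int maxDepth(int[] deltas) {
        int depth = 0;
        int max = 0;
        for (int delta : deltas) {
            depth += delta;
            if (depth < 0) {
                throw new IllegalStateException("operand stack underflow");
            }
            max = Math.max(max, depth);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println("with locals: " + maxDepth(WITH_LOCALS));
        System.out.println("on stack:    " + maxDepth(ON_STACK));
    }
}
```

For this particular method both variants reach the same depth, because the constructor arguments have to sit on the stack at the `invokespecial` call either way. The sketch ignores branches and two-slot `long`/`double` values, which a real computation has to handle.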

GhostCat · Answer 1 · 2017-04-05T21:27:54.827

score 7

The biggest pitfall: you might accidentally prevent the JIT from doing its job.

You would thereby achieve the exact opposite of your goal: reduced runtime performance.

The JIT is (to a certain degree) written to produce the best results for well-known, frequently used coding patterns. If you make its job harder, chances are it will do a less optimal job.

The point is: in contrast to other languages, Java compilers do not perform many optimization steps. The real magic happens later, when the JIT kicks in. Thus, you would have to study what the JIT does in great detail to understand how to create better bytecode that can also be "JITed" well later on.
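
As a small illustration of how little javac does on its own, the sketch below shows one of the few transformations it does apply: folding compile-time constant expressions, as defined by the Java Language Specification. The class and method names are made up for this example. Comparing strings with `==` is used here only to observe whether the compiler produced a single interned constant; it is not a recommended way to compare strings.

```java
// Hypothetical demo class; names are illustrative only.
final class ConstantFoldingSketch {

    // Folded by javac: the class file contains the constant 86400.
    static final int SECONDS_PER_DAY = 24 * 60 * 60;

    // "a" + "b" is a constant expression, so it is the same interned
    // String object as the literal "ab".
    static boolean literalConcatIsFolded() {
        return ("a" + "b") == "ab";
    }

    // A final local initialized with a constant is a constant variable,
    // so the concatenation is still folded at compile time.
    static boolean finalLocalIsFolded() {
        final String a = "a";
        return (a + "b") == "ab";
    }

    // Without 'final' the local is not a constant variable: the
    // concatenation happens at run time and yields a new String object.
    static boolean nonFinalLocalIsFolded() {
        String a = "a";
        return (a + "b") == "ab";
    }

    public static void main(String[] args) {
        System.out.println(SECONDS_PER_DAY);
        System.out.println(literalConcatIsFolded());
        System.out.println(finalLocalIsFolded());
        System.out.println(nonFinalLocalIsFolded());
    }
}
```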

edited Apr 05 '17 at 21:27

answered Apr 05 '17 at 21:21

GhostCat


I don't think most java compilers do any optimisation, apart from possibly some very basic dead code elimination. – biziclop Apr 05 '17 at 21:26
They do constant folding. And they translate string + into StringBuilder append calls where possible. – GhostCat Apr 05 '17 at 21:29
I wouldn't call "translate string + into StringBuilder append calls" an *optimization*. That's called *compiling*, i.e. generating the appropriate bytecode for the Java code in question. – Andreas Apr 05 '17 at 21:30
@GhostCat thanks for your answer. It seems like what I am looking for. Need to meditate on that for a while to continue discussion. – skapral Apr 05 '17 at 21:31
*FYI:* There is really no pitfall in this particular case, since you can write Java code to get that exact bytecode. See [my answer](http://stackoverflow.com/a/43241949/5221149). – Andreas Apr 05 '17 at 21:36
The question is not 100 percent clear there. As it talks about generic bytecode generation combined with the specific solution. – GhostCat Apr 05 '17 at 23:00
Any suggestions on how I can improve it? – skapral Apr 06 '17 at 09:11
I would focus on this sentence *if I am doing some bytecode-generation-related magic* ... that leaves room for **a lot** of interpretation. Maybe you can rephrase to give an **exact** scope of the transformations you intend to do. (maybe it is just about dropping that part of the sentence?) – GhostCat Apr 06 '17 at 09:13
It's very unlikely that using variables instead of the stack would prevent optimizations in a good (SSA-based) JIT compiler. Irreducibility of the control flow graph induced by the bytecode would typically prevent some JITs from doing a good (or any) job, but not variables. – axel22 Jun 16 '18 at 13:42

score 5 · Accepted Answer · edited May 23 '17 at 11:46

As shown by Andreas’ answer, it’s not unusual to have Java code utilizing the stack for temporary values, like in nested expressions. That’s why the instruction set was created that way, using an operand stack to refer to previously calculated values implicitly. In fact, I’d call your code example with its excessive use of local variables unusual.

If the input of your byte code producing tool is not Java code, the number of variables might differ from typical Java code, especially if the input is of a declarative nature, so there is no requirement to have all of them directly mapped to local variables in byte code.

JVMs like HotSpot transform the code into an SSA form, where all transfer operations between local variables and the operand stack, as well as pure stack manipulations like dup and swap, are eliminated anyway, before applying subsequent optimizations, so your choice of using local variables or not will not have any performance impact.
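
The two shapes under discussion can be written side by side in Java source. This is a minimal sketch with stand-in types: `Sensor`, `Temperature` and `Average` here are simplified, hypothetical versions of the types in the question, reduced so that the example is self-contained.

```java
// Hypothetical, simplified stand-ins for the types used in the question.
final class StackVsLocalsSketch {

    interface Temperature { double value(); }

    interface Sensor { Temperature temperature(); }

    static final class Average {
        private final Temperature a;
        private final Temperature b;
        private final Temperature c;

        Average(Temperature a, Temperature b, Temperature c) {
            this.a = a;
            this.b = b;
            this.c = c;
        }

        Temperature result() {
            final double avg = (a.value() + b.value() + c.value()) / 3.0;
            return () -> avg;
        }
    }

    // Every intermediate value is stored in a local variable (astore/aload).
    static Temperature withLocals(Sensor s1, Sensor s2, Sensor s3) {
        final Temperature t1 = s1.temperature();
        final Temperature t2 = s2.temperature();
        final Temperature t3 = s3.temperature();
        final Average average = new Average(t1, t2, t3);
        final Temperature result = average.result();
        return result;
    }

    // Intermediate values live only on the operand stack.
    static Temperature onStack(Sensor s1, Sensor s2, Sensor s3) {
        return new Average(s1.temperature(),
                           s2.temperature(),
                           s3.temperature()).result();
    }

    public static void main(String[] args) {
        Sensor s1 = () -> () -> 10.0;
        Sensor s2 = () -> () -> 20.0;
        Sensor s3 = () -> () -> 30.0;
        System.out.println(withLocals(s1, s2, s3).value());
        System.out.println(onStack(s1, s2, s3).value());
    }
}
```

Both methods poll the sensors in the same order and return the same value. The only ordering difference at the bytecode level is that the `new` instruction runs before the sensor calls in the second form, while the constructor itself still runs after them.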

It might be worth noting that you usually can’t inspect values on the operand stack in debuggers, so you might consider retaining variables when making a debug build (when the LocalVariableTable is generated, too).

Some code constructs require local variables. E.g. when you have an exception handler, its entry point will have the operand stack cleared, only containing the reference to the exception, so all values it wants to access have to be materialized as local variables. I don’t know if your input form has loop constructs; if so, you usually will convert them from their declarative form to a conventional loop using a mutable variable under the hood, when necessary. Mind the iinc instruction, which works directly with a local variable…
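
The two constructs named above can be illustrated in Java source. This is a sketch with made-up method names, assuming a typical javac: a value needed inside a catch block has to be stored in a local variable before the try, and a counting loop over an `int` local is usually compiled to an `iinc` on that local.

```java
// Hypothetical examples; method names are illustrative only.
final class LocalsRequiredSketch {

    // 'doubled' is used both in the try body and in the handler. When the
    // handler is entered, the operand stack holds only the exception
    // reference, so 'doubled' must be reloaded from its local variable slot.
    static int parseOrFallback(String text, int fallback) {
        final int doubled = fallback * 2;
        try {
            return doubled + Integer.parseInt(text);
        } catch (NumberFormatException e) {
            return doubled;
        }
    }

    // A conventional loop with a mutable counter; javac typically emits
    // 'iinc' for 'i++', which operates directly on the local variable.
    static int sumBelow(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(parseOrFallback("5", 10));
        System.out.println(parseOrFallback("not a number", 10));
        System.out.println(sumBelow(5));
    }
}
```

Running `javap -c` on the compiled class shows the store before the try range and the reload inside the handler.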

After some consideration, I decided to approve this particular answer. Mainly because of the statement that there is no difference between approaches from usual jvm implementation perspective. — skapral, Apr 08 '17 at 10:35
While the answer is correct from the perspective of an SSA-based JIT compiler (modulo the fact that parsing a variable-based code might take longer), I think that the answer would be even better if it mentioned how using more variables affects the interpreter (i.e. is interpretation slower/faster/consumes more memory?). — axel22, Jun 16 '18 at 13:40

score 4 · Answer 3 · answered Apr 05 '17 at 21:28

Your bytecode is eliminating the local variables, which you can do in Java too:

public Temperature averageTemp() {
    return new Average.Impl(sensor1.temperature(),
                            sensor2.temperature(),
                            sensor3.temperature()).result();
}

This will generate the following bytecode:

   0: new           #38                 // class Average$Impl
   3: dup
   4: aload_0
   5: getfield      #27                 // Field sensor1:LSensor;
   8: invokevirtual #29                 // Method Sensor.temperature:()LTemperature;
  11: aload_0
  12: getfield      #34                 // Field sensor2:LSensor;
  15: invokevirtual #29                 // Method Sensor.temperature:()LTemperature;
  18: aload_0
  19: getfield      #36                 // Field sensor3:LSensor;
  22: invokevirtual #29                 // Method Sensor.temperature:()LTemperature;
  25: invokespecial #40                 // Method Average$Impl."<init>":(LTemperature;LTemperature;LTemperature;)V
  28: invokevirtual #55                 // Method Average$Impl.result:()LTemperature;
  31: areturn

That is exactly what you did, so is there a problem with doing it that way? NO.

But, is there a reason to choose one over the other? No. The JIT compiler will likely do that anyway.

answered Apr 05 '17 at 21:28

Andreas


initially for me it is a matter of convenience. It turns out for me that generating on-stack variant is more convenient to implement in my particular case. So, now I am looking for drawbacks of such approach. – skapral Apr 05 '17 at 21:36
@skapral As you can see. There are no drawbacks, since it's perfectly valid bytecode for the functionality in question. Now, if you generate bytecode in a way that cannot be done by Java code, [answer by @GhostCat](http://stackoverflow.com/a/43241850/5221149) becomes relevant, i.e. your code may not fit a pattern that JIT can optimize *(well, even if you can do it with Java, bad code is always bad code)*. Other than that, if the *logic* of your bytecode is good, **you can do whatever you want. The JVM must do what the bytecode says.** – Andreas Apr 05 '17 at 21:40
@Andreas For example you can group all three `aload_0` opcodes together, your code will behave exactly the same but there is no Java code that results in that exact bytecode pattern. – biziclop Apr 05 '17 at 21:45
@biziclop You can't group by three `aload_0` up front, because you couldn't then call `getfield` and `invokevirtual` on them, only on the one at the top of the operand stack. – Andreas Apr 05 '17 at 21:48
@Andreas That means it's time for me to go to bed. :) – biziclop Apr 05 '17 at 21:49
@Andreas My concern mostly comes from the fact that in the first case the bytecode uses the local variable table a lot, keeping the stack small, while the second approach does the opposite. I understand that my bytecode is valid. Anyway, it seems that the difference mostly relates to the shady topic of JIT optimization, which is, well, shady anyway, even if I code in plain Java. Need to think about it for a while – skapral Apr 05 '17 at 21:50
@skapral There is no difference in operand stack size between the two (tops out at 5 for both). Please disregard that as an option for consideration. Even if there had been, it is minuscule, and will not cause trouble. Infinite recursion causes stack trouble. Not that. – Andreas Apr 05 '17 at 22:06
@biziclop: if you group the `aload_0` instructions up front, you would need `swap` instructions, so the result would [definitely differ from ordinary code](http://stackoverflow.com/q/9722421/2711488). Still, I don’t think that this is a challenge to any JIT. – Holger Apr 06 '17 at 10:17
