1

I'm implementing a deferred refinement operation for my numerical static analysis framework, specifically, integers and longs. For reference, int (and smaller) can be compared directly, no need for a cmp operation. However, I have noticed that the bytecode generated (at least for Java 11 targeting 1.8) for binary comparisons flips the comparison operation. For example, the source code may have if (i < j)... which is converted to a bytecode comparison of if_icmpge. For long types, the story is similar, but instead the comparison is punted to lcmp and simply use ifge to compare the result.

I suspect this is a cheap optimization but I would like to confirm. For example, by negating the condition, the false branch is now under a conditional jump instead of the true branch.

But the main question is: can I rely on this behavior of the compiler? If I'm given a conditional jump instruction with >= as the operator, can I always refine my abstract domain under the premise that the true branch is <?

Short aside: I'll be happy if someone can answer the behavior question. In writing this, I'm not sure it matters for me.

Edit

I seem to do this every time I'm in this section of the code base I'm working on.

The refinement must attach to the actual condition and not flip. Since, for example, the if_icmpge from the accepted answer is associated with a "branch out" block, the refinement should associate with the "branch out" values (e.g., no flip). Conversely, the "fall through" will be computed by "rotating" the comparison to its source original value.

Edit 2

I'm developing a numerical abstract interpretation framework, currently using intervals, zones, and a predicate based abstract domains. As such, what happens at runtime is does not particularly concern me.

(There's a lot to this and I'm undoubtedly leaving out details. I don't want the question to evolve into an introduction to data-flow static analysis.)

With respect to the proper interpretation of the flipped operators, my confusion was on the particular branch that was being flipped. Both the true and false branch of a condition are explored per this analysis. However, as noted and answered, the false branch is generated behind a conditional jump and the true branch simply falls through (the failure of the conditional jump). My question was predicated on missing the jump.

kballou
  • 68
  • 1
  • 1
  • 10
  • 4
    Barely anything the `javac` compiler does is for the purposes of "optimization". Virtually *all* optimization in the Java world happens in the runtime. I would expect this kind of thing to be done for simplicity/consistency. – Joachim Sauer Nov 28 '22 at 21:07
  • Maybe not all CPUs have a less than/greater than operations. Maybe some only have less than or equal to/greater than or equal to. – Bohemian Nov 28 '22 at 21:16
  • 3
    I would expect this to be an optimization, actually, but strictly for simplicity and conciseness of bytecode -- not for performance. This is likely to get _performance_ optimized in whichever direction at runtime, which is unlikely to care which "direction" javac put the condition. – Louis Wasserman Nov 28 '22 at 21:25
  • You can also, of course, review the [source code](https://github.com/openjdk/jdk/blob/master/src/jdk.compiler/share/classes/com/sun/tools/javac/jvm/Gen.java#L1669) of javac that makes this decision. That said, I certainly would _not_ consider this a specified or reliable behavior -- not that it's clear what you mean by "refine my abstract domain." – Louis Wasserman Nov 28 '22 at 21:33
  • _"can I rely on this behavior of the compiler"_ - You can rely only on the compiler producing something that is semantically equivalent to what you coded. You cannot rely on anything else, and the generated code can change based on the Java version or command-line options. Also, the code might get completely rewritten at runtime by the optimizer. It might help if you explained why you think this is important. – Jim Garrison Nov 28 '22 at 21:50
  • Thank you @JimGarrison. I've updated the question to (hopefully) better reflect current understandings. Summary: the behavior of runtime is not relevant given my context in static analysis. Furthermore, my question was confused by the conditional jump. If the compiler maintains the semantics but flips the condition to achieve a different flow, I must still base my interpretation on the parsed flow. – kballou Nov 28 '22 at 22:42
  • 1
    OK, but the clarification still ignores that the JIT optimizer is free to do whatever it wants when converting JVM opcodes to machine instructions, as long as semantics are preserved. The resulting optimized flow could look nothing like the flow implied in the opcodes. I still fail to understand why this is important to you. If you need the flow to be analyzable you need to work at the machine instruction (assembler) level. – Jim Garrison Nov 29 '22 at 00:15
  • @JimGarrison: The analysis is branch sensitive but not flow sensitive. I must analyze each branch independently, but afterwards, both flows are merged together. I wanted to make sure I entered into the correct branches with the correct refinement on the conditions. – kballou Nov 29 '22 at 17:42
  • Since the question is marked duplicate, I'm inclined to delete the question even though this has been a fruitful (to me, at least) discussion. – kballou Nov 29 '22 at 17:43
  • 1
    @kballou duplicates are not bad. People may use different phrases and terms when searching, so the duplicates may help them finding the solution. The purpose of closing is just to prevent people from spending too much effort in writing redundant answers. – Holger Nov 30 '22 at 08:31

1 Answers1

3

Consider the following Java code:

class Test
{
    public static void main(String[] args)
    {
        System.out.println("a");
        if(args.length < 3)
        {
            System.out.println("b");
        }
        System.out.println("c");
    }
}

My Java compiler (javac 17.0.2) compiles this to:

public static void main(java.lang.String[]);
  Code:
     0: getstatic     #7                  // Field java/lang/System.out:Ljava/io/PrintStream;
     3: ldc           #13                 // String a
     5: invokevirtual #15                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
     8: aload_0
     9: arraylength
    10: iconst_3
    11: if_icmpge     22
    14: getstatic     #7                  // Field java/lang/System.out:Ljava/io/PrintStream;
    17: ldc           #21                 // String b
    19: invokevirtual #15                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
    22: getstatic     #7                  // Field java/lang/System.out:Ljava/io/PrintStream;
    25: ldc           #23                 // String c
    27: invokevirtual #15                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
    30: return

What you will notice here is that there is only a single branch, and it's the if_icmpge. That is, the control flow graph looks like this:

Basic blocks control flow graph

If this were an if_icmplt, then there would either have to be an unconditional branch after it, or the block that it jumps to would require an unconditional branch to jump back. This way, the conditional block can just "fall through" back to the shared execution path.

Whether this is a matter of optimisation or simplicity or both can be debated, but doing it the other way round would make the bytecode larger and more complex while likely not offering any advantage.

As to whether you can consider if_icmpge to be truly a <: if it simply skips a block that itself falls back on the shared execution path, then yes. But in the general case is depends on the control flow graph. If you have both an if and an else, either way round would be valid. If you have a chained if/else if/else, then it'd be nicer to decode it that way, but it'd still be valid to decode it as if { if/else } / else.

Siguza
  • 21,155
  • 6
  • 52
  • 89
  • If anything this leaves me with further questions: how would a `if_icmplt` be any less able to "fall through" in the exact same manner as `if_icmpge`? If I were to make a guess at this question, I would've assumed that it was simply a matter of the compiler writer choosing one for the assembly. – Rogue Nov 28 '22 at 21:44
  • @Rogue to be clear, I'm talking about the same condition expressed with a different opcode. In that case, the branch would be taken if the additional code is to be run, so you're left at the next instruction if it is _not_ to be run. You cannot fall through to that location from wherever the other case jumped to. You need to add an unconditional branch somewhere now. – Siguza Nov 28 '22 at 22:25
  • I think flipping the demo to `> 3` answers the question more clearly. – shmosel Nov 28 '22 at 22:47