why always gcc makes jle/jg?

Question

I make some assembly test code which just compare with character, gcc makes jle / jg combination always whether condition contains equal or not.

example 1.

if ( 'A' < test && test < 'Z' )

0x000000000040054d <+32>:   cmp    BYTE PTR [rbp-0x1],0x41

0x0000000000400551 <+36>:   jle    0x40056a <main+61>

0x0000000000400553 <+38>:   cmp    BYTE PTR [rbp-0x1],0x59

0x0000000000400557 <+42>:   jg     0x40056a <main+61>

example 2.

if ( 'A' <= test && test <= 'Z' )

0x000000000040054d <+32>:   cmp    BYTE PTR [rbp-0x1],0x40

0x0000000000400551 <+36>:   jle    0x40056a <main+61>

0x0000000000400553 <+38>:   cmp    BYTE PTR [rbp-0x1],0x5a

0x0000000000400557 <+42>:   jg     0x40056a <main+61>

I thought it's problem about optimization, but GCC gave same result even if I compile with -O0 option.

How can I get JL/JG through 'A'< sth<'Z' and JLE/JGE through 'A'<=sth<='Z'?

what exactly is the question? Why do you want different code if compiler provides you with fastest option anyway? — Severin Pappadeux, Aug 07 '15 at 15:47
Compiler basically transform second comparison into `if ( 'A'-1 < test && test < 'Z'+1 )` and generates the same code — Severin Pappadeux, Aug 07 '15 at 15:48
There's no problem with the optimization or the compiler. It's generating the correct code. You can't dictate what instructions the compiler uses to implement any given statement. It can use any instruction supported by the target CPU, so long as it give the correct result. — Ross Ridge, Aug 07 '15 at 15:59
Oh I'm so sorry, My explain was too short. I was expected gcc give JL/JG through 'A' — cheese choi, Aug 07 '15 at 16:14
@cheesechoi Ah I see what you mean. Well, that's probably due to the fact that JLE had additional check for ZF=1, thus this instruction depends on flag values properly set and updated. It means even JLE takes the same execution time as JL, it has more dependencies, thus making code potentially slower to execute and reduce instruction level parallelism. Clear choice for the compiler — Severin Pappadeux, Aug 07 '15 at 16:45
For why GCC might be making this change, see: http://stackoverflow.com/a/12146965/3826372 Even though this change makes no difference on x86 CPUs, it can be an optimization on other CPUs and doesn't hurt on x86. It also probably helps to normalize comparisons so other optimizations can be performed more easily. Eg to turn `'A' >= c || c >= 'C'` into 'c != 'B''. — Ross Ridge, Aug 07 '15 at 17:57
@SeverinPappadeux: Interesting point, gcc may have a heuristic to choose branch instructions that depend on as few flags as possible. In this case, `cmp` sets all the flags needed by the `jcc`. (In fact, in recent Intel designs, the insn decoders macro-fuse the `cmp / jcc` into a single compare-and-branch uop). I think Ross's link explains the historical reasons why this might be the case. — Peter Cordes, Aug 08 '15 at 03:33
@PeterCordes that might be the case, though real answer might be diving into gcc x86 cost calculation for jl/jle and check. Though last time I looked into GCC itnernals was XX years ago, on sparc... — Severin Pappadeux, Aug 08 '15 at 03:54

Severin Pappadeux · Accepted Answer · 2015-08-07T18:16:04.263

2

One can see that first comparison is against [x41...x59] range. Second comparison is against [x40...x5a] range. Basically, compiler makes it into

if ( 'A'-1 < test && test < 'Z'+1 )

and then generates the same code

UPDATE

Just to make clear why I think compiler prefers JL vs JLE. JLE depends on flag values being updated (ZF=1) but JL doesn't. Therefore, JLE will introduce dependencies which potentially could hurt instruction level parallelism, even if instruction timing itself is the same

So, clear choice - transform code to use simpler instructions.

edited Aug 07 '15 at 18:16

answered Aug 07 '15 at 15:55

Severin Pappadeux

18,636
3
38
64

What makes you think that JL doesn't depend on flags? For that matter, what makes you think that JL is "simpler"? – Stephen Canon Aug 07 '15 at 18:27
@StephenCanon sorry, was a bit unclear - JLE has an additional dependency. `For that matter, what makes you think that JL is "simpler"? ` Well, JLE has two conditions to check, see table http://stackoverflow.com/questions/9617877/assembly-jg-jnle-jl-jnge-after-cmp – Severin Pappadeux Aug 07 '15 at 18:41
Superficial complexity not withstanding, the actual performance of CMP + JL and CMP + JLE is absolutely identical, as are the dependencies. – Stephen Canon Aug 07 '15 at 19:06
@StephenCanon as a single instruction yes, I mentioned this. If they are absolutely equivalent then why do you think compiler applies transformation to conditional expression? It is for a reason it expands interval from [a...b] to [a-1...b+1], it didn't happen by itself out of blue moon – Severin Pappadeux Aug 07 '15 at 20:33
CMP writes to all of the flags. Thus both JL and JLE depend only on the result of CMP, and not on any previous state. The interval is expanded because the compiler canonicalizes comparisons, and happens to have picked that particular canonicalization, not because it's any faster one way or the other. – Stephen Canon Aug 07 '15 at 21:35
@StephenCanon: Ross linked http://stackoverflow.com/questions/12135518/is-faster-than/12146965#12146965 in comments on the OP, which may explain the historical reasons why gcc would make this transformation. (x86 wasn't gcc's first target architecture). – Peter Cordes Aug 08 '15 at 03:36

score 1 · Answer 2 · answered Aug 07 '15 at 16:02

In general, you can't force the compiler to emit a particular instruction. In this case, you might succeed if you get rid of the constant so the compiler won't be able to adjust it. Note that due to the nature of your expression, the compiler will probably still reverse one of the tests, and thus bring in an equals. You might be able to work around that by using goto. Obviously, both of these changes will generate worse code.

why always gcc makes jle/jg?

2 Answers2