3

this is the assembly code i am supposed to translate: f1:

subl    $97, %edi
xorl    %eax, %eax
cmpb    $25, %dil
setbe   %al
ret

heres the c code I wrote that I think is equivalent.

int f1(int y){

  int x = y-97;
  int i = 0;

  if(x<=25){
    x = i;
  }
  return x;
}

and heres what I get from compiling the C code.

_f1: ## @f1

.cfi_startproc

%bb.0:

pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq    %rsp, %rbp
.cfi_def_cfa_register %rbp
                  ## kill: def %edi killed %edi def %rdi
leal    -97(%rdi), %ecx
xorl    %eax, %eax
cmpl    $123, %edi
cmovgel %ecx, %eax
popq    %rbp
retq
.cfi_endproc

I was wondering if this was correct / what should be different and if anyone could help explain how jmps work as I am also trying to translate this assembly code and have gotten stuck f2:

cmpl    $1, %edi
jle .L6
movl    $2, %edx
movl    $1, %eax
jmp .L5

.L8:

movl    %ecx, %edx

.L5:

imull   %edx, %eax
leal    1(%rdx), %ecx
cmpl    %eax, %edi
jg  .L8

.L4:

cmpl    %edi, %eax
sete    %al
movzbl  %al, %eax
ret

.L6:

movl    $1, %eax
jmp .L4
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • 3
    Note that `cmpb` is unsigned comparison. Also, the asm version returns 0 or 1, your C version doesn't. – Jester Mar 03 '19 at 23:32
  • 4
    One possibility is: `int f1(int y){ unsigned char x = y-97; return (x <= 25); }` .To see the assembly generated you can try it on [Godbolt](https://godbolt.org/z/1TrOcr) – Michael Petch Mar 03 '19 at 23:41
  • Thanks that site looks super useful for testing thanks –  jpreed499 Mar 03 '19 at 23:49
  • 1
    I'm voting to close this question as off-topic because stack exchange is not a reverse engineering service nor an optimization service. – Joshua Mar 04 '19 at 02:44

2 Answers2

4

gcc8.3 -O3 emits exactly the asm in the question for this way of writing the range check using the unsigned-compare trick.

int is_ascii_lowercase_v2(int y){
    unsigned char x = y-'a';
    return x <= (unsigned)('z'-'a');
}

Narrowing to 8-bit after the int subtract matches the asm more exactly, but it's not necessary for correctness or even to convince compilers to use a 32-bit sub. For unsigned char y, the upper bytes of RDI are allowed to hold arbitrary garbage (x86-64 System V calling convention), but carry only propagates from low to high with sub and add.

The low 8 bits of the result (which is all the cmp reads) would be the same with sub $'a', %dil or sub $'a', %edi.

Writing it as a normal range-check also gets gcc to emit identical code, because compilers know how optimize range-checks. (And gcc chooses to use 32-bit operand-size for the sub, unlike clang which uses 8-bit.)

int is_ascii_lowercase_v3(char y){
    return (y>='a' && y<='z');
}

On the Godbolt compiler explorer, this and _v2 compile as follows:

## gcc8.3 -O3
is_ascii_lowercase_v3:    # and _v2 is identical
    subl    $97, %edi
    xorl    %eax, %eax
    cmpb    $25, %dil
    setbe   %al
    ret

Returning a compare result as an integer, instead of using an if, much more naturally matches the asm.

But even writing it "branchlessly" in C won't match the asm unless you enable optimization. The default code-gen from gcc/clang is -O0: anti-optimize for consistent debugging, storing/reloading everything to memory between statements. (And function args on function entry.) You need optimization, because -O0 code-gen is (intentionally) mostly braindead, and nasty looking. See How to remove "noise" from GCC/clang assembly output?

## gcc8.3 -O0
is_ascii_lowercase_v2:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    %edi, -20(%rbp)
    movl    -20(%rbp), %eax
    subl    $97, %eax
    movb    %al, -1(%rbp)
    cmpb    $25, -1(%rbp)
    setbe   %al
    movzbl  %al, %eax
    popq    %rbp
    ret

gcc and clang with optimization enabled will do if-conversion to branchless code when it's efficient. e.g.

int is_ascii_lowercase_branchy(char y){
    unsigned char x = y-'a';
    if (x < 25U) { 
        return 1;
    }
    return 0;
}

still compiles to the same asm with GCC8.3 -O3

is_ascii_lowercase_branchy:
    subl    $97, %edi
    xorl    %eax, %eax
    cmpb    $25, %dil
    setbe   %al
    ret

We can tell that the optimization level was at least gcc -O2. At -O1, gcc uses the less efficient setbe / movzx instead of xor-zeroing EAX ahead of setbe

is_ascii_lowercase_v2:
    subl    $97, %edi
    cmpb    $25, %dil
    setbe   %al
    movzbl  %al, %eax
    ret

I could never get clang to reproduce exactly the same sequence of instructions. It likes to use add $-97, %edi, and cmp with $26 / setb.

Or it will do really interesting (but sub-optimal) things like this:

# clang7.0 -O3
is_ascii_lowercase_v2:
    addl    $159, %edi    # 256-97 = 8-bit version of -97
    andl    $254, %edi    # 0xFE; I haven't figured out why it's clearing the low bit as well as the high bits
    xorl    %eax, %eax
    cmpl    $26, %edi
    setb    %al
    retq

So this is something involving -(x-97), maybe using the 2's complement identity in there somewhere (-x = ~x + 1).

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
4

Here is an annotated version of the assembly:

# %edi is the first argument, we denote x
subl $97, %edi
# x -= 97

# %eax is the return value, we denote y
xorl %eax, %eax
# y = 0

# %dil is the least significant byte (lsb) of x
cmpb $25, %dil

# %al is lsb(y) which is already zeroed
setbe %al
# if lsb(x) <= 25 then lsb(y) = 1
# setbe is unsigned version, setle would be signed

ret
# return y

So a verbose C equivalent is:

int f(int x) {
  int y = 0;
  x -= 97;
  x &= 0xFF; // x = lsb(x) using 0xFF as a bitmask
  y = (unsigned)x <= 25; // Section 6.5.8 of C standard: comparisons yield 0 or 1
  return y;
}

We can shorten it by realizing y is unnecessary:

int f(int x) {
  x -= 97;
  x &= 0xFF;
  return (unsigned)x <= 25;
}

The assembly of this is an exact match on Godbolt Compiler Explorer (x86-64 gcc8.2 -O2): https://godbolt.org/z/fQ0LVR

  • 2
    `(unsigned)x <= 25U;` is still a 32-bit comparison. If you want to more exactly describe the asm using C, you should cast to `unsigned char` instead of letting the optimizer do that to implement `&= 0xFF`. Of course, C doesn't actually have narrow operations; `unsigned char < unsigned char` promotes both sides to `int` (with zero-extension to preserve the value), according to the default integer promotions. – Peter Cordes Mar 04 '19 at 03:53