Reverse engineering math functions

Question

I am translating some assembly code to C code, but having trouble with the following code:

mov    $0x51eb851f,%edx
mov    %ecx,%eax
imul   %edx
sar    $0x5,%edx
mov    %edx,%edi
mov    %ecx,%eax
sar    $0x1f,%eax
sub    %eax,%edi
imul   $0x64,%edi,%eax
sub    %eax,%ecx

%ecx stores our parameter which is an int type (we can call it x). I understand that the first three steps are actually doing x / 100. But I got confused in the following steps.

Some help would be appreciated.

`int foo(int x) {return x%100;}` though the fragment you have posted doesn't conform to typical x86 ABIs, so it's probably inline or just part of some expression. — EOF, Apr 16 '20 at 08:12

score 3 · Accepted Answer · answered Apr 16 '20 at 08:15

3

gcc 9.3.1 for x86 gcc -O3 -S -m32 converts the following C-code:

int foo(int x)
{
  return x%100;
}

into the following assembly:

foo:
    movl    4(%esp), %ecx
    movl    $1374389535, %edx
    movl    %ecx, %eax
    imull   %edx
    movl    %edx, %eax
    movl    %ecx, %edx
    sarl    $5, %eax
    sarl    $31, %edx
    subl    %edx, %eax
    imull   $100, %eax, %eax
    subl    %eax, %ecx
    movl    %ecx, %eax
    ret

which is effectively identical except for some minor reordering and conforming to x86 ABI for a whole function.

answered Apr 16 '20 at 08:15

EOF

6,273
2
26
50

I still got confused about the 'sar' instructions. What role actually are these two 'sar' playing in the function. – DarthZ Apr 16 '20 at 08:47
The `sarl $5, %eax` is part of the division by multiplication with the multiplicative inverse. The `sarl $31, %edx` converts `%edx` to `-1` if it was negative before, `0` if it was non-negative. Then `%edx` `-1` is then subtracted from `%eax`, which is equivalent to adding `1` or `0` for negative or positive respectively. This is needed to make sure the division is truncating (rounding towards 0), rather than rounding towards negative infinity (arithmetic right shift rounds towards negative infinity, consider `(-1 >> 1) == -1`). – EOF Apr 16 '20 at 09:33

score 1 · Answer 2 · answered Apr 16 '20 at 09:34

You are right, the first three steps are doing x / 100 -- more or less by multiplying by 1/100 -- but the division is not complete until after the sub %eax, %edi.

So, to answer your question about the steps which follow the first three, here is the code fragment, annotated:

  mov    $0x51eb851f,%edx   # magic multiplier for signed divide by 100
  mov    %ecx,%eax          # %ecx = x
  imul   %edx               # first step of division signed x / 100
  sar    $0x5,%edx          # q = %edx:%eax / 2^(32+5)
  mov    %edx,%edi          # %edi = q (so far)
  mov    %ecx,%eax
  sar    $0x1f,%eax         # %eax = (x < 0) ? -1 : 0
  sub    %eax,%edi          # %edi = x / 100 -- finally
  imul   $0x64,%edi,%eax    # %eax = q * 100
  sub    %eax,%ecx          # %ecx = x - ((x / 100) * 100)

Noting:

typically, in this technique for divide-by-multiplying, the multiply produces a result which is scaled up by 2^(32+n) (for 32-bit division). In this case, n = 5. The full result of the multiply is %edx:%eax, and discarding %eax divides by 2^32. The sar $05, %edx divides by the 2^n -- since this is a signed division, it requires an arithmetic shift.
sadly, for signed division the shifted %edx is not quite the quotient. If the dividend is -ve (and given that the divisor is +ve) need to add 1 to to get the quotient. So sar $0x1f, %eax gives -1 if x < 0, and 0 otherwise. And the sub %eax, %edi completes the division.

This step could equally be achieved by shr $0x1f, %eax and add %eax, %edi. Or add %eax, %eax and adc $0, %edi. Or cmp $0x80000000, %ecx and sbb $-1, %edi -- which is my favourite, but sadly saving the mov %ecx, %eax saves nothing these days, and in any case cmp $0x80000000, %ecx is a long instruction :-(
it's not clear why this shuffles the quotient q to %edi -- if it was left in %edx it would still be there after imul $0x64,%edx,%eax.

*saving the mov %ecx, %eax saves nothing these days* - Not true, it does save 1 uop of front-end bandwidth, and space in the ROB (out-of-order window size). e.g. [Can x86's MOV really be "free"? Why can't I reproduce this at all?](https://stackoverflow.com/q/44169342) - it's not free, only the latency and back-end uop are eliminated. Neat trick with cmp/sbb; might be worth doing that on Broadwell and later, and AMD, where `sbb` is single-uop. — Peter Cordes, Apr 16 '20 at 14:55

Reverse engineering math functions

2 Answers2