x86 assembly abs() implementation?

Question

Is there an abs() function in x86 assembly language?

(This question originally mentioned getting the difference of 2 signed integers, but that's really a separate question if you need to do that without possible signed-overflow in the subtraction. Otherwise yes, just abs(x-y), possibly widening inputs first.)

What platform are you on? There's no such thing as "Assembly language", only "x86 assembly" or "ARM assembly", etc. — Stephen Canon, Apr 14 '10 at 16:33
How do I compare and conditionally swap, then subtract. Can you provide an example in x86.. — Greg C., Apr 14 '10 at 16:36
Is it a possibility to set the sign bit of both integers to 0? — Anatoly, Nov 23 '21 at 19:47
BTW, `unsigned abs(int x)` is a different question than an efficient / correct implementation of absolute difference of either signed or unsigned integers. For example, see [Compute the absolute difference between unsigned integers using SSE](https://stackoverflow.com/q/3380785) for some strategies that work on SIMD elements, could be done with scalar code. Or [Fast, branchless unsigned int absolute difference](https://stackoverflow.com/a/22459071) . taking min / max and subtracting seems good. For signed, widening both inputs, subtracting (so overflow is impossible), then abs, can work. — Peter Cordes, Apr 24 '22 at 12:20
But all the answers here are about a single abs() so let's just keep it that way. — Peter Cordes, Apr 24 '22 at 12:20

bits · Answer 1 · 2023-03-28T06:29:34.087

38

This is how the C library function abs() does it in assembly without branching:

   abs(x) = (x XOR y) - y

where y = x >> 31 (assuming 32-bit input), and >> is arithmetic right shift operator.

Explanation of the above formula: We want to generate 2's complement of negative x only.

y = 0xFFFFFFFF, if x is negative
    0x00000000, if x is positive

So when x is positive x XOR 0x00000000 is equal to x . And when x is negative x XOR 0xFFFFFFFF is equal to 1's complement of x. Now we just need to add 1 to get its 2's complement which is what expression -y is doing . Because 0xFFFFFFFF is -1 in decimal.

Let's look at assembly generated for following code by gcc (4.6.3 on my machine):

C code:

main()
{
  int x;
  int output = abs(x);
}

gcc 4.6.3 generated assembly snippet (AT&T syntax), with my comments:

  movl  -8(%rbp), %eax    # -8(%rbp) is memory for x on stack
  sarl  $31, %eax         #  shift arithmetic right: x >> 31, eax now represents y
  movl  %eax, %edx        #  
  xorl  -8(%rbp), %edx    #  %edx = x XOR y
  movl  %edx, -4(%rbp)    # -4(%rbp) is memory for output on stack
  subl  %eax, -4(%rbp)    # (x XOR y) - y

BONUS (from Hacker's Delight): If you have a fast multiply by +1 and -1, the following will give you abs(x) for 32-bit x:

      ((x >> 30) | 1) * x

edited Mar 28 '23 at 06:29

answered Jan 07 '13 at 11:14

bits

1,595
1
17
17

This works because the `y` in this example is basically sign of the input (0=positive or zero, 111....111=negative number). And `xor` with zero doesn't change the number. – Mikko Rantalainen Dec 20 '21 at 14:26
1

`0xFFFF` would be the `y` value for 16-bit signed arithmetic. 32-bit would be `0xFFFF_FFFF`. – ecm Jan 18 '22 at 13:14
Did you forget to enable optimization or something? Write a function that takes an `int` arg and returns `unsigned` so it can optimize without optimizing away. https://godbolt.org/z/asGnY4EEv - gcc uses `cdq` to get `x>>31` in EDX, while clang uses a `cmov` implementation that's better on modern x86 with single-uop cmov. – Peter Cordes Jan 18 '22 at 15:19
@PeterCordes I dont remember what command I used back then to generate asssemply (its been 9 years!), I'm curious to learn more about your idea - may be a new answer to demostrate it? – bits Jan 20 '22 at 08:25
You mean looking at asm output for functions that take args and return values? Already done 6 years ago, on [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116). Or if you mean that `cdq` trick in asm, [Mark Wilkins' answer](https://stackoverflow.com/questions/2639173/x86-assembly-abs-implementation/2639219#2639219) covers it. (But Hal's answer with cmovl is even better, if used with `xor`-zeroing and `sub` instead of `mov`+`neg`.) – Peter Cordes Jan 20 '22 at 08:51
Or if you mean the C mechanics of writing an `abs` that can return (unsigned)INT_MIN without overflow, yeah it's annoying and lame that C `abs()` returns signed `int`. You need trickery like `x < 0 ? 0U - x ? x; – Peter Cordes Jan 20 '22 at 08:56
1

Interesting that you would use `>>>` as the arithmetic right shift. That the opposite of the JavaScript language where `>>` is the arithmetic right shift and `>>>` is the unsigned (logical) right shift operator. – Alexis Wilke Mar 26 '23 at 17:38
1

Ah good catch @AlexisWilke . It makes sense to use the right syntax, I dont even remember why I had used `>>>` - but I have fixed it since all (or almost all) general purpose languages uses `>>` – bits Mar 28 '23 at 06:30

score 29 · Answer 2 · answered Aug 13 '12 at 03:41

29

Old thread but if I surfed in here late you might have too... abs is a brilliant example so this should be here.

; abs(eax), with no branches.
; intel syntax (dest, src)

mov ebx, eax ;store eax in ebx
neg eax
cmovl eax, ebx ;if eax is now negative, restore its saved value

answered Aug 13 '12 at 03:41

Hal

291
3
2

2

This is really simple & efficient by avoiding `branch predictor`, definitely should be accepted as answer. – Eric Jun 07 '16 at 04:39
`xor ebx,ebx` / `sub ebx,eax` takes the `mov` off the critical path, and thus is better on some CPUS (including Ice Lake, where mov-elimination is broken). On Sandybridge-family, xor-zeroing is as cheap as a NOP, like an eliminated MOV, and even on other CPUs where it still needs a back-end execution port, it has no input dependencies so can run any time ahead of the input value being ready. Also, for an int64_t version, it avoids a REX prefix for the first instruction, while the rest use rax/rbx. – Peter Cordes Jan 18 '22 at 15:14
BTW, this is what clang does (and what GCC should be doing). See [this](https://stackoverflow.com/a/61179064/224132) and [this](https://stackoverflow.com/questions/52571833/x86-absolute-value-cofusion) for two examples of clang and GCC output, with GCC using a 2's complement bithack identity, clang using `neg`/`cmov`. Neither using `xor`-zero / `sub` / `cmov`, unfortunately. – Peter Cordes Apr 24 '22 at 12:30
1

Just compared this to gcc-9. This was actually faster by a clearly measurable amount! – Charles Lohr Dec 15 '22 at 09:12

score 20 · Answer 3 · answered Apr 14 '10 at 16:36

20

If it is x86 assembly, the following according to the ever useful wikipedia should work. Subtract one value from the other and then use these instructions on the result:

cdq
xor eax, edx
sub eax, edx

answered Apr 14 '10 at 16:36

Mark Wilkins

40,729
5
57
110

Stephen Canon · Answer 4 · 2010-04-14T17:10:09.353

If you want to handle all cases correctly, you can't just subtract and then take the absolute value. You will run into trouble because the difference of two signed integers is not necessarily representable as a signed integer. For example, suppose you're using 32 bit 2s complement integers, and you want to find the difference between INT_MAX (0x7fffffff) and INT_MIN (0x80000000). Subtracting gives:

0x7fffffff - 0x80000000 = 0xffffffff

which is -1; when you take the absolute value, the result you get is 1, whereas the actual difference between the two numbers is 0xffffffff interpreted as an unsigned integer (UINT_MAX).

The difference between two signed integers is always representable as an unsigned integer. To get this value (with 2s complement hardware), you just subtract the smaller input from the larger and interpret the result as an unsigned integer; no need for an absolute value.

Here's one (of many, and not necessarily the best) way do this on x86, assuming that the two integers are in eax and edx:

    cmp   eax,  edx  // compare the two numbers
    jge   1f
    xchg  eax,  edx  // if eax < edx, swap them so the bigger number is in eax
1:  sub   eax,  edx  // subtract to get the difference

Using `jge` might cause the `branch predictor` in cpu to `mis-prediction`, that will slow down the cpu dramatically. So, if performance is in concern, it's better to use the answer from @bits or @Hal — Eric, Jun 07 '16 at 04:37

score 6 · Answer 5 · answered Apr 14 '10 at 16:52

Assuming that your integers are in MMX or XMM registers, use psubd to compute the difference, then pabsd to get the absolute value of the difference.

If your integers are in the plain, "normal" registers, then do a subtraction, then the cdq trick to get the absolute value. This requires using some specific registers (cdq sign-extends eax into edx, using no other register) so you may want to do things with other opcodes. E.g.:

mov  r2, r1
sar  r2, 31

computes in register r2 the sign-extension of r1 (0 if r1 is positive or zero, 0xFFFFFFFF if r1 is negative). This works for all 32-bit registers r1 and r2 and replaces the cdq instruction.

score 5 · Answer 6 · answered Apr 15 '10 at 23:13

5

A short but straightforward way, using the conditional move instruction (available Pentium and up I think):

; compute ABS(r1-r2) in eax, overwrites r2
mov eax, r1
sub eax, r2
sub r2, r1
cmovg eax, r2

The sub instruction sets the flags the same as the cmp instruction.

answered Apr 15 '10 at 23:13

Callum

862
5
13

1

cmov was new with P6 (ppro/PII), but yes you can assume it these days. gcc does. – Peter Cordes Nov 20 '15 at 01:38

score 2 · Answer 7 · edited Nov 20 '15 at 01:38

2

ABS(EAX)

  test   eax, eax   ;  Triger EFLAGS [CF, OF, PF, SF, and ZF]
  jns AbsResult     ;  If (SF) is off, jmp AbsResult
  neg    eax        ;  If (SF) is on. (negation nullify by this opcode)
AbsResult:

If the flags are already set by whatever generated the value in eax, you don't need the test. Branch mispredicts will make this slow if the input values are randomly distributed between positive and negative.

This works the same way for RAX, AX, AL.

edited Nov 20 '15 at 01:38

Peter Cordes

328,167
45
605
847

answered Apr 09 '14 at 21:14

Pr0c3ss0r

129
1
3

3

`or reg, reg` is always a worse choice than `test reg,reg`. http://stackoverflow.com/questions/33721204/x86-assembly-cmp-reg-0-vs-or-reg-reg/33724806#33724806. Also, branches aren't "one clock". They're either ~zero (predicted correctly), or ~15 clocks (mispredict). – Peter Cordes Nov 20 '15 at 01:32

score 0 · Answer 8 · edited Apr 25 '22 at 01:07

0

8 chars at once

inline u64 abs_8(u64 x)
{
    u64 y=(x>>7)&0x0101010101010101ull;
    return (x^((y<<8)-y))+y;
}

edited Apr 25 '22 at 01:07

Blastfurnace

18,411
56
55
70

answered Jan 18 '22 at 11:48

As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jan 18 '22 at 14:37
1

In x86 assembly, we'd normally use [SSSE3 `pabsb`](https://www.felixcloutier.com/x86/pabsb:pabsw:pabsd:pabsq) to do 16 bytes in parallel with one instruction. But there are probably still some CPUs around without SSSE3, like AMD Phenom II being the newest to *not* have it. So your bithack might be useful there, although you might just use `psubb` / `pcmpgtb` / blend. Or if you actually need absolute *differences* like the question body asks for, you'd use `psadbw` to add absolute differences within 8-byte chunks. – Peter Cordes Jan 18 '22 at 15:24
I was thinking `psubb` from zero to negate then `pmaxsb` to keep the positive element (`abs(x) = max(x, -x)`) would work to avoid blend, but `pmaxsb` is SSE4.1. Possibly SSE2 `pminub` to keep the element *without* the sign-bit set? Yeah that should work, too, including for the special cases 0 and `-128`. (Since it's its own 2's complement inverse, `minub(128u, 128u) = -128 = 128u`) – Peter Cordes Apr 25 '22 at 01:42
So yes, `__m128i vx = _mm_cvtsi64_si128(x)` ; `return _mm_min_epu8(vx, _mm_sub_epi8(_mm_setzero_si128(), vx);`. https://godbolt.org/z/GdanTcqGd confirms clang even optimizes it into pabsb if available, so which means it must be correct! So this bithack isn't useful in x86-64 assembly, even if you only have baseline SSE2. Except possibly in kernel code where you can only use general-purpose registers. – Peter Cordes Apr 25 '22 at 01:48

score -2 · Answer 9 · answered Apr 14 '10 at 16:38

-2

There is the SUB instruction, if what you want is to do A-B. HTH

answered Apr 14 '10 at 16:38

Aidenn

559
1
4
12

x86 assembly abs() implementation?

9 Answers9

Linked