
Is there a way to add 32-bit signed words with saturation using MMX/SSE assembler instructions? I can find 8-bit and 16-bit versions but no 32-bit ones.

phuclv
LooPer
  • See [Agner Fog's vectorclass library](http://www.agner.org/optimize/#vectorclass) for an implementation of add and subtract with C++ intrinsics. A copy of the GPLed source [is here](https://github.com/pcordes/vectorclass/blob/77522287e64da5e887d69659e144d2caa5d3a4f1/vectori128.h#L2189), using XOR to check for same / different signs, and shifts / PANDN / PADDD to fix up the result. – Peter Cordes Nov 24 '16 at 04:22

2 Answers


You can emulate saturated signed adds by performing the following steps:

    #include <stdint.h>

    int saturated_add(int a, int b)
    {
        int sum = (int)((unsigned)a + (unsigned)b); // wrap in unsigned to avoid signed-overflow UB
        if (a >= 0 && b >= 0)
            return sum >= 0 ? sum : INT32_MAX;      // positive wraparound makes sum negative
        else if (a < 0 && b < 0)
            return sum >= 0 ? INT32_MIN : sum;      // negative wraparound makes sum non-negative
        else
            return sum;                             // sum of pos + neg always fits
    }

For unsigned addition it's even simpler; see this Stack Overflow posting.

In SSE2, the above maps to a sequence of parallel compares and AND/ANDN operations. No single operation is available in hardware, unfortunately.
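As a rough illustration, here is one way this could look with SSE2 intrinsics. It is a sketch, not a tuned implementation: instead of explicit compares it detects overflow from the sign bits with XOR and an arithmetic shift (the approach the comments on this page mention), then blends with AND/ANDN. The function name `sat_add_epi32_sse2` is made up for this example and does not come from any library.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: lane-wise signed 32-bit saturating add using SSE2 only. */
static __m128i sat_add_epi32_sse2(__m128i a, __m128i b)
{
    /* Wrapping add (paddd). */
    __m128i sum = _mm_add_epi32(a, b);

    /* Overflow occurred iff a and b have the same sign and sum's sign differs,
       i.e. the sign bit of (a ^ sum) & (b ^ sum) is set. */
    __m128i ovf = _mm_and_si128(_mm_xor_si128(a, sum), _mm_xor_si128(b, sum));

    /* Broadcast the sign bit: all-ones in overflowed lanes, zero elsewhere. */
    ovf = _mm_srai_epi32(ovf, 31);

    /* Saturation value per lane: INT32_MAX if a >= 0, INT32_MIN if a < 0.
       (a >> 31 logical) is 0 or 1, and 0x7FFFFFFF + 1 wraps to 0x80000000. */
    __m128i sat = _mm_add_epi32(_mm_srli_epi32(a, 31),
                                _mm_set1_epi32(INT32_MAX));

    /* Blend: take sat where ovf is set, otherwise sum (pand / pandn / por). */
    return _mm_or_si128(_mm_and_si128(ovf, sat),
                        _mm_andnot_si128(ovf, sum));
}
```

With SSE4.1 available, the final three-instruction blend could presumably be replaced by a single variable blend, but the version above only needs SSE2.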

Peter Cordes
FrankH.
  • [Bitwise saturated addition in C (HW)](https://stackoverflow.com/q/5277623) could probably vectorize better, with a couple `pxor` for `sum^a` and `sum^b`, and `pcmpgt(0, v)` or `psrad` – Peter Cordes Oct 08 '22 at 22:03

Saturated unsigned subtraction is easy, because for `a -= b`, we can do

    asm (
        "pmaxud %1, %0\n\t" // a = max (a,b)
        "psubd %1, %0" // a -= b
        : "+x" (a)
        : "xm" (b)
    );

with SSE4.1 (the extension that introduced `pmaxud`).

I was looking for unsigned addition; possibly the only way is to transform it into a saturated unsigned subtraction, perform that, and transform back. The same goes for the signed variants.

EDIT: with unsigned addition, you get `min(a, ~b) + b` this way, which of course works. With signed addition and subtraction, you have two saturation boundaries, which makes things complicated.
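To make the two unsigned identities concrete, here is a scalar C model of what each 32-bit lane computes. The function names are invented for this sketch; in a vector version the `max` and `min` steps would presumably map to `pmaxud` and `pminud` (both SSE4.1), and `~b` to a `pxor` with all-ones.

```c
#include <stdint.h>

/* Hypothetical scalar model of one lane: saturating a - b as max(a, b) - b. */
static uint32_t sat_sub_u32(uint32_t a, uint32_t b)
{
    uint32_t m = a > b ? a : b;   /* max(a, b): never smaller than b */
    return m - b;                 /* yields 0 whenever a <= b */
}

/* Hypothetical scalar model of one lane: saturating a + b as min(a, ~b) + b. */
static uint32_t sat_add_u32(uint32_t a, uint32_t b)
{
    uint32_t room = ~b;                  /* UINT32_MAX - b: largest addend that fits */
    uint32_t m = a < room ? a : room;    /* min(a, ~b) */
    return m + b;                        /* at most UINT32_MAX, so no wraparound */
}
```

The addition works because `~b` equals `UINT32_MAX - b`, so clamping `a` to it before adding caps the result at `UINT32_MAX`.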

Michiel