Can I do checked arithmetic with Vector

Question

I've been experimenting with Vector to use HW to parallelise integer arithmetic. Is there any way to enable overflow checking with vector operations?

One example is to add two columns (equal length arrays) of ints together. Here c=a+b means c[0] = a[0] + b[0], c[1] = a[1] + b[1], etc.

I suppose I could do something like this:

overflow[i] = b[i] >= 0 ? c[i] < a[i] : c[i] >= a[i];

But this (branching) might be slower than .Net's automatic overflow checking, and might negate the performance benefit of using Vector<T>.

We also want to optimise our most commonly used operations: multiplication, subtraction, to a lesser extent integer division.

Edit: I thought about this a little more, and came up with this, which is 2.5 times as slow as the unchecked vector addition. Seems like a lot of additional overhead.

    public Vector<int> Calc(Vector<int> a, Vector<int> b)
    {
        var result = a + b;
        var overflowFlag = Vector.GreaterThan(b, Vector<int>.Zero) * Vector.LessThan(result,a)
            + Vector.LessThan(b,Vector<int>.Zero) * Vector.GreaterThan(result, a);

        // It makes no sense to add the flags to the result, but haven't decided what to do with them yet, 
        // and don't want the compiler to optimise the overflow calculation away
        return result + overflowFlag;
    }

Timings: (4k iterations adding a pair of 100k arrays)

Normal Add: 618ms
Normal Checked Add: 1092ms
Vector Add: 208ms
Vector Checked Add: 536ms

Nope, but you can check it yourself of course. With more context I could give a more targeted recommendation. — harold, Sep 25 '16 at 21:46
@Harold - I added a little further context. Is there an efficient method for determining whether overflow occurred? — Rob, Sep 26 '16 at 09:46
You wouldn't have to branch though (actually you can't), you can use ConditionalSelect. Multiplication is tricky without widening and widening is annoying, I'll think about it. I'll be back later to answer more properly — harold, Sep 26 '16 at 09:59
Vector integer multiplication is fairly slow, you should prefer bitwise operations. — harold, Sep 26 '16 at 13:41
@Harold - excellent - that helps a lot. `BitwiseOr(BitwiseAnd(...), BitwiseAnd(...))`. Down from 536ms to 343ms (compared to 208ms for unchecked). — Rob, Sep 26 '16 at 15:45

score 1 · Accepted Answer · edited May 23 '17 at 12:19

Using some trickery borrowed from Hacker's Delight (chapter 2, section Overflow Detection), here are some overflow predicates (not tested):

Signed addition:

var sum = a + b;
var ovf = (sum ^ a) & (sum ^ b);

The result is in the signs, not full masks. Maybe that's enough, maybe not, in which case I would normally recommend a right shift but there's no right shift on Vector<T> (way too much stuff is missing). You could compare against zero though.

Unsigned addition: For completeness?

var sum = a + b;
var ovf = Vector.LessThan(sum, a);

Multiplication:

As far as I can tell there's just no reasonable way to do it. It's a bit annoying even in native SSE, but with pmuldq and some shuffling it's not too bad.
In C# SIMD this just seems hopeless. There is no high-mul (also missing from native SSE except for 16bit ints, also annoying), no widening multiplication (and no way to narrow the result anyway), also there is no reasonable way to widen in advance. Even if you could widen (could they please add that to the API, seriously), multiplying 64bit integers with SSE is annoying, so annoying that doing it with scalar arithmetic isn't a bad option, which defeats the point.

So I'm going to recommend not doing that in SIMD, not in C# at least.

That doesn't have to mean that you use the built-in overflow detection. While that is appropriate if overflow is a fatal error, it's disastrously slow if it's common and expected and you just want the overflow status in a boolean flag. In that case, you could use:

Signed multiplication:

long ext_prod = (long)a * b;
int prod = (int)ext_prod;
bool ovf = (prod >> 31) != (int)(ext_prod >> 32);

Unsigned multiplication:

ulong ext_prod = (ulong)a * b;
uint prod = (uint)ext_prod;
bool ovf = (ext_prod >> 32) != 0;

In SIMD it would have worked essentially the same way, ie testing whether the high half is filled with copies of the sign (by definition zero in the unsigned case), but the widening makes it annoying in native SIMD and hopeless in C# SIMD.

wouldn't it be quicker to check for overflow using `ext_prod == prod`? At least it seems to be in my quick benchmark. — Rob, Sep 29 '16 at 12:30

Can I do checked arithmetic with Vector

1 Answers1