I've been experimenting with Vector<T> to use hardware SIMD to parallelise integer arithmetic. Is there any way to enable overflow checking for vector operations?
One example is adding two columns (equal-length arrays) of ints together. Here c = a + b means c[0] = a[0] + b[0], c[1] = a[1] + b[1], etc.
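For concreteness, here is a minimal sketch of the unchecked column add described above. The class and method names (`ColumnMath.Add`) are made up for illustration; it assumes both arrays have the same length and handles the leftover elements with a scalar tail loop.

```csharp
using System;
using System.Numerics;

public static class ColumnMath
{
    // Unchecked element-wise add: c[i] = a[i] + b[i], wrapping silently on overflow.
    public static int[] Add(int[] a, int[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Columns must have equal length.");

        var c = new int[a.Length];
        int width = Vector<int>.Count; // number of int lanes per vector on this machine
        int i = 0;

        // Process full vectors first.
        for (; i <= a.Length - width; i += width)
        {
            Vector<int> sum = new Vector<int>(a, i) + new Vector<int>(b, i);
            sum.CopyTo(c, i);
        }

        // Scalar tail for the elements that do not fill a whole vector.
        for (; i < a.Length; i++)
            c[i] = unchecked(a[i] + b[i]);

        return c;
    }
}
```

Called as `int[] c = ColumnMath.Add(a, b);`, an overflowing element simply wraps around, which is exactly the behaviour I want to detect.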
I suppose I could do something like this:
overflow[i] = b[i] >= 0 ? c[i] < a[i] : c[i] >= a[i];
But this (branching) might be slower than .NET's automatic overflow checking, and might negate the performance benefit of using Vector<T>.
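One way to avoid the branch, given here as a sketch rather than a measured improvement: a signed addition overflows exactly when both operands have the same sign and the wrapped sum has the opposite sign. That condition is the sign bit of `(a ^ c) & (b ^ c)`. The names `ScalarOverflow` and `AddOverflows` are hypothetical.

```csharp
using System;

public static class ScalarOverflow
{
    // True when a + b does not fit in a 32-bit signed int.
    public static bool AddOverflows(int a, int b)
    {
        int c = unchecked(a + b);
        // The sign bit is set only if c's sign differs from the sign of both a and b.
        return ((a ^ c) & (b ^ c)) < 0;
    }
}
```

Because this uses only XOR, AND and one comparison, it maps directly onto the element-wise operators that Vector<T> offers.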
We also want to optimise our most commonly used operations: multiplication, subtraction and, to a lesser extent, integer division.
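A rough sketch of what overflow detection might look like for subtraction and multiplication follows. It is untimed, and the names (`VectorCheckedOps`, `Subtract`, `Multiply`) are invented for illustration. Subtraction uses a sign-bit test; multiplication widens each lane to `long` with `Vector.Widen`, multiplies, narrows back with `Vector.Narrow`, and checks whether the narrowed value still equals the exact product. The widening approach is an assumption about what is practical, and the extra work may well cost more than it saves.

```csharp
using System.Numerics;

public static class VectorCheckedOps
{
    // a - b with wrapping; overflowMask is -1 in each lane that overflowed, 0 elsewhere.
    public static Vector<int> Subtract(Vector<int> a, Vector<int> b, out Vector<int> overflowMask)
    {
        Vector<int> result = a - b;
        // Overflow requires operands of opposite sign and a result whose sign differs from a's.
        overflowMask = Vector.LessThan((a ^ b) & (a ^ result), Vector<int>.Zero);
        return result;
    }

    // a * b with wrapping; overflowed is true if any lane's exact product does not fit in an int.
    public static Vector<int> Multiply(Vector<int> a, Vector<int> b, out bool overflowed)
    {
        // Sign-extend each half of the inputs to 64-bit lanes.
        Vector.Widen(a, out Vector<long> aLow, out Vector<long> aHigh);
        Vector.Widen(b, out Vector<long> bLow, out Vector<long> bHigh);

        // The product of two ints always fits in a long.
        Vector<long> productLow = aLow * bLow;
        Vector<long> productHigh = aHigh * bHigh;

        // Truncate back to 32-bit lanes (this is the wrapped result).
        Vector<int> result = Vector.Narrow(productLow, productHigh);

        // If widening the truncated result does not reproduce the exact product, a lane overflowed.
        Vector.Widen(result, out Vector<long> backLow, out Vector<long> backHigh);
        overflowed = !(Vector.EqualsAll(productLow, backLow) && Vector.EqualsAll(productHigh, backHigh));
        return result;
    }
}
```

Integer division is left out of the sketch; for 32-bit ints the only quotient that overflows is int.MinValue / -1, and division by zero has to be handled separately in any case.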
Edit: I thought about this a little more and came up with the following, which takes about 2.5 times as long as the unchecked vector addition. That seems like a lot of additional overhead.
public Vector<int> Calc(Vector<int> a, Vector<int> b)
{
    var result = a + b;
    // Comparisons return -1 (all bits set) in true lanes and 0 in false lanes,
    // so the product of two masks is 1 exactly where both conditions hold.
    var overflowFlag = Vector.GreaterThan(b, Vector<int>.Zero) * Vector.LessThan(result, a)
                     + Vector.LessThan(b, Vector<int>.Zero) * Vector.GreaterThan(result, a);
    // It makes no sense to add the flags to the result, but I haven't decided what to do with them yet,
    // and I don't want the compiler to optimise the overflow calculation away.
    return result + overflowFlag;
}
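A possible cheaper variant, shown only as a sketch: the four comparisons and two multiplications above can in principle be replaced by two XORs, one AND and a single comparison, because an addition overflows exactly when the sum's sign differs from the sign of both operands. I have not timed this, and the names `VectorOverflow`, `Add` and `AnyOverflow` are hypothetical.

```csharp
using System.Numerics;

public static class VectorOverflow
{
    // Returns a + b (wrapping). overflowMask is -1 in each lane that overflowed, 0 elsewhere.
    public static Vector<int> Add(Vector<int> a, Vector<int> b, out Vector<int> overflowMask)
    {
        Vector<int> result = a + b;
        // Sign bit is set only where result's sign differs from both a's and b's sign.
        Vector<int> signs = (a ^ result) & (b ^ result);
        overflowMask = Vector.LessThan(signs, Vector<int>.Zero);
        return result;
    }

    // True if at least one lane of the mask is flagged.
    public static bool AnyOverflow(Vector<int> overflowMask)
    {
        return Vector.LessThanAny(overflowMask, Vector<int>.Zero);
    }
}
```

Returning the mask through an `out` parameter also keeps the overflow calculation alive without adding the flags to the result; the caller could OR the masks together across the loop and test `AnyOverflow` once at the end.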
Timings (4k iterations, each adding a pair of 100k-element arrays):
- Normal Add: 618ms
- Normal Checked Add: 1092ms
- Vector Add: 208ms
- Vector Checked Add: 536ms
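For anyone wanting to reproduce numbers like these, a timing harness along the following lines should do. This is a sketch and not the exact code behind the figures above; `AddTimer`, `TimeMs` and `CheckedAdd` are made-up names, and the single warm-up call is an assumption about how to keep JIT compilation out of the measurement.

```csharp
using System;
using System.Diagnostics;

public static class AddTimer
{
    // Runs body `iterations` times and returns the elapsed wall-clock time in milliseconds.
    public static long TimeMs(int iterations, Action body)
    {
        body(); // warm-up call so JIT compilation is not included in the timing

        var stopwatch = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
            body();
        stopwatch.Stop();
        return stopwatch.ElapsedMilliseconds;
    }

    // Scalar baseline using .NET's built-in overflow checking (throws OverflowException).
    public static void CheckedAdd(int[] a, int[] b, int[] c)
    {
        for (int i = 0; i < a.Length; i++)
            c[i] = checked(a[i] + b[i]);
    }
}
```

With three 100k-element arrays `a`, `b` and `c`, the "Normal Checked Add" case would be measured as `long ms = AddTimer.TimeMs(4000, () => AddTimer.CheckedAdd(a, b, c));`, and the other cases by swapping in the corresponding add routine.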