No overflow exception thrown for long/ulong SSE addition in C#?

Question

For non-SSE code, as answered in the following question (No overflow exception for int in C#?), wrapping an addition in a `checked` section throws an overflow exception when adding Int64.MaxValue and 1. However, wrapping SSE addition in a `checked` section does not seem to throw an overflow exception for `long[] arrLong = new long[] { 5, 7, 16, Int64.MaxValue, 3, 1 };`. I believe most SSE instructions use saturating math, where they reach Int64.MaxValue and don't go past it and never wrap around to negatives. Is there any way in C# to throw an overflow exception for SSE addition, or is it not possible because the CPU may not support raising an overflow flag?

The code below shows my C# implementation of long[] summation using SSE. The result is a negative number for the array above: the positives wrap around rather than saturate, so C# must be using the wrapping version of the SSE instruction (there are two versions: one that wraps and one that saturates). I don't know whether C# allows developers to choose which version to use. Only the serial portion of the code below throws an overflow exception; the SSE portion doesn't.

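    using System.Numerics;

    // Sums arrayToSum[l..r] (inclusive bounds) using Vector<long> for the bulk of the work.
    private static long SumSseInner(this long[] arrayToSum, int l, int r)
    {
        var sumVector = new Vector<long>();
        // Index one past the last element that fits in a whole vector
        int sseIndexEnd = l + ((r - l + 1) / Vector<long>.Count) * Vector<long>.Count;
        int i;
        // Vector (SSE) portion: one partial sum per lane
        for (i = l; i < sseIndexEnd; i += Vector<long>.Count)
        {
            var inVector = new Vector<long>(arrayToSum, i);
            checked
            {
                sumVector += inVector;
            }
        }
        long overallSum = 0;
        // Serial portion: leftover elements that do not fill a whole vector
        for (; i <= r; i++)
        {
            checked
            {
                overallSum += arrayToSum[i];
            }
        }
        // Serial portion: add up the per-lane partial sums
        for (i = 0; i < Vector<long>.Count; i++)
        {
            checked
            {
                overallSum += sumVector[i];
            }
        }
        return overallSum;
    }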

I'm not aware of any C# language feature that specifically supports SSE. You should be more specific in your question about what you're using to implement SSE operations/data types. That said, if it's true that SSE instructions don't overflow but instead always just get pinned to some max or min value (unfortunately, it's been a couple of decades since I did anything with SSE and I recall practically nothing about it), then of course there won't be any way to detect overflow. It's also not clear why, if you're dealing with 64-bit data, you'd expect the array init above to overflow anyway. — Peter Duniho, Jul 04 '19 at 19:41
@PeterDuniho, see here: https://devblogs.microsoft.com/dotnet/using-net-hardware-intrinsics-api-to-accelerate-machine-learning-scenarios/ or here https://fiigii.com/2019/03/03/Hardware-intrinsic-in-NET-Core-3-0-Introduction/ — , Jul 04 '19 at 19:41
@elgonzo: why? I don't see anything in that article that suggests any sort of built-in support in C# for SSE. — Peter Duniho, Jul 04 '19 at 19:44
As far as i remember, AVX (not SSE, strictly speaking) has operations that are either saturating or overflowing. I don't remember any AVX feature that would treat/enable overflow as an error condition... — , Jul 04 '19 at 19:44
@PeterDuniho, i guess you can make that argument for all the countless questions here on SO asking about methods or types of .NET class libraries but talking, referring and tagging specifically C#, as if .NET class library members were having built-in support in C#... — , Jul 04 '19 at 19:46
@elgonzo: no, that's not true. the tag is useful for e.g. when the OP wants an answer expressed specifically in C#, and/or wants to make clear that the context is C#. But this question is _specifically_ about an overflow exception thrown by C# code, which is a C# language feature. But if the C# language itself doesn't have any SSE-specific support, why would we expect it to generate overflow exceptions for SSE code? C# isn't handling any of the SSE operations...those only occur when the data's delivered to the library that itself is supporting it. — Peter Duniho, Jul 04 '19 at 19:49
@PeterDuniho, ah okay. Got it now what you were trying to say in your 1st comment. Sorry about the misunderstanding... — , Jul 04 '19 at 19:51
@elgonzo: no problem. Frankly, it would help if the OP would clarify his scenario. Given the lack of SSE features in C# itself, obviously he's using _something_ else to access SSE features, and there are multiple possibilities for that. Depending on the specifics, maybe such a "something" _could_ in fact throw an exception for overflow (depending on mode and/or operation). But there's not enough detail here to know whether that's the case or not. — Peter Duniho, Jul 04 '19 at 19:53
_"C# must be using that version of the SSE instruction"_ -- C# isn't using SSE for C#-defined operations at all. You seem to be using a type named `Vector`, which is presumably provided by some SSE-aware library. It's that type you need to investigate, not C#. — Peter Duniho, Jul 04 '19 at 22:30
Vector is C# support for SSE instructions. It's a light abstraction provided by the System.Numerics standard C# library. It's a great way to abstract variability between CPUs, some supporting SSE, some SSE2, some AVX, and some AVX2. Vector is a nice way to abstract the width of your particular CPU's SSE capabilities and make it generic. — DragonSpit, Jul 04 '19 at 22:47
*I believe most SSE instructions use saturate math* No. Most SSE instructions are normal wrapping binary math, for add/sub/multiply/shift with element widths from 1 to 8 bytes. ([`paddb/w/d/q`](https://www.felixcloutier.com/x86/paddb:paddw:paddd:paddq)). There *are* signed and unsigned saturating versions available for add/sub, but only for [8 and 16-bit elements](https://www.felixcloutier.com/x86/paddusb:paddusw), and saturating pack from 32 to 16, and 16 to 8, but that's all. There's also saturation in the horizontal add in [`PMADDUBSW`](https://www.felixcloutier.com/x86/pmaddubsw). — Peter Cordes, Jul 04 '19 at 23:54
Anyway, there's no efficient hardware way to detect signed overflow in most SIMD integer operations. If a language wants checked math, it could of course emulate it with a pcmpeq and branch, but that's more overhead than the scalar equivalent where a language could insert `into` or `jo` instructions after scalar `add` instructions that set the OF flag on signed overflow. Presumably C#'s `checked` thing only applies to scalar math, not the intrinsic operation wrapped by `Vector<>`. I don't use C# so not posting an answer, but it seems perfectly reasonable for a language to work this way. — Peter Cordes, Jul 04 '19 at 23:56
@PeterCordes I guess if you really wanted checked SIMD adds, the steady-state overhead is 2 instructions/add for a large number of adds to *detect* the overflow if the integers are signed (worse if unsigned and < AVX512). But then you'd need a roll-back path to isolate the overflow to the correct place to throw the exception. — Mysticial, Jul 10 '19 at 15:46
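
A small sketch of the behaviour discussed in the comments above, assuming .NET's `System.Numerics.Vector<T>`. The class and method names (`CheckedVectorDemo`, `VectorAddInChecked`, `ScalarAddThrows`) are made up for illustration. As far as I can tell, `checked` only affects the built-in integer operators, while `Vector<T>`'s `+` is an ordinary overloaded operator, so the lanes wrap either way:

```csharp
using System;
using System.Numerics;

public static class CheckedVectorDemo
{
    // Adds 1 to long.MaxValue in every lane inside a checked block
    // and returns lane 0 of the result.
    public static long VectorAddInChecked()
    {
        var max = new Vector<long>(long.MaxValue);
        var one = new Vector<long>(1L);
        checked
        {
            // Assumption: Vector<T> defines no checked variant of operator +,
            // so this addition is not overflow-checked and each lane wraps.
            Vector<long> sum = max + one;
            return sum[0];
        }
    }

    // Performs the same addition on plain longs and reports whether
    // the checked context raised OverflowException.
    public static bool ScalarAddThrows()
    {
        long a = long.MaxValue;
        long b = 1;
        try
        {
            checked { a += b; }
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }
}
```

Calling `CheckedVectorDemo.VectorAddInChecked()` should return `Int64.MinValue` (the lane wrapped), while `CheckedVectorDemo.ScalarAddThrows()` should return `true`.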

score 0 · Accepted Answer · answered Sep 14 '19 at 21:40

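Below is an implementation of ulong summation using SSE in C#. I'm posting it because it's quite a bit shorter and easier to understand than the long summation. It detects overflow per lane by checking whether the unsigned sum decreased, moves that lane's two operands into a decimal accumulator, and resets the lane to zero.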

    using System.Numerics;

    private static decimal SumToDecimalSseFasterInner(this ulong[] arrayToSum, int l, int r)
    {
        decimal overallSum = 0;
        var sumVector    = new Vector<ulong>();
        var newSumVector = new Vector<ulong>();
        var zeroVector   = new Vector<ulong>(0);
        int sseIndexEnd = l + ((r - l + 1) / Vector<ulong>.Count) * Vector<ulong>.Count;
        int i;

        for (i = l; i < sseIndexEnd; i += Vector<ulong>.Count)
        {
            var inVector = new Vector<ulong>(arrayToSum, i);
            newSumVector = sumVector + inVector;
            // Per lane: 0xFFFFFFFFFFFFFFFF if the sum did not decrease (no overflow), else 0
            Vector<ulong> gteMask = Vector.GreaterThanOrEqual(newSumVector, sumVector);
            if (Vector.EqualsAny(gteMask, zeroVector))
            {
                for (int j = 0; j < Vector<ulong>.Count; j++)
                {
                    if (gteMask[j] == 0)    // this particular sum overflowed, since sum decreased
                    {
                        // Move both operands of the overflowed lane into the decimal total
                        overallSum += sumVector[j];
                        overallSum += inVector[j];
                    }
                }
            }
            // Keep the new sum in lanes that did not overflow; reset overflowed lanes to zero
            sumVector = Vector.ConditionalSelect(gteMask, newSumVector, zeroVector);
        }
        for (; i <= r; i++)
            overallSum += arrayToSum[i];
        for (i = 0; i < Vector<ulong>.Count; i++)
            overallSum += sumVector[i];
        return overallSum;
    }

Both the ulong[] and long[] summations, which use SSE and accumulate to Decimal to produce a perfectly accurate result, have been added to the HPCsharp NuGet package that I maintain (open source). The version for long[] is in SumParallel.cs and is called SumToDecimalSseFasterInner().

It's pretty cool to be able to sum long[] or ulong[] arrays at SSE speeds, and on multiple cores, while handling arithmetic overflow in software, since the CPU doesn't produce overflow flags for SSE.
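
For the original question (throwing on overflow for `long[]` rather than accumulating to Decimal), the signed case needs a different test from the unsigned one: a lane overflowed when both operands have the same sign and the result has the opposite sign. Here is a minimal sketch of that idea. It is not the HPCsharp implementation, and `CheckedVectorSum` and `SumChecked` are hypothetical names chosen for this example:

```csharp
using System;
using System.Numerics;

public static class CheckedVectorSum
{
    // Hypothetical helper: sums a long[] with Vector<long> and throws
    // OverflowException if any per-lane partial sum overflows.
    public static long SumChecked(long[] values)
    {
        var sum = Vector<long>.Zero;
        int width = Vector<long>.Count;
        int i = 0;
        for (; i <= values.Length - width; i += width)
        {
            var input = new Vector<long>(values, i);
            Vector<long> next = sum + input;   // wraps silently on overflow
            // The sign bit of (sum ^ next) & (input ^ next) is set exactly when
            // both operands share a sign that differs from the result's sign.
            Vector<long> overflowBits = (sum ^ next) & (input ^ next);
            if (Vector.LessThanAny(overflowBits, Vector<long>.Zero))
                throw new OverflowException("Vector lane overflowed during summation.");
            sum = next;
        }
        long total = 0;
        checked
        {
            // Scalar tail and the final horizontal add use ordinary checked math.
            for (; i < values.Length; i++)
                total += values[i];
            for (int lane = 0; lane < width; lane++)
                total += sum[lane];
        }
        return total;
    }
}
```

With the array from the question, `CheckedVectorSum.SumChecked(new long[] { 5, 7, 16, Int64.MaxValue, 3, 1 })` should throw `OverflowException` instead of returning a negative number. Note that the lanes add elements in a different order than a serial loop, so for arrays of mixed sign this sketch can throw at a different point than serial `checked` code would, or throw where the serial loop would not (and vice versa).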
