
For non-SSE code, as answered in the question "No overflow exception for int in C#?", wrapping an addition in a checked block throws an overflow exception when adding Int64.MaxValue and 1. However, wrapping an SSE addition in a checked block does not seem to throw an overflow exception for long[] arrLong = new long[] { 5, 7, 16, Int64.MaxValue, 3, 1 };. I believe most SSE instructions use saturate math, where they reach Int64.MaxValue, don't go past it, and never wrap around to negatives. Is there any way in C# to throw an overflow exception for SSE addition, or is it not possible because the CPU may not support raising an overflow flag?

The code below shows my C# implementation of long[] summation using SSE. The result is a negative number for the array above, because positive values wrap around and do not saturate; C# must be using that version of the SSE instruction (there are two versions: one that wraps and one that saturates). I don't know whether C# allows developers to choose which version to use. Only the serial portion of the code below throws an overflow exception; the SSE portion doesn't.

    using System.Numerics;

    private static long SumSseInner(this long[] arrayToSum, int l, int r)
    {
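        var sumVector = new Vector<long>();   // per-lane running sums, every lane starts at 0
        // First index past the last full Vector<long>.Count-sized chunk of [l, r]
        int sseIndexEnd = l + ((r - l + 1) / Vector<long>.Count) * Vector<long>.Count;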
        int i;
        for (i = l; i < sseIndexEnd; i += Vector<long>.Count)
        {
            var inVector = new Vector<long>(arrayToSum, i);
            checked
            {
                sumVector += inVector;
            }
        }
        long overallSum = 0;
        for (; i <= r; i++)
        {
            checked
            {
                overallSum += arrayToSum[i];
            }
        }
        for (i = 0; i < Vector<long>.Count; i++)
        {
            checked
            {
                overallSum += sumVector[i];
            }
        }
        return overallSum;
    }
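
The difference between the two paths can be isolated in a minimal sketch like the one below. The class and method names (`CheckedRepro`, `ScalarAddThrows`, `VectorAddFirstLane`) are hypothetical and only for illustration. The assumption, which matches what I observe, is that `checked` only changes the built-in integer operators, while `+` on `Vector<long>` is an operator method on the type and is left untouched.

```csharp
using System;
using System.Numerics;

// Hypothetical helper class, not part of the original code.
public static class CheckedRepro
{
    // Scalar path: inside `checked` the built-in long addition is overflow-checked.
    public static bool ScalarAddThrows()
    {
        long max = long.MaxValue;
        long one = 1;
        try
        {
            _ = checked(max + one);   // expected to throw OverflowException
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }

    // Vector path: the `checked` block is assumed not to affect Vector<long>'s
    // operator +, so each lane wraps around instead of throwing.
    public static long VectorAddFirstLane()
    {
        var a = new Vector<long>(long.MaxValue);   // broadcast to every lane
        var b = new Vector<long>(1L);
        Vector<long> sum;
        checked
        {
            sum = a + b;
        }
        return sum[0];
    }
}
```

With this sketch, `ScalarAddThrows()` should return true, while `VectorAddFirstLane()` should return `long.MinValue`, i.e. the lane wrapped rather than saturated or threw.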
DragonSpit
  • I'm not aware of any C# language feature that specifically supports SSE. You should be more specific in your question about what you're using to implement SSE operations/data types. That said, if it's true that SSE instructions don't overflow but instead always just get pinned to some max or min value (unfortunately, it's been a couple of decades since I did anything with SSE and I recall practically nothing about it), then of course there won't be any way to detect overflow. It's also not clear why, if you're dealing with 64-bit data, you'd expect the array init above to overflow anyway. – Peter Duniho Jul 04 '19 at 19:41
  • @PeterDuniho, see here: https://devblogs.microsoft.com/dotnet/using-net-hardware-intrinsics-api-to-accelerate-machine-learning-scenarios/ or here https://fiigii.com/2019/03/03/Hardware-intrinsic-in-NET-Core-3-0-Introduction/ –  Jul 04 '19 at 19:41
  • @elgonzo: why? I don't see anything in that article that suggests any sort of built-in support in C# for SSE. – Peter Duniho Jul 04 '19 at 19:44
  • As far as i remember, AVX (not SSE, strictly speaking) has operations that are either saturating or overflowing. I don't remember any AVX feature that would treat/enable overflow as an error condition... –  Jul 04 '19 at 19:44
  • @PeterDuniho, i guess you can make that argument for all the countless questions here on SO asking about methods or types of .NET class libraries but talking, referring and tagging specifically C#, as if .NET class library members were having built-in support in C#... –  Jul 04 '19 at 19:46
  • @elgonzo: no, that's not true. The tag is useful for e.g. when the OP wants an answer expressed specifically in C#, and/or wants to make clear that the context is C#. But this question is _specifically_ about an overflow exception thrown by C# code, which is a C# language feature. But if the C# language itself doesn't have any SSE-specific support, why would we expect it to generate overflow exceptions for SSE code? C# isn't handling any of the SSE operations...those only occur when the data's delivered to the library that itself is supporting it. – Peter Duniho Jul 04 '19 at 19:49
  • @PeterDuniho, ah okay. Got it now what you were trying to say in your 1st comment. Sorry about the misunderstanding... –  Jul 04 '19 at 19:51
  • @elgonzo: no problem. Frankly, it would help if the OP would clarify his scenario. Given the lack of SSE features in C# itself, obviously he's using _something_ else to access SSE features, and there are multiple possibilities for that. Depending on the specifics, maybe such a "something" _could_ in fact throw an exception for overflow (depending on mode and/or operation). But there's not enough detail here to know whether that's the case or not. – Peter Duniho Jul 04 '19 at 19:53
  • _"C# must be using that version of the SSE instruction"_ -- C# isn't using SSE for C#-defined operations at all. You seem to be using a type named `Vector`, which is presumably provided by some SSE-aware library. It's that type you need to investigate, not C#. – Peter Duniho Jul 04 '19 at 22:30
  • Vector is C# support for SSE instructions. It's a light abstraction provided by the System.Numerics standard C# library. It's a great way to abstract variability between CPUs, some supporting SSE, some SSE2, some AVX, and some AVX2. Vector is a nice way to abstract the width of your particular CPU's SSE capabilities and make it generic. – DragonSpit Jul 04 '19 at 22:47
  • *I believe most SSE instructions use saturate math* No. Most SSE instructions are normal wrapping binary math, for add/sub/multiply/shift with element widths from 1 to 8 bytes ([`paddb/w/d/q`](https://www.felixcloutier.com/x86/paddb:paddw:paddd:paddq)). There *are* signed and unsigned saturating versions available for add/sub, but only for [8 and 16-bit elements](https://www.felixcloutier.com/x86/paddusb:paddusw), and saturating pack from 32 to 16, and 16 to 8, but that's all. There's also saturation in the horizontal add in [`PMADDUBSW`](https://www.felixcloutier.com/x86/pmaddubsw). – Peter Cordes Jul 04 '19 at 23:54
  • Anyway, there's no efficient hardware way to detect signed overflow in most SIMD integer operations. If a language wants checked math, it could of course emulate it with a pcmpeq and branch, but that's more overhead than the scalar equivalent where a language could insert `into` or `jo` instructions after scalar `add` instructions that set the OF flag on signed overflow. Presumably C#'s `checked` thing only applies to scalar math, not the intrinsic operation wrapped by `Vector<>`. I don't use C# so not posting an answer, but it seems perfectly reasonable for a language to work this way. – Peter Cordes Jul 04 '19 at 23:56
  • @PeterCordes I guess if you really wanted checked SIMD adds, the steady-state overhead is 2 instructions/add for a large number of adds to *detect* the overflow if the integers are signed (worse if unsigned and < AVX512). But then you'd need a roll-back path to isolate the overflow to the correct place to throw the exception. – Mysticial Jul 10 '19 at 15:46
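
The software detection discussed in these comments can be sketched for signed 64-bit lanes roughly as follows. This is only an illustration under stated assumptions: `CheckedVectorAdd` and `AddChecked` are hypothetical names, and the check uses the standard sign-bit identity (two operands of the same sign producing a sum of the opposite sign) rather than any particular instruction sequence. It throws for the vector as a whole and makes no attempt at the roll-back path needed to pinpoint the offending element.

```csharp
using System;
using System.Numerics;

// Hypothetical helper class for illustration only.
public static class CheckedVectorAdd
{
    // Adds two Vector<long> values and throws if any lane overflowed.
    // A lane overflowed when both inputs share a sign and the sum has the
    // opposite sign, i.e. when the sign bit of (a ^ sum) & (b ^ sum) is set.
    public static Vector<long> AddChecked(Vector<long> a, Vector<long> b)
    {
        Vector<long> sum = a + b;                        // wraps silently per lane
        Vector<long> overflow = (a ^ sum) & (b ^ sum);   // negative where a lane overflowed
        if (Vector.LessThanAny(overflow, Vector<long>.Zero))
            throw new OverflowException("Vector<long> addition overflowed in at least one lane.");
        return sum;
    }
}
```

Calling `AddChecked(new Vector<long>(long.MaxValue), new Vector<long>(1))` should throw, while small operands pass through unchanged. The extra XOR, AND, compare, and branch per add are the overhead the comments refer to.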

1 Answer


Below is an implementation of ulong summation using SSE in C#. I'm posting it because it's quite a bit shorter and easier to understand than the long summation.

private static decimal SumToDecimalSseFasterInner(this ulong[] arrayToSum, int l, int r)
{
    decimal overallSum = 0;
    var sumVector    = new Vector<ulong>();
    var newSumVector = new Vector<ulong>();
    var zeroVector   = new Vector<ulong>(0);
    int sseIndexEnd = l + ((r - l + 1) / Vector<ulong>.Count) * Vector<ulong>.Count;
    int i;

    for (i = l; i < sseIndexEnd; i += Vector<ulong>.Count)
    {
        var inVector = new Vector<ulong>(arrayToSum, i);
        newSumVector = sumVector + inVector;
        Vector<ulong> gteMask = Vector.GreaterThanOrEqual(newSumVector, sumVector);         // if true then 0xFFFFFFFFFFFFFFFFUL else 0UL at each element of the Vector<ulong>
        if (Vector.EqualsAny(gteMask, zeroVector))
        {
            for(int j = 0; j < Vector<ulong>.Count; j++)
            {
                if (gteMask[j] == 0)    // this particular sum overflowed, since sum decreased
                {
                    overallSum += sumVector[j];
                    overallSum += inVector[ j];
                }
            }
        }
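        // Keep the new sum in lanes that did not overflow; reset overflowed lanes to zero,
        // since their contribution was already moved into overallSum above.
        sumVector = Vector.ConditionalSelect(gteMask, newSumVector, zeroVector);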
    }
    for (; i <= r; i++)
        overallSum += arrayToSum[i];
    for (i = 0; i < Vector<ulong>.Count; i++)
        overallSum += sumVector[i];
    return overallSum;
}
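
To sanity-check a vectorized routine like the one above, a plain scalar reference that also accumulates into decimal is handy. The sketch below is an assumption of mine for illustration (`ScalarReference` and `SumToDecimalScalar` are hypothetical names, not taken from any package). It sums the inclusive range [l, r] the same way the method above does.

```csharp
using System;

// Hypothetical reference implementation for comparison and testing.
public static class ScalarReference
{
    // Sums arrayToSum[l..r] (inclusive) into a decimal, which is wide enough
    // that a handful of ulong.MaxValue elements cannot overflow it.
    public static decimal SumToDecimalScalar(ulong[] arrayToSum, int l, int r)
    {
        decimal sum = 0;
        for (int i = l; i <= r; i++)
            sum += arrayToSum[i];   // implicit ulong -> decimal conversion
        return sum;
    }
}
```

For example, `SumToDecimalScalar(new ulong[] { ulong.MaxValue, ulong.MaxValue, 1 }, 0, 2)` yields 36893488147419103231, a value that no ulong can hold, so the vectorized result can be compared against it.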

Both the ulong[] and long[] summations, which use SSE and accumulate to Decimal to produce a perfectly accurate result, have been added to the HPCsharp NuGet package that I maintain (open source). The version for long[] is in SumParallel.cs and is called SumToDecimalSseFasterInner().

It's pretty cool to be able to sum long[] or ulong[] arrays at SSE speeds, and on multiple cores, while handling arithmetic overflow in software, since the CPU doesn't produce overflow flags for SSE!

DragonSpit