I cannot figure out how to get the sum of the elements in a System.Numerics.Vector<T>.

double sum(System.Numerics.Vector<double> vect)
{
     // Something like 
     // double sum = 0;
     // foreach e in vect { sum += e; } 
     // return sum;

     // Vector.method???
     // For loop ???
}

Is this actually possible? If so, how can I do it?

Anton K
  • That class is notoriously poorly understood, also the reason it was not added to the .NET Framework. It represents a SIMD cpu register and can store 2 doubles or 4 floats. It should have been named SimdRegister. Did you actually intend to create a vector with an arbitrary number of elements? – Hans Passant Feb 19 '16 at 20:50
  • Have you looked at the `Add()` method? https://msdn.microsoft.com/en-us/library/dn889206(v=vs.111).aspx – stephen.vakil Feb 19 '16 at 20:50
  • @stephen.vakil It seems like the OP wants to sum the elements of the vector, not add it to another vector. – juharr Feb 19 '16 at 20:52
  • I agree with @HansPassant that you are probably not using the correct data structure. But if you really need to sum the elements you'll have to do a `for` loop because it does not implement `IEnumerable`. – juharr Feb 19 '16 at 20:53
  • @HansPassant Yes, I want to work with long vectors. And I didn't know it works with just 2 doubles. – Anton K Feb 19 '16 at 21:21
  • You'll have to use plain olde `List`. System.Numerics.Vector is only interesting for the type of data that can be accelerated with SIMD instructions. Like a pair of doubles or 4 floats/ints or 8 shorts or 16 bytes. That count is the value of `Vector<T>.Count`, a static property. – Hans Passant Feb 19 '16 at 21:43
  • @HansPassant I already found register field inside. You're right. I need to re-implement my logic of sum. The name is indeed misleading. – Anton K Feb 19 '16 at 21:46
  • Just curious why did you accept no answer? – aepot Aug 25 '21 at 18:46
  • @aepot I moved to another project and cannot test the answer is correct. – Anton K Aug 26 '21 at 00:48
  • I tested it, it works like a charm. Also JIT produces well-optimized intrinsics for it. – aepot Aug 26 '21 at 05:49

2 Answers

Assuming you did intend to have a Vector that can contain (on today's hardware) either 2 or 4 doubles, this will sum them.

double vectorSum = Vector.Dot(yourDoubleVector, Vector<double>.One);

The Dot method calculates the dot product of the two vectors, which is defined for two vectors A and B of size n as A1 * B1 + A2 * B2 + ... + An * Bn

So the dot product of a vector A and another vector of all 1's would be just the sum of the items in vector A.
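
As a quick sanity check of that identity, here is a minimal sketch that compares the `Vector.Dot` trick with a plain indexed loop over the lanes. The class and method names (`DotSumDemo`, `SumViaDot`, `SumViaLoop`) are made up for illustration; only `Vector.Dot`, `Vector<double>.One`, `Vector<double>.Count`, the array constructor and the indexer come from System.Numerics.

```csharp
using System;
using System.Numerics;

public static class DotSumDemo
{
    // Sum of all lanes via a dot product with a vector of ones.
    public static double SumViaDot(Vector<double> v)
        => Vector.Dot(v, Vector<double>.One);

    // Reference implementation: read each lane through the indexer.
    public static double SumViaLoop(Vector<double> v)
    {
        double sum = 0;
        for (int i = 0; i < Vector<double>.Count; i++)
            sum += v[i];
        return sum;
    }

    public static void Main()
    {
        // Fill exactly one SIMD-width vector with 1, 2, 3, ...
        var values = new double[Vector<double>.Count];
        for (int i = 0; i < values.Length; i++)
            values[i] = i + 1;
        var v = new Vector<double>(values);

        Console.WriteLine($"lanes: {Vector<double>.Count}");
        Console.WriteLine($"dot:   {SumViaDot(v)}");
        Console.WriteLine($"loop:  {SumViaLoop(v)}");
    }
}
```

The two results agree exactly for small integer-valued lanes like these; for general floating-point data the order of additions may differ, so the last bits can differ too.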

saucecontrol
  • Unfortunately the JIT is fairly dumb and does an actual multiply, before shuffle/add of the results. ([Writing a vector sum function with SIMD (System.Numerics) and making it faster than a for loop](https://stackoverflow.com/a/67629019) shows the asm: vmulpd / `vhaddpd ymm0, ymm0, ymm0` / `vextractf128 xmm1, ymm0, 1` / `vaddpd xmm0, xmm0, xmm1`). – Peter Cordes May 21 '21 at 19:14
  • I don't know if there's a way to convince the JIT to do the same shuffles but without the multiply with System.Numerics (without Intrinsics .GetUpper() / Sse2.Add); I'd expect that just looping over the elements would result in store/reload which might be worse, especially for 256-bit vectors of float (8 elements instead of just 4). – Peter Cordes May 21 '21 at 19:14
  • [c# multiplying array elements using system.numerics](https://stackoverflow.com/q/60136108) says it's not possible with just System.Numerics. :/ (Unless there's special support for add reductions; that one's asking for reducing with multiply.) – Peter Cordes May 21 '21 at 19:18
  • Yeah, JIT is limited here. Current JIT will at least auto-unroll a loop over `Vector.Count`, but it still emits either a load or `vextractf128`/`vpsrldq` pair to read the upper elements. Still cheaper to do the unnecessary multiply, unfortunately. – saucecontrol May 22 '21 at 20:19
  • Current codegen with loop: [sharplab](https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABABgAJiBGAOgDkBXfGKASzFwG4BYAKDMq0ASgwB2GVsxoBhCPgAOrADYsAyiwBu7GFz59iAZkoAmctPIBvPuRvkA2gFkYGABYQAJgEkFSgBRPXD295JQB5eQkIUVwaAEEAc3jYXFxWDRhPUSVWURz4gEoAXVtya1tDQSRydwgGYBVyVSZfADUYMAxoAB4aupUAPnJ0jvyymyteEpLe+phyZPIAXnJSHkmpmwAzaHJfHIxyViWVzkPyLvI2ju6ZgZla8VPWAGpn0fWNkoXn5eGMO1YhTWJTGU2IAHZ5jpgbYAL6g0EVahVW5zBy+d4lCafIbYKBDdoHZZXTpQHq1Wb9GihUQwGEbCGNZp/fL08jw3iwoA=) – saucecontrol May 22 '21 at 20:21
  • Wow that is so bad! `vaddsd` is efficient with a memory operand, but Sum uses `vmovsd` into a separate register each time. (And strict FP semantics mean it has to actually do the `0 + vect[0]` first iteration :/). The version inlined into M() is even more insane, e.g. `vextractf128` twice, instead of just getting the low and high double out of the high half. IDK why it picks a different strategy there; both cases have the vector in memory (because of the crappy calling convention, or because it's a constant, not a result of a math operation.) – Peter Cordes May 22 '21 at 20:52
  • But anyway, unless there's a fast-math option in C#, the compiler doesn't have the option of doing a clean hsum with vextractf128 / vaddpd / vunpckhpd / vaddsd. And yes, vunpckhpd or `vmovhlps` are at least as good as `vpsrldq` by 8, being an FP shuffle and having shorter machine-code (no immediate). (See the C / asm Q&A [Fastest way to do horizontal SSE vector sum (or other reduction)](https://stackoverflow.com/a/35270026)) – Peter Cordes May 22 '21 at 20:54
  • Yeah, problems like this are why System.Runtime.Intrinsics was added with direct ISA mappings. In .NET 5+, there's a no-op conversion from `System.Numerics.Vector` to `System.Runtime.Intrinsics.Vector256` (or `Vector128` as appropriate). With that, the efficient version can be implemented manually. – saucecontrol May 22 '21 at 21:54
  • `Sse3.MoveHighAndDuplicate` duplicates 32-bit floats within each 64-bit half of an XMM; that's why C# made you use `.AsSingle` to shoot yourself in the foot that way. You want `Sse2.UnpackHigh(v128, v128)` (`unpckhpd`), or `Sse.MoveHighToLow(v128.AsSingle(), v128.AsSingle()).AsDouble()`. `movhlps` saves a byte of code-size in legacy-SSE encoding, but not with VEX. – Peter Cordes May 22 '21 at 22:08
  • But yeah, you might want to add an intrinsics part to your answer, so people can do it more efficiently if they're willing to use them. I guess you'd have to hope that the vector size check was compile-time-const. Of course that would make it not portable to ARM/ARM64, unless you can also make that conditional on some compile-time platform constant, so you get hand-tuned intrinsics on x86 (and ARM if you write code for it), and the .Dot() fallback on other platforms. – Peter Cordes May 22 '21 at 22:11
  • Ha, I never work in doubles, so I missed that `movhldup` mistake. Will fix for posterity. This answer pre-dates the existence of any of the S.R.I. stuff and the question specifically asks about .NET 4.6 (legacy framework), which will never have support for S.R.I., so I think it's a different audience. – saucecontrol May 22 '21 at 22:16
  • corrected [sharplab](https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABABgAJiBGAOgDkBXfGKASzFwG4BYAKDMq0ASgwB2GVsxoBJcW1G52XPgOo0R4yTBlzWCpTQAaADiQ9eKgMyUATOQDC5AN59ybytepJyAEwgNgABsYcgBlJgB1VgwAC1kMeUUOAAoASld3F153HPIAN2wofJgwDHIAXnIANRKMaAAePwDggD4aAHlRGHNc91YAM3JkmtKGpqCYNvt/cQrKlHTs3ucM5fcCorybAFZvSrzamgBBXBG6qB2kNJ619cL8qhtjCrDcGBtjnx9krd2aAHEYBgADIQADuLDSaHylwBQIAqgAHRGQ1KpG63B5PF6hN4fI5fUJgbCBQo/R7GaG4940eGiRHYMAAawAEqwAOYxclPaF5ClojG3Va3YgAdixxhoABUIESSWT0cKcgBfJXuGCBN5qtxZTFuAD0+vIdBgByKMWwyJgCnIEFE5COVUMdgtUB8YMKMG1uTFvhg/WwDECGEFKuFqt4yqAA===) – saucecontrol May 22 '21 at 22:16
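
For readers who can target .NET Core 3.0 / .NET 5 or later, the intrinsics approach discussed in the comments above might look roughly like the sketch below. It is not the code behind the sharplab links; the class and method names (`HorizontalSumSketch`, `HSum`) are hypothetical. It assumes x86 hardware for the fast paths, uses `Sse2.UnpackHigh` as suggested in the comments, and falls back to `Vector.Dot` everywhere else. The `IsSupported` and `Vector<double>.Count` checks are expected, though not guaranteed here, to be folded away by the JIT.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class HorizontalSumSketch
{
    public static double HSum(Vector<double> v)
    {
        if (Avx.IsSupported && Vector<double>.Count == 4)
        {
            // Reinterpret as a 256-bit register, then add the upper half to the lower half.
            Vector256<double> v256 = v.AsVector256();
            return HSum128(Sse2.Add(v256.GetLower(), v256.GetUpper()));
        }
        if (Sse2.IsSupported && Vector<double>.Count == 2)
        {
            return HSum128(v.AsVector128());
        }
        // Portable fallback (ARM, other vector widths, no SIMD): multiply by ones and add.
        return Vector.Dot(v, Vector<double>.One);
    }

    // Adds the two doubles of a 128-bit register (unpckhpd + addsd).
    private static double HSum128(Vector128<double> x)
    {
        Vector128<double> high = Sse2.UnpackHigh(x, x); // both lanes = upper element of x
        return Sse2.AddScalar(x, high).ToScalar();      // lower lane = x[0] + x[1]
    }

    public static void Main()
    {
        var values = new double[Vector<double>.Count];
        for (int i = 0; i < values.Length; i++)
            values[i] = i + 1;
        Console.WriteLine(HSum(new Vector<double>(values)));
    }
}
```

Note that the 256-bit path adds lanes pairwise, (v0 + v2) + (v1 + v3), so for general floating-point input the result can differ in the last bits from a strictly sequential sum.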

From .NET 6 onward, this is exposed directly via Vector.Sum<T>(Vector<T> value):

double sum(System.Numerics.Vector<double> vect) // note can be 'static'
    => Vector.Sum(vect);
Marc Gravell
  • Is this for an arbitrary number of elements in a vector or a register size? – Anton K Jan 04 '23 at 21:05
  • `Vector<T>` is the machine's SIMD size - commonly 256-bit for AVX2; `Vector<T>.Count` (a `static` property) tells you the number of elements of `T` in a vector – Marc Gravell Jan 04 '23 at 21:46
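
Since a `Vector<double>` only holds one register's worth of elements, summing a long array means looping over it in `Vector<double>.Count`-sized chunks. A minimal sketch of that pattern follows; `SumArray` and `VectorSumSketch` are hypothetical names, and the final reduction uses `Vector.Dot` so that it also works before .NET 6 (on .NET 6+ `Vector.Sum(acc)` could be used instead).

```csharp
using System;
using System.Numerics;

public static class VectorSumSketch
{
    public static double SumArray(double[] data)
    {
        int width = Vector<double>.Count;   // elements per SIMD register
        var acc = Vector<double>.Zero;      // one running partial sum per lane
        int i = 0;

        // Vectorized part: consume 'width' elements per iteration.
        for (; i <= data.Length - width; i += width)
            acc += new Vector<double>(data, i);

        // Collapse the per-lane partial sums into a single scalar.
        double sum = Vector.Dot(acc, Vector<double>.One);

        // Scalar tail: leftover elements that do not fill a whole vector.
        for (; i < data.Length; i++)
            sum += data[i];

        return sum;
    }

    public static void Main()
    {
        var data = new double[10];
        for (int i = 0; i < data.Length; i++)
            data[i] = i + 1;                // 1, 2, ..., 10

        Console.WriteLine(SumArray(data));  // prints 55
    }
}
```

Because each lane accumulates its own partial sum, the additions happen in a different order than in a plain `for` loop, so results on non-integer data may differ slightly from the sequential sum.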