Microsoft released a Vector library that is supposed to allow for SIMD instructions in .NET (see https://instil.co/2016/03/21/parallelism-on-a-single-core-simd-with-c/).

However, this thing seems to crash with the most basic call to their constructor:

System.Numerics.Vector<byte> vec
    = new System.Numerics.Vector<byte>(new byte[] { 4, 3, 2, 1, 6, 4, 2, 4 });

This throws a System.IndexOutOfRangeException: 'Index was outside the bounds of the array.' What am I missing here? This works fine if I replace all occurrences of byte with int.

Zachary Canann
  • I don't know anything about this C# library, but x86 SSE2 vectors are 16 bytes wide. Perhaps a `Vector` is expecting at least 16 elements to fill an XMM register. – Peter Cordes Nov 19 '17 at 04:32
  • Good idea, but it didn't seem to work. I assumed that this was user error, but perhaps there is just a bug in this library. – Zachary Canann Nov 19 '17 at 04:40
  • Their code examples show the constructor taking an array and an index: `var va = new Vector(lhs, i);` IMO read the article more carefully and figure out what that means. I'd guess it's (logically at least) doing a SIMD vector load from that position in the array, getting 16 or 32 bytes (or 64 with AVX512). Anyway, maybe see if you can make their example code work. – Peter Cordes Nov 19 '17 at 04:45
  • After poking at it a bit more, it seems like it doesn't crash if there are >= 32 bytes. This seems pretty silly to me, because after this they do not enforce alignment. I can pass in 33, 34, etc. – Zachary Canann Nov 19 '17 at 04:47
  • Ah, you're probably on a CPU with AVX2, so it's using SIMD vectors that are 32B wide (YMM registers). It's not an alignment requirement, just a minimum size. There might be a way to explicitly use XMM vector width for cases where you have less than 32B left to process (e.g. in the cleanup after a loop, if that works better than an unaligned last-32-bytes load that may overlap with an earlier vector...) – Peter Cordes Nov 19 '17 at 04:48
  • That makes sense. I'll file an issue against them regardless. I feel like the preferred behavior would be automatic padding, or a clearer exception. (https://github.com/dotnet/corefx/issues/25344) – Zachary Canann Nov 19 '17 at 05:04
  • Automatic padding doesn't make much sense because there's no fully efficient way to store only the first N elements of a SIMD register back to memory. For 32-bit element size and larger, there are moderately efficient masked-store instructions, but generating a mask from an integer count usually requires a load from a lookup table, and having the class do all this for you under the hood just to support non-full vectors seems a bit much. (i.e. probably hard for the compiler to optimize away if the object has an elements-used member). – Peter Cordes Nov 19 '17 at 05:18
  • See https://stackoverflow.com/questions/34306933/vectorizing-with-unaligned-buffers-using-vmaskmovps-generating-a-mask-from-a-m for more about handling partial vectors with `vpmaskmovd` / `vmaskmovps`, and why that's not usually the best idea in the first place. Also, just padding with zeros is only efficient for 4, 8, or 16 bytes. (`vmovd`, `vmovq`, or `vmovdqu`). Padding with whatever garbage comes next in memory is possible, but not if it crosses into an unmapped page and triggers a fault. (Also, managed code is unlikely to be allowed to read outside of objects...) – Peter Cordes Nov 19 '17 at 05:18
  • It sounds to me like you should treat this constructor as a SIMD load of whatever the vector width of the target CPU is. That does kinda suck if your code suddenly breaks on new hardware because vectors are wider, though, so there must be (or should be) some kind of way around it. – Peter Cordes Nov 19 '17 at 05:22
  • I have a very similar strange situation: the Vector library "4.5.0", running on .NET 4.7.2 inside a VMware VM. I get the same exception on the following line: `var calcVector = new Vector(_stackFrames);` The "problem" is that `_stackFrames` is defined as `private readonly ulong[] _stackFrames;` and in this specific case has a size of 2 (ulongs, not bytes). If the array is larger, I do not get this issue. – TheLordkiron Jan 22 '23 at 16:02
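
Putting the comments above together: the array constructor appears to behave like a full-width SIMD load, so the array needs at least `Vector<T>.Count` elements from the starting index. Below is a minimal sketch of two ways to work with that, assuming only the documented `System.Numerics.Vector<T>` members (`Count`, the `(T[])` and `(T[], int)` constructors, `CopyTo`, and the indexer). The class `VectorLoadSketch` and the helpers `LoadPadded` and `Add` are hypothetical names made up for illustration; they are not part of the library.

```csharp
using System;
using System.Numerics;

static class VectorLoadSketch
{
    // Hypothetical helper: builds a Vector<byte> from an array that may be
    // shorter than the hardware vector width by copying it into a
    // zero-padded buffer first.
    public static Vector<byte> LoadPadded(byte[] values)
    {
        // Number of bytes per vector on this machine (depends on the CPU/JIT).
        int width = Vector<byte>.Count;
        if (values.Length >= width)
            return new Vector<byte>(values);

        var padded = new byte[width];            // new arrays are zero-initialized
        Array.Copy(values, padded, values.Length);
        return new Vector<byte>(padded);
    }

    // Hypothetical helper: element-wise add using the (array, index)
    // constructor for full vectors and a scalar loop for the leftover tail.
    // Assumes lhs and rhs have the same length.
    public static int[] Add(int[] lhs, int[] rhs)
    {
        var result = new int[lhs.Length];
        int width = Vector<int>.Count;
        int i = 0;

        // Only load a vector while at least `width` elements remain.
        for (; i <= lhs.Length - width; i += width)
        {
            var va = new Vector<int>(lhs, i);
            var vb = new Vector<int>(rhs, i);
            (va + vb).CopyTo(result, i);
        }

        // Scalar cleanup for the remaining (fewer than `width`) elements.
        for (; i < lhs.Length; i++)
            result[i] = lhs[i] + rhs[i];

        return result;
    }

    public static void Main()
    {
        // The 8-byte array from the question, padded up to the vector width.
        Vector<byte> vec = LoadPadded(new byte[] { 4, 3, 2, 1, 6, 4, 2, 4 });
        Console.WriteLine(vec[0]); // prints 4

        int[] sum = Add(new[] { 1, 2, 3 }, new[] { 10, 20, 30 });
        Console.WriteLine(string.Join(", ", sum)); // prints 11, 22, 33
    }
}
```

The padding copy costs an allocation, so it probably only makes sense for one-off loads; inside a loop, the scalar tail in `Add` avoids it. Checking against `Vector<T>.Count` at run time rather than hard-coding 16 or 32 keeps the code from breaking on hardware with wider vectors.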
