I'm testing what speedup I can get from SIMD instructions with RyuJIT, and I'm seeing some disassembly instructions I don't expect. I'm basing the code on this blog post from Kevin Frei of the RyuJIT team, and on a related post here. Here's the function:
```csharp
static void AddPointwiseSimd(float[] a, float[] b) {
    int simdLength = Vector<float>.Count;
    int i = 0;
    for (i = 0; i < a.Length - simdLength; i += simdLength) {
        Vector<float> va = new Vector<float>(a, i);
        Vector<float> vb = new Vector<float>(b, i);
        va += vb;
        va.CopyTo(a, i);
    }
}
```
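The snippet may be trimmed (`i` is declared outside the loop, as if for a tail loop), but as posted it only handles whole vectors. Because the condition is `i < a.Length - simdLength`, it also skips the last full vector when `a.Length` is an exact multiple of `Vector<float>.Count`: with 8 elements and a count of 4, only `i = 0` runs. A complete version might look like the following sketch, assuming `a` and `b` have the same length; the class name `PointwiseSketch` is hypothetical.

```csharp
using System;
using System.Numerics;

static class PointwiseSketch
{
    // Adds b into a element-wise.
    // Assumption: a.Length == b.Length (not checked here).
    public static void AddPointwiseSimd(float[] a, float[] b)
    {
        int simdLength = Vector<float>.Count;
        int i = 0;

        // "<=" so the last full vector is processed when a.Length
        // is an exact multiple of simdLength.
        for (; i <= a.Length - simdLength; i += simdLength)
        {
            Vector<float> va = new Vector<float>(a, i);
            Vector<float> vb = new Vector<float>(b, i);
            va += vb;
            va.CopyTo(a, i);
        }

        // Scalar tail for the remaining a.Length % simdLength elements.
        for (; i < a.Length; i++)
        {
            a[i] += b[i];
        }
    }
}
```

If `a.Length` is smaller than one vector, `a.Length - simdLength` is negative, the vector loop never runs, and the scalar loop handles everything.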
The section of disassembly I'm asking about copies the array values into the `Vector<float>`s. Most of it is similar to the disassembly in Kevin's and Sasha's posts, but I've marked some extra instructions (with my confused annotations) that don't appear in theirs:
```asm
;// Vector<float> va = new Vector<float>(a, i);
cmp eax,r8d                 ; <-- Unexpected - Compare a.Length to i?
jae 00007FFB17DB6D5F        ; <-- Unexpected - Jump to range check failure
lea r10d,[rax+3]
cmp r10d,r8d
jae 00007FFB17DB6D5F
mov r11,rcx                 ; <-- Unexpected - Extra register copy?
movups xmm0,xmmword ptr [r11+rax*4+10h]
;// Vector<float> vb = new Vector<float>(b, i);
cmp eax,r9d                 ; <-- Unexpected - Compare b.Length to i?
jae 00007FFB17DB6D5F        ; <-- Unexpected - Jump to range check failure
cmp r10d,r9d
jae 00007FFB17DB6D5F
movups xmm1,xmmword ptr [rdx+rax*4+10h]
```
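Read as C#, each load appears to perform two unsigned range checks (`jae` is an unsigned comparison): one on the first index `i`, and one on the last index `i + 3`, since `Vector<float>.Count` is 4 here. The sketch below mirrors them. `RangeCheckSketch.PassesChecks` is a hypothetical helper reconstructed from the disassembly, not the actual implementation of the `Vector<T>` constructor.

```csharp
using System;

static class RangeCheckSketch
{
    // Hypothetical helper mirroring the two cmp/jae pairs for a load of
    // `count` elements starting at index i (count is 4 in the disassembly).
    // Casting to uint reproduces the unsigned jae comparison, so a
    // negative index is treated as a huge value and fails.
    public static bool PassesChecks(int i, int length, int count = 4)
    {
        // cmp eax,r8d ; jae fail
        bool firstOk = (uint)i < (uint)length;
        // lea r10d,[rax+3] ; cmp r10d,r8d ; jae fail
        bool lastOk = (uint)(i + count - 1) < (uint)length;
        return firstOk && lastOk;
    }
}
```

Worked through with a length of 8: `i = 4` passes both checks, and `i = 5` fails the second (`5 + 3` is not less than 8). For `i = -1`, the second check alone would pass (`(uint)2 < 8`), and only the first rejects it (`(uint)-1` is 4294967295). For any non-negative `i`, though, `i + 3 < length` already implies `i < length`, which is why the first comparison looks redundant inside this loop.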
Note the loop range check is as expected:
```asm
;// for (i = 0; i < a.Length - simdLength; i += simdLength) {
add eax,4
cmp r9d,eax
jg loop
```
so I don't understand why there are extra comparisons against `eax`. Can anyone explain why I'm seeing these extra instructions, and whether it's possible to get rid of them?
In case it's related to the project settings, I have a very similar project that shows the same issue here on GitHub (see `FloatSimdProcessor.HwAcceleratedSumInPlace()` or `UShortSimdProcessor.HwAcceleratedSumInPlaceUnchecked()`).