
I am seeing a really weird bug with Intel intrinsics in an AVX2 function, which I want to share here. Either I am doing something wrong (I cannot see what at this point), or there is a bug in the library.

I have this simple code inside my main.c:

__int64 test = 0xFFFF'FFFF'FFFF'FFFF;
__m256i ymm = _mm256_set_epi64x(0x0000'0000'0000'0000,
                                0x0000'0000'0000'0000, 
                                0x0000'0000'0000'0000, 
                                test);

The value that gets assigned to variable ymm is for some strange reason:

ymm.m256i_i64[0] = 0xffff'ffff'ffff'ffff
ymm.m256i_i64[1] = 0x0000'0000'0000'0000
ymm.m256i_i64[2] = 0x0000'ffff'0000'0000
ymm.m256i_i64[3] = 0x0000'0000'0000'0000

I have been debugging for hours at this point, but cannot see why ymm.m256i_i64[2] gets this rogue value. Please help!

Fun/weird fact: If I write this C-code:

__m256i ymm = _mm256_set_epi64x(0x0000'0000'0000'0000,
                                0x0000'0000'0000'0000, 
                                0x0000'0000'0000'0000, 
                                0xFFFF'FFFF'FFFF'FFFF);

Then the values get correctly set to:

ymm.m256i_i64[0] = 0xffff'ffff'ffff'ffff
ymm.m256i_i64[1] = 0x0000'0000'0000'0000
ymm.m256i_i64[2] = 0x0000'0000'0000'0000
ymm.m256i_i64[3] = 0x0000'0000'0000'0000

Note: I am using Visual Studio, both its compiler and its debugging tools, as the example picture below shows: [screenshot]

The printf following the code printed: ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 ff ff ff 00 ff ff 00 00 ff 00 00 00 ff 00 00 00.
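A byte dump like the one above can be produced independently of the debugger's visualizer. The following is only a minimal sketch: `format_bytes` is a hypothetical helper name, not something from the original post, and it assumes nothing beyond the C standard library.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from the original post): formats the n bytes at p
 * as two-digit hex values separated by single spaces, lowest address first.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if the output buffer is too small. */
int format_bytes(const void *p, size_t n, char *out, size_t out_size)
{
    const unsigned char *bytes = (const unsigned char *)p;
    size_t pos = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        /* "xx " for every byte except the last one, which is just "xx". */
        size_t need = (i + 1 < n) ? 3 : 2;

        /* Keep one extra byte for the terminating '\0'. */
        if (pos + need + 1 > out_size)
            return -1;

        snprintf(out + pos, out_size - pos,
                 (i + 1 < n) ? "%02x " : "%02x", bytes[i]);
        pos += need;
    }
    return (int)pos;
}
```

For a `__m256i ymm`, a call such as `format_bytes(&ymm, sizeof ymm, buf, sizeof buf)` with a `buf` of at least 96 bytes should give the raw memory contents, since reading any object through `unsigned char` is always allowed. On little-endian x86 the first eight bytes correspond to element 0.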

The rogue values in the other elements of the struct seem to vary: after I added the loop, they are not the same as they were before (I don't know whether the loop specifically caused the change).

Edit: I am no expert in assembly, not at all. I have nevertheless added the generated assembly code in the picture below, in case it helps anyone explain what is going on and whether this is a bug not caused by me: [screenshot]

Peter Cordes
oPolo
  • "Bug in library" is rather unlikely for an intrinsic which is just a thin wrapper. How do you check the result? (Bug could be in there). What's the generated assembly? – MSalters May 29 '16 at 11:13
  • Yes indeed. I add a breakpoint just after the assignment and check the value given to the __m256i struct, which appears wrong. I will add a picture to the original post for clarity in a sec. – oPolo May 29 '16 at 11:18
  • Sure looks weird. I wouldn't immediately rule out a bug in the visualizer; that's _far_ more complex than the intrinsic to assembly mapping. – MSalters May 29 '16 at 11:23
  • Your screenshot shows `long long`; your question uses `__int64`. – alk May 29 '16 at 11:24
  • My bad, I tested different data types to see whether they yielded correct results. I will update the screenshot to use __int64, as that is what the Intel Intrinsics documentation uses ( https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=set_epi64x&expand=4568 ) – oPolo May 29 '16 at 11:31
  • I added the generated assembly code as a picture, but I must admit... I know nearly nothing of assembly, which is why I am using Intel Intrinsics to access AVX functions. I hope the assembly-code can help clarify the mystery somehow. – oPolo May 29 '16 at 11:40
  • Clearly you are using VS2015. It looks like a regression of [this VS2013 bug](https://connect.microsoft.com/VisualStudio/feedback/details/812347/visual-c-17-18-sse-avx-bugs). Or they just didn't address it for the 32-bit code generator which is likely because the intrinsic couldn't be used in x86 with VS2013. It works fine when you target x64, which is the best workaround. You can file a bug report at the same place. – Hans Passant May 29 '16 at 12:26
  • @oPolo: doesn't VS let you copy/paste the disassembly text? Avoid posting pictures of text whenever possible. Don't bother changing at this point, since Hans's comment is probably the answer. – Peter Cordes May 30 '16 at 02:52
  • Also, please tell me that's "debug mode" asm output. Using `pshufb` (`_mm_shuffle_epi8`) with a constant from memory for that shuffle is just completely braindead. – Peter Cordes May 30 '16 at 03:00
  • If MSVC defaulted to 64-bit mode there would be a lot fewer of these questions on SO. – Z boson May 30 '16 at 15:28
  • https://stackoverflow.com/questions/27258261/unresolved-external-symbol-mm256-setr-epi64x/27267287#27267287 – Z boson Jun 08 '16 at 07:25

1 Answer


MSVC until recently did not support any of the epi64x intrinsics in 32-bit mode. Agner Fog's VCL (vector class library) contains this comment:

//#if defined (_MSC_VER) && _MSC_VER < 1900 && ! defined (__x86_64__) && ! defined(__INTEL_COMPILER)
// MS compiler cannot use _mm256_set1_epi64x in 32 bit mode, and  
// cannot put 64-bit values into xmm register without using
// mmx registers, and it makes no emms

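To work around this in 32-bit mode with MSVC, you can build the vector from 32-bit halves instead:

// a is the lowest 64-bit element and d the highest,
// i.e. the same result as _mm256_set_epi64x(d, c, b, a)
union {
    int64_t q[4];
    int32_t r[8];
} u;
u.q[0] = a; u.q[1] = b; u.q[2] = c; u.q[3] = d;
__m256i ymm = _mm256_setr_epi32(u.r[0], u.r[1], u.r[2], u.r[3], u.r[4], u.r[5], u.r[6], u.r[7]);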

Or use 64-bit mode.
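The lane splitting behind this workaround can also be sketched in plain C, without the union. This is only an illustration: `split_epi64_to_epi32` is a hypothetical helper name, not part of any library. It uses shifts instead of type punning, so it does not depend on byte order; on little-endian x86 it should produce the same eight values as the union above.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original answer): splits four 64-bit
 * values into eight 32-bit halves, low half first. q[0] is the lowest
 * element, and r[0]..r[7] come out in the argument order that
 * _mm256_setr_epi32 expects.
 * Assumption: converting an out-of-range uint32_t to int32_t wraps around,
 * which is implementation-defined before C23 but holds on mainstream
 * compilers, MSVC included. */
void split_epi64_to_epi32(const int64_t q[4], int32_t r[8])
{
    for (int i = 0; i < 4; i++) {
        uint64_t bits = (uint64_t)q[i];

        r[2 * i]     = (int32_t)(uint32_t)(bits & 0xFFFFFFFFu); /* low 32 bits  */
        r[2 * i + 1] = (int32_t)(uint32_t)(bits >> 32);         /* high 32 bits */
    }
}
```

The resulting array could then be passed on as `_mm256_setr_epi32(r[0], r[1], r[2], r[3], r[4], r[5], r[6], r[7])`, which avoids the 64-bit set intrinsics altogether.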

Z boson
  • Late answer, sorry... It was some stressful months. I was writing my thesis, which fortunately ended up well. Regardless of the lateness of my answer, your post helped me spot what I thought was a bug that could have cost me hours (if not a day or two) of troubleshooting during my thesis. That's worthy of a big thanks: Thanks a lot for your help!! – oPolo Nov 08 '16 at 15:40