Copying -nan for float and AVX __m256 shows 0 after copying

Question

I have the two following situations, which to me should be similar, but apparently are not.

This version does not work, although it is the one I would prefer:

static union { __m256 trueMask8; float trueMask[8]; };
void Init()
{
   trueMask8 = _mm256_cmp_ps(_mm256_setzero_ps(), _mm256_setzero_ps(), _CMP_EQ_OS);
}

class Ray8
{
   union { __m256 activeMask8; float deadMask[8]; };

   Ray8()
   {
      activeMask8 = trueMask8;
      int w = 0; //breakpoint
   }
};

The problem with the version above is that activeMask8 shows all 0 at the breakpoint, while trueMask8 shows all -nan, so I am sure Init() has been called.

The version below works, but it's less ideal, since I have to run the compare every time I need a true mask:

class Ray8
{
   union { __m256 activeMask8; float deadMask[8]; };

   Ray8()
   {
      activeMask8 = _mm256_cmp_ps(_mm256_setzero_ps(), _mm256_setzero_ps(), _CMP_EQ_OS);
      int w = 0; //breakpoint
   }
};

Here everything in activeMask8 is -1 at the breakpoint.

The same goes for:

activeMask[0] = trueMask[0];

Both are floats: trueMask[0] shows -nan, while activeMask[0] shows 0 afterwards.

Why does this occur? I would normally try to write a copy constructor, but __m256 is a library type. Is there a solution to this?

I haven't been able to reproduce the error. Can you please provide a complete working example and the command you use to reproduce this. — jackw11111, Jan 26 '20 at 01:27
@jackw11111, that's weird. I have just tested it in a console app too, and there it works indeed. I am going to try and narrow it down further. Thanks for pointing this out — Amber Elferink, Jan 26 '20 at 01:44
What's wrong with `_mm256_castsi256_ps(_mm256_set1_epi32(-1))`? Let the compiler figure out how to get that bit pattern depending on whether AVX2 or just AVX1 is available. It *should* compile fairly efficiently. And don't make it load from a non-const `static` union; that's unlikely to be better. — Peter Cordes, Jan 26 '20 at 02:46
@Peter, it won't let me make true mask global unless I set static. The issue is that I need to set a single value in the mask as well, as in the last example. I figured a true mask would be a good way to do that. — Amber Elferink, Jan 26 '20 at 11:37
Wait, so `trueMask8` in your static anonymous union isn't just a constant? You *want* to be able to change it? My point was that you normally don't want to use a global/static `__m256` to hold the result of a `_mm256` intrinsic initializer; it can defeat constant-propagation even if you do make it `const`. So for example you'd do https://godbolt.org/z/U7_Qh_ and initialize the union with an `int` array of `-1` elements so it can be *statically* initialized. That would in theory let compilers do constant propagation in cases where you use it. You can still do that for non-const. — Peter Cordes, Jan 26 '20 at 15:58
Be aware that in C++ accessing `deadMask` after setting `activeMask8` is undefined (or unspecified?) behavior -- unions should not be used for type-punning in C++: https://stackoverflow.com/questions/11373203/accessing-inactive-union-member-and-undefined-behavior — chtz, Jan 27 '20 at 09:37
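
As a minimal sketch of the point about type-punning: a "true" compare lane is 32 bits that are all ones, which reads as -1 when viewed as an integer and as a NaN with the sign bit set (-nan) when viewed as a float. The helper names below (`all_ones_float`, `float_bits`, `copy_lane`) are hypothetical and not from the question; they use `std::memcpy` instead of a union, and the sketch assumes `float` is a 32-bit IEEE 754 type.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Build a float whose 32 bits are all ones: the per-lane value of a "true" mask.
inline float all_ones_float() {
    const std::uint32_t bits = 0xFFFFFFFFu;
    float f;
    std::memcpy(&f, &bits, sizeof f);  // well-defined, unlike reading an inactive union member
    assert(std::isnan(f));             // all-ones is a NaN encoding, hence "-nan" in a debugger
    return f;
}

// Read back the raw bits of a float without going through a union.
inline std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Copy one mask lane bit for bit.
inline void copy_lane(float* dst, const float* src) {
    std::memcpy(dst, src, sizeof *dst);
}
```

Comparing `float_bits` of the source and destination lanes after a copy is a debugger-independent way to check whether the mask really arrived as 0 or whether only the display differs.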

score 0 · Answer 1 · answered Jan 26 '20 at 13:55

So somehow this has to do with the scope in which trueMask8 is set to true. I still cannot reproduce it in a console program, but I have found the following:

I keep trueMask global in my ray.h file in both cases.

If I set trueMask within my Ray8() constructor before copying it to activeMask8, it works, and continues to do so in the rest of my program.

If I set trueMask in an Init or constructor that is not the Ray8() constructor, it does not work. I cannot reproduce this in a console program, so I still have no clue what causes it. But for now, it is a solution to just execute it for every Ray8 I make.

If `Init()` is run from a constructor other than Ray8, perhaps you're running into the fact that order of construction for static objects is not defined. (aka the static initialization order fiasco.) The easiest solution would be using a constant initializer for the union, like https://godbolt.org/z/U7_Qh_, so it can go in `.rodata` or `.data`, instead of only runtime-initialized. — Peter Cordes, Jan 26 '20 at 16:01
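
A sketch of what a constant initializer could look like, in the spirit of the comment above (the linked code may differ). The name `kTrueMaskBits`, the `true_mask()` helper, and the simplified `Ray8` are assumptions for illustration. Because the array is `constexpr`, its bytes are fixed at compile time and typically end up in read-only data, so they do not depend on `Init()` or on the order in which static objects are constructed. The AVX part is guarded so the sketch also compiles when AVX is not enabled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical name. Constant-initialized: valid before any constructor or Init() runs.
alignas(32) constexpr std::int32_t kTrueMaskBits[8] = {-1, -1, -1, -1, -1, -1, -1, -1};

#if defined(__AVX__)
// Only compiled when AVX is enabled (e.g. -mavx or /arch:AVX).
inline __m256 true_mask() {
    return _mm256_castsi256_ps(
        _mm256_load_si256(reinterpret_cast<const __m256i*>(kTrueMaskBits)));
}
#endif

// Simplified stand-in for the Ray8 class from the question.
struct Ray8 {
    alignas(32) float activeMask[8];

    Ray8() {
        static_assert(sizeof(activeMask) == sizeof(kTrueMaskBits), "lane sizes must match");
        std::memcpy(activeMask, kTrueMaskBits, sizeof(activeMask));
        assert(all_lanes_set());  // debug-only check, mirrors inspecting the breakpoint
    }

    // True if every lane holds the all-ones bit pattern.
    bool all_lanes_set() const {
        for (const float& lane : activeMask) {
            std::uint32_t bits;
            std::memcpy(&bits, &lane, sizeof bits);
            if (bits != 0xFFFFFFFFu) return false;
        }
        return true;
    }
};
```

With this layout, any `Ray8` constructed during static initialization still sees a fully set mask, which is the case where a runtime `Init()` may not have run yet.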
