
AVX allows bitwise logical operations such as AND/OR on the floating-point data types `__m256` and `__m256d`.

However, C++ doesn't allow bitwise operations on floats and doubles, and reasonably so: if I'm right, there's no guarantee about the internal representation of floats (whether the compiler uses IEEE754 or not), so a programmer can't be sure what the bits of a float will look like.

Consider this example:

#include <immintrin.h>
#include <iostream>
#include <limits>
#include <cassert>

int main() {

    alignas(32) float x[8] = {1,2,3,4,5,6,7,8};
    alignas(32) float mask[8] = {-1,0,0,-1,0,-1,0,0};
    alignas(32) float x_masked[8];

    assert(std::numeric_limits<float>::is_iec559);

    __m256 x_ = _mm256_load_ps(x);
    __m256 mask_ = _mm256_load_ps(mask);

    __m256 x_masked_ = _mm256_and_ps(x_,mask_);

    _mm256_store_ps(x_masked,x_masked_);

    for(int i = 0; i < 8; i++)
        std::cout << x_masked[i] << " ";

    return 0;
}

Assuming that IEEE754 is used, and since I expected the representation of -1 to be 0xffffffff, I would expect the output to be

1,0,0,4,0,6,0,0

while it's instead

1 0 0 1.17549e-38 0 1.17549e-38 0 0

Hence my assumption about the internal representation was probably wrong (or I made some silly mistake).

So the question is: is there a way in which I can use floating-point logical operations and be sure that the result will make sense?

Fabio
    in IEEE754, -1 is not 0xffffffff, it's 0xbf800000. – genisage Jul 24 '14 at 20:56
  • @genisage, when you do a comparison with e.g. `_mm256_cmp_ps(x, y, 1)` it returns -1 = 0xffffffff and not 0xbf800000. Floating point AVX bitwise operators act just like integer AVX operators except they operate in the floating point execution unit instead of the integer one. – Z boson Jul 25 '14 at 07:47
  • It is possible to do bitwise operators on floats in C++. See my answer. It's a fairly good assumption that all floating point operations in x86-64 code will use IEEE754. – Z boson Jul 25 '14 at 08:43
  • When you do a comparison with `_mm256_cmp_ps`, the return type is `__m256` and not `__m256i`. False is 0x00000000 and True is 0xFFFFFFFF, so depending on whether you treat a true result as an integer or as a float, it reads as -1 or as a NaN. – Mark Lakata Jul 21 '16 at 22:06
  • Possible duplicate of [Bitwise operation on a floating point usefulness](https://stackoverflow.com/questions/29195566/bitwise-operation-on-a-floating-point-usefulness) – phuclv Sep 18 '17 at 06:07

3 Answers


If you're using AVX intrinsics, then you know you're using IEEE754 floats, because that's what AVX does.

Some of the bitwise operations on floats that make sense are

  • selecting, as in Jens' answer, though as of SSE4.1 we have blendvps and its relatives to do that in one instruction
  • absolute value (mask away the sign)
  • negate (xor with -0.0f)
  • transfer sign
  • extracting the exponent (rare)

Mostly it's for manipulating the sign, or to selectively zero out whole floats, not so much for mucking about with individual bits of the exponent or significand - you can do it, but it's rarely useful.

harold
  • Nice answer. So far I have gained very little in using avx compared to SSE2, but the blendvps operation is certainly a reason to use avx. – Jens Munk Jul 24 '14 at 21:42
  • You could do all of these operations on floating point values using AVX2 integer operations. The question is why is there a need for `_mm256_and_ps` when you could have used `_mm256_and_si256`. – Z boson Jul 25 '14 at 07:40
    @Zboson maybe. That's not how the question reads to me, though. The title looked like it was going to ask that, but then the body of the question seemed more concerned with how bitwise operations on floats make sense in the first place. – harold Jul 25 '14 at 08:09
  • I reread the question and I see your point. If the OP just wants to know why bitwise operations make sense on SIMD floats then the answer is the same as asking why it's useful for scalars. That's far less interesting. – Z boson Jul 25 '14 at 12:50
  • The main reason for my question was that I was not sure that AVX uses IEEE754. Another reason for bitwise operations on floating point is the instruction `_mm256_mask_i32gather_ps`, which requires a floating-point mask. I find a mask easier to create using bitwise operations (of course it's always just a matter of changing the sign bit). – Fabio Jul 29 '14 at 08:27

The reason is that there may be penalties for switching between domains of execution units (see "Bypass delays when switching execution unit domains" and "Why do some SSE mov instructions specify that they move floating point values"). In this case, the switch would be from a floating-point AVX execution unit to an integer AVX execution unit.

For example, let's say you want to compare two floating-point AVX registers `x` and `y`:

z = _mm256_cmp_ps(x, y, 1);

The AVX register `z` contains boolean integer values (0 or -1) which you can then AND using `_mm256_and_ps`, or with `_mm256_and_si256` if you want. But `_mm256_and_ps` stays in the same execution unit, while `_mm256_and_si256` switches units, which may cause a bypass delay.

Edit: in regards to bitwise operators on floats in C++ it certainly is possible and is sometimes useful. Here are some simple examples.

union {
    float f;
    unsigned int i;
} u;
u.f = 2.0f;
u.i ^= 0x80000000; // flip the sign bit: u.f is now -2.0f
u.i &= 0x7FFFFFFF; // clear the sign bit (absolute value): u.f is now 2.0f
Z boson

The programmer can be perfectly sure of how single-precision floating-point numbers are represented. How functions are implemented is another story. I have made use of bitwise operations for implementing half-precision floats conforming to IEEE-754. I have also made use of these operations for branch removal, back in 2003, before IBM filed a patent for this.

static inline __m128 _mm_sel_ps(__m128 a, __m128 b, __m128 mask ) {
    b = _mm_and_ps( b, mask );
    a = _mm_andnot_ps( mask, a );
    return _mm_or_ps( a, b );
}

This example demonstrates how to remove a floating-point branch using SSE2. The same can be achieved using AVX. If you try the same technique to remove branches using scalars, you will not gain any performance, due to the switch between domains (this applies to x86; it doesn't apply to ARM, where you have the `fpsel` operation).

Jens Munk
  • IBM tried to patent blend based on a mask with and/andnot/or? Lol. Compilers like gcc and clang often use this pattern when auto-vectorizing a ternary without SSE4.1 `blendvps`, so if IBM's patent was ever relevant, it's not now. – Peter Cordes May 26 '20 at 19:59