It works fine but looks inefficient to me since many conditions are verified here and each condition branch is a slow operation for modern CPUs.
An optimizing compiler won't branch unnecessarily. Don't worry about premature optimization unless you're in a very tight hot loop. Your first snippet is compiled to only 2 branches in x86-64 and ARM64 in the compilers I tried. It can also be compiled to a branchless version. Of course it may still be slower than a simple A * B <= 0, but there's no way to know that for sure without a proper benchmark.
If you don't care about NaN then you can simply do some bitwise operations:
auto a = std::bit_cast<uint64_t>(A);
auto b = std::bit_cast<uint64_t>(B);
const auto sign_mask = 1ULL << 63;
return ((a ^ b) & sign_mask) || (((a | b) & ~sign_mask) == 0);
If A and B have different signs then that is matched by (a ^ b) & sign_mask. If they have the same sign then they both must be zero, which is caught by the latter condition. But this works in the integer domain, so it may incur a cross-domain penalty when moving the value from the float domain to the int domain.
If std::bit_cast isn't available then just replace it with memcpy(&a, &A, sizeof A), and likewise for b.
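For reference, here is a minimal sketch of the memcpy variant wrapped into a complete function. The names bits_of and opposite_sign_or_both_zero are made up for illustration, and the code assumes double is a 64-bit IEEE-754 type:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the raw bit pattern of a double.
// Assumes double is 64 bits wide (checked at compile time).
static std::uint64_t bits_of(double x) {
    static_assert(sizeof(std::uint64_t) == sizeof(double),
                  "this sketch expects a 64-bit double");
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof x);  // well-defined type punning, pre-C++20
    return u;
}

// True if the sign bits differ, or if both values are zero (+0.0 or -0.0).
// NaN inputs are not treated specially.
static bool opposite_sign_or_both_zero(double A, double B) {
    const std::uint64_t a = bits_of(A);
    const std::uint64_t b = bits_of(B);
    const std::uint64_t sign_mask = 1ULL << 63;
    return ((a ^ b) & sign_mask) || (((a | b) & ~sign_mask) == 0);
}
```

Keep in mind that this looks at the sign bit only, so when exactly one operand is zero the result depends on that zero's sign: (+0.0, 5.0) gives false, whereas A * B <= 0 would be true there.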
Demo on Godbolt
Again, do a benchmark to determine what's best for your target. There's no solution that's fastest on every microarchitecture available. If you really run this a lot of times in a loop then you should use SIMD instead to check multiple values at the same time. You should also use profile-guided optimization so that the compiler knows where and when to branch.
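As a rough idea of what the SIMD route could look like, here is a sketch that checks two doubles per iteration with SSE2 and falls back to the scalar bitwise check elsewhere. The names check_scalar, check_many and SIGNCHECK_HAVE_SSE2 are hypothetical, the both-zero test is done with a floating-point compare rather than integer masking, and it assumes denormals-are-zero mode is not enabled. It is not benchmarked, so treat it as a starting point only:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define SIGNCHECK_HAVE_SSE2 1
#endif

// Scalar version of the bitwise check, used for the tail and as a fallback.
static bool check_scalar(double A, double B) {
    std::uint64_t a, b;
    std::memcpy(&a, &A, sizeof A);
    std::memcpy(&b, &B, sizeof B);
    const std::uint64_t sign_mask = 1ULL << 63;
    return ((a ^ b) & sign_mask) || (((a | b) & ~sign_mask) == 0);
}

// Writes out[i] = 1 if A[i] and B[i] have different sign bits or are both
// zero, otherwise 0.
static void check_many(const double* A, const double* B,
                       unsigned char* out, std::size_t n) {
    std::size_t i = 0;
#ifdef SIGNCHECK_HAVE_SSE2
    const __m128d zero = _mm_setzero_pd();
    for (; i + 2 <= n; i += 2) {
        const __m128d va = _mm_loadu_pd(A + i);
        const __m128d vb = _mm_loadu_pd(B + i);
        // XOR leaves the sign bit of a lane set where the signs differ;
        // movemask collects the two sign bits into an int.
        const int differ = _mm_movemask_pd(_mm_xor_pd(va, vb));
        // -0.0 == 0.0 compares equal, so the sign bit is ignored here.
        const int both_zero = _mm_movemask_pd(
            _mm_and_pd(_mm_cmpeq_pd(va, zero), _mm_cmpeq_pd(vb, zero)));
        const int m = differ | both_zero;
        out[i]     = static_cast<unsigned char>(m & 1);
        out[i + 1] = static_cast<unsigned char>((m >> 1) & 1);
    }
#endif
    // Remaining elements (or everything, when SSE2 is unavailable).
    for (; i < n; ++i) out[i] = check_scalar(A[i], B[i]) ? 1 : 0;
}
```

Wider vectors (AVX2, AVX-512, NEON) follow the same pattern with more lanes per iteration. Whether this beats what the compiler auto-vectorizes from a plain loop is something only a benchmark on your target can tell.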