
My question is somewhat related to an answer to this topic.

Consider the following C program:

#include <emmintrin.h>
#include <stdio.h>

void print(__m128* v)
{
    union Helper
    {
        __m128 m128;
        __attribute__((aligned(16))) float f[4];
    };

    union Helper h;
    h.m128 = *v;

    printf("%f %f %f %f\n", h.f[0], h.f[1], h.f[2], h.f[3]);
}

int main()
{
    __m128 a  = _mm_set1_ps(0.0f / 0.0f);
    __m128 b  = _mm_set1_ps(0.0f);
    __m128 m1 = _mm_max_ps(a, b);
    __m128 m2 = _mm_max_ps(b, a);

    print(&m1);
    print(&m2);
}

which prints

0.000000 0.000000 0.000000 0.000000
nan nan nan nan

I compiled this with Clang from Xcode on Mac OS X, but observed similar behavior with GCC on Linux. Does anyone have an explanation for this? Can I rely on this behavior in general (_mm_max_ps returning the value of its second argument if one of its arguments is NaN or -NaN)? This document here states that if a single operand to MAXPS is NaN, the source operand is returned. While this seems somewhat orthogonal to the behavior I observed, the intrinsic (as opposed to the actual SSE instruction) is a binary operation without side effects, so one cannot speak of source operands in this case.

szellmann
  • I don't understand the problem. The link you provided says "If either argument is a NaN, MINPS and MAXPS simply return the second argument", and that's exactly what you observe. – Z boson May 04 '16 at 07:33
  • Note that if you compile with optimization enabled (`-O3`), GCC will eliminate the second call to `_mm_max_ps` (`_mm_max_ps(b, a)`) and you will only get zeros. – Z boson May 04 '16 at 07:34
  • Your second link does not say "that if a single operand to MAXPS is NaN, the source operand is returned"; it says "If instead of this behavior, it is required that the NaN source operand (from either the first or second operand) be returned, the action of MAXPS can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR". I.e. if you want to return NaN when either the first or second operand is NaN, then you can't rely on MAXPS alone. You have to use additional instructions. – Z boson May 04 '16 at 08:06
  • Incidentally `std::max` does almost the same thing. But it returns the first operand rather than the second if the first operand is NaN. – Z boson May 04 '16 at 08:13
  • The link says that "If only one value is a NaN (SNaN or QNaN) for this instruction, the second operand (source operand), either a NaN or a valid floating-point value, is written to the result." With the intrinsic, however, neither a nor b (if I'm getting it right) can be thought of as a source operand. – szellmann May 04 '16 at 08:22
  • Why can't a or b be considered source operands with intrinsics? Are you arguing that your compiler is free to swap operands of an intrinsic if it wants to? I guess GCC sort of did this with optimization, as it ignored the second call because it assumed the operands commute and the results are therefore equal. – Z boson May 04 '16 at 08:24
  • Of course my example program is prone to optimization. I have a C++ template which calculates a dot product for shading and clamps it with zero (max(T(dot(...)), T(0.0))), where max() unrolls (essentially) to std::max or _mm_max_ps and T is either float or __m128. The dot product can in some cases be 0 (e.g. with zero-length gradients). I encountered some weird rendering artifacts with this and now wonder whether the right fix is to check for the gradient being zero or simply to swap the order of the operands. For the latter to be a robust solution, I rely on the specification of the intrinsic. – szellmann May 04 '16 at 08:26
  • Where did you read that intrinsics with two operands must be commutative? – Z boson May 04 '16 at 08:26
  • Well I don't think you can rely on the intrinsic in this case. You have to use a few more intrinsics to make sure that the result does not depend on the order. You would have to do [the same thing](http://stackoverflow.com/questions/1632145/use-of-min-and-max-functions-in-c/30915238#30915238) with `std::max`. – Z boson May 04 '16 at 08:29
  • I thought that with a construct such as a = max(a, b), a is the dst operand and b is the src operand. But the semantics of the intrinsic are that the result of the max() operation is not assigned to a but returned to the calling function. So (from my understanding) neither a nor b can be thought of as the 'source' operand, and I'm not sure that the docs for MAXPS apply to the intrinsic in this regard. – szellmann May 04 '16 at 08:30
  • I'm not sure and haven't read anywhere that the intrinsic is commutative. – szellmann May 04 '16 at 08:31
  • Maybe not a perfect duplicate, but my answer there goes into all the details about intrinsic vs. instruction and commutativity. Including a long-standing gcc bug where it treated the intrinsic as commutative even without `-ffast-math`. – Peter Cordes Oct 11 '17 at 02:43

0 Answers