17

Consider this simple code:

#include <complex.h>
complex float f(complex float x) {
  return x*x;
}

If you compile it with -O3 -march=core-avx2 -fp-model strict using the Intel Compiler you get:

f:
        vmovsldup xmm1, xmm0                                    #3.12
        vmovshdup xmm2, xmm0                                    #3.12
        vshufps   xmm3, xmm0, xmm0, 177                         #3.12
        vmulps    xmm4, xmm1, xmm0                              #3.12
        vmulps    xmm5, xmm2, xmm3                              #3.12
        vaddsubps xmm0, xmm4, xmm5                              #3.12
        ret 

This is much simpler code than you get from both gcc and clang and also much simpler than the code you will find online for multiplying complex numbers. It doesn't, for example appear explicitly to deal with complex NaN or infinities.

Does this assembly meet the specs for C99 complex multiplication?

Simd
  • 19,447
  • 42
  • 136
  • 271
  • What version of the Intel compiler are you using ? – Paul R Feb 05 '17 at 10:21
  • @PaulR 17 via godbolt. – Simd Feb 05 '17 at 10:28
  • Here is the result from godbolt for ICC 17 https://godbolt.org/g/tLXyqj – Z boson Feb 06 '17 at 09:51
  • 1
    @Zboson That's the code in the question. – Simd Feb 06 '17 at 09:54
  • GCC calls `__mulsc3` https://godbolt.org/g/9rcINV . How do I get GCC to inline the call to __mulsc3? With `-Ofast` GCC inlines it though https://godbolt.org/g/7vq9su – Z boson Feb 06 '17 at 09:54
  • 1
    @eleanora, I know it's the code in the question. I added the link in case somebody wants to play with it. – Z boson Feb 06 '17 at 09:55
  • @Zboson I don't think your second link is GCC inlining __mulsc3. It is a completely different piece of code that ignores the niceties of the C99 specification when you enable -ffast-math. GCC can't inline it sadly as __mulsc3 comes from libc so it would have to be disassembled first. – Simd Feb 06 '17 at 09:57
  • I mean that it inlined the complex square under the assumptions of `-ffast-math`. – Z boson Feb 06 '17 at 10:00
  • In any case, it's interesting that ICC gets this wrong even with `strict`. That's why I added the godbolt link. I tried other options but it seems ICC is just wrong (or at least non-conforming). I guess it's a bug. – Z boson Feb 06 '17 at 10:01
  • @Zboson It is interesting if this is a bug. It seems complex number support is just weak in all the main compilers. Look at this weird behavior of ICC when dividing two numbers https://godbolt.org/g/qO1isr. – Simd Feb 06 '17 at 10:10
  • 1
    Well if your goal is efficiency [don't use `complex`](http://stackoverflow.com/a/41440508/2542702). It's stupid type IMHO that was added to C because Fortran has a complex type. One reason I like C is that the types are primitive and often map directly to registers in assembly. But `complex` is a composite type like a class in C++. It just seems odd to be in C. I suppose there is hardware the directly implements complex types but I have never used any that I know of. – Z boson Feb 06 '17 at 10:16
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/134955/discussion-between-eleanora-and-z-boson). – Simd Feb 06 '17 at 10:18
  • @eleanora: Are there any cases where the Standard-mandated behavior of treating ((∞,∞)²)² as some kind of infinity, rather than treating (∞,∞)² as (NaN,∞), and treating and (NaN,∞)² as (NaN,NaN), is actually useful? It seems to me that the Standard requires implementations to go out of their way to be less useful than they otherwise would be. – supercat Jun 18 '18 at 22:17

2 Answers2

20

The code is non-conforming.

Annex G, Section 5.1, Paragraph 4 reads

The * and / operators satisfy the following infinity properties for all real, imaginary, and complex operands:

— if one operand is an infinity and the other operand is a nonzero finite number or an infinity, then the result of the * operator is an infinity;

So if z = a * ib is infinite and w = c * id is infinite, the number z * w must be infinite.

The same annex, Section 3, Paragraph 1 defines what it means for a complex number to be infinite:

A complex or imaginary value with at least one infinite part is regarded as an infinity (even if its other part is a NaN).

So z is infinite if either a or b are.
This is indeed a sensible choice as it reflects the mathematical framework1.

However if we let z = ∞ + i∞ (an infinite value) and w = i∞ (and infinite value) the result for the Intel code is z * w = NaN + iNaN due to the ∞ · 0 intermediates2.

This suffices to label it as non-conforming.


We can further confirm this by taking a look at the footnote on the first quote (the footnote was not reported here), it mentions the CX_LIMITED_RANGE pragma directive.

Section 7.3.4, Paragraph 1 reads

The usual mathematical formulas for complex multiply, divide, and absolute value are problematic because of their treatment of infinities and because of undue overflow and underflow. The CX_LIMITED_RANGE pragma can be used to inform the implementation that (where the state is ‘‘on’’) the usual mathematical formulas [that produces NaNs] are acceptable.

Here the standard committee is trying to alleviate the huge mole of work for the complex multiplication (and division).
In fact GCC has a flag to control this behaviour:

-fcx-limited-range
When enabled, this option states that a range reduction step is not needed when performing complex division.

Also, there is no checking whether the result of a complex multiplication or division is NaN + I*NaN, with an attempt to rescue the situation in that case.

The default is -fno-cx-limited-range, but is enabled by -ffast-math.
This option controls the default setting of the ISO C99 CX_LIMITED_RANGE pragma.

It this option alone that makes GCC generate slow code and additional checks, without it the code it generate has the same flaws of Intel's one (I translated the source to C++)

f(std::complex<float>):
        movq    QWORD PTR [rsp-8], xmm0
        movss   xmm0, DWORD PTR [rsp-8]
        movss   xmm2, DWORD PTR [rsp-4]
        movaps  xmm1, xmm0
        movaps  xmm3, xmm2
        mulss   xmm1, xmm0
        mulss   xmm3, xmm2
        mulss   xmm0, xmm2
        subss   xmm1, xmm3
        addss   xmm0, xmm0
        movss   DWORD PTR [rsp-16], xmm1
        movss   DWORD PTR [rsp-12], xmm0
        movq    xmm0, QWORD PTR [rsp-16]
        ret

Without it the code is

f(std::complex<float>):
        sub     rsp, 40
        movq    QWORD PTR [rsp+24], xmm0
        movss   xmm3, DWORD PTR [rsp+28]
        movss   xmm2, DWORD PTR [rsp+24]
        movaps  xmm1, xmm3
        movaps  xmm0, xmm2
        call    __mulsc3
        movq    QWORD PTR [rsp+16], xmm0
        movss   xmm0, DWORD PTR [rsp+16]
        movss   DWORD PTR [rsp+8], xmm0
        movss   xmm0, DWORD PTR [rsp+20]
        movss   DWORD PTR [rsp+12], xmm0
        movq    xmm0, QWORD PTR [rsp+8]
        add     rsp, 40
        ret

and the __mulsc3 function is practically the same the standard C99 recommends for complex multiplication.
It includes the above mentioned checks.


1 Where the modulus of a number is extended from the real case |z| to the complex one ‖z‖, keeping the definition of infinite as the result of unbounded limits. Simply put, in the complex plane there is a whole circumference of infinite values and it takes just one "coordinate" to be infinite to get an infinite modulus.

2 The situation get worst if we remember that z = NaN + i∞ or z = ∞ + iNaN are valid infinite values

Margaret Bloom
  • 41,768
  • 5
  • 78
  • 124
  • Thank you for this. Can you give an explicit value for 'x' for which the ICC code gives a non conforming answer? My question is about squaring so it's not the general multiplication case. We don't have a 'z' and a different 'w' which we are multiplying together. – Simd Feb 05 '17 at 06:28
  • @eleanora When one component is NaN and the other is INFINITY you should get a wrong result. – Margaret Bloom Feb 05 '17 at 07:00
  • What should the square of such a number be according to C99? – Simd Feb 05 '17 at 09:24
  • @eleanora It should be INFINITY. I'm going to try a sample code later, to confirm this. Just to be sure (I'm a bit tired and I may have overlooked something). – Margaret Bloom Feb 05 '17 at 09:37
  • Thanks that would be great! – Simd Feb 05 '17 at 10:11
  • @eleanora If you use INF + i NAN as input for the ICC function the result is NaN + i NaN while it should be a complex number with at least one component infinite (per standard). GCC do it right. However, it is not easy to test these edge cases because ICC and (mine) GCC use two different conventions for the complex number and GCC. I tested the Intel function directly in assembly, while for GCC I had to generate INF + i NAN as (INF + I * 0) * (1 + I * INF) because INF + i * NAN got me a NAN + i NAN in GCC (since this is seen as creating a complex from a sum of a NaN imaginary and a real). – Margaret Bloom Feb 05 '17 at 14:25
  • This is very interesting. Is it possible to test both sets of assembly in the same way? For instance, could you test the gcc produced function directly in assembly too? As far as I know INF + i NAN should be equivalent to infinity as in the specs "A complex or imaginary value with at least one infinite part is regarded as an infinity (even if its other part is a NaN)." – Simd Feb 05 '17 at 16:41
  • @eleanora I could, but to what purpose? I've tested the Intel function using assembly because I don't have ICC and I couldn't simply plug it in as a separated object file assembled with NASM (due to the different handling of complex numbers, at least with -O0 to GCC). When we write INF + i * NAN in C we are asking to add two numbers, one purely real and infinite and one purely imaginary and NaN, thus INF + NaN = NaN. Simply put we are not defining a single complex number but two. I don't think there is a direct way to get INF + NaN. – Margaret Bloom Feb 05 '17 at 17:22
  • OK thanks. I am really just trying to be 100% confident that ICC is not conforming to the compulsory parts of the C99 specification. I am still not quite sure that the case you have found is definitely in the specs. – Simd Feb 05 '17 at 17:45
  • @eleanora It's always hard to tell what is conforming and what is not :) A thing you could do is ask Intel directly through their forum. I did something similar a couple of weeks ago. They may take a few days to answer but they are always nice and collaborative. – Margaret Bloom Feb 05 '17 at 19:54
  • Good idea. I will report back on what they say if they say anything. – Simd Feb 05 '17 at 20:31
  • @eleanora Thank you, I'd really like to hear what they have to say. – Margaret Bloom Feb 05 '17 at 21:01
  • 4
    Intel have accepted there is a bug it seems. "I can reproduce the issue you reported. There are two warnings which shows INFINITY*I is treated as an out of range number for complex type. This should be incorrect. I have escalated this issue and entered it in our problem tracking system. I will let you know when I have an update on this issue." – Simd Feb 09 '17 at 09:16
  • @eleanora Thanks, for reporting back! – Margaret Bloom Feb 09 '17 at 09:36
  • The logical mathematical interpretation of NaN would generally be a value which might be above the highest representable value, below the lowest representable value, or anywhere in-between. If X and Y are both unknown quantities which are higher than the largest representable value, then X-Y is NaN, because X+Y could be arbitrarily high or low. If one squares (X+Yi), one will get a value whose real part could be an arbitrarily large positive or negative number. If one squares that, both parts could be arbitrarily large positive or negative numbers. – supercat Jun 04 '18 at 16:57
  • Being able to interprets (NaN,+Inf) as "a value whose real part is unknown and whose imaginary part is very high", and likewise for the other three combinations of a NaN and an infinity, seems useful, but requiring that multiplication of such values must yield such a value will undermine that distinction. – supercat Jun 04 '18 at 17:04
10

I get similar, but not identical, code from clang 3.8 at -O2 -march=core-avx2 -ffast-math: I'm not deeply familiar with latter-day x86 floating point features, but I think it's doing the same calculation but using different instructions to shuffle values around in registers.

f:
        vmovshdup       %xmm0, %xmm1    # xmm1 = xmm0[1,1,3,3]
        vaddss  %xmm0, %xmm0, %xmm2
        vmulss  %xmm2, %xmm1, %xmm2
        vmulss  %xmm1, %xmm1, %xmm1
        vfmsub231ss     %xmm0, %xmm0, %xmm1
        vinsertps       $16, %xmm2, %xmm1, %xmm0 # xmm0 = xmm1[0],xmm2[0],xmm1[2,3]
        retq

GCC 6.3, with the same options, again appears to do the same calculation but shuffle values around in yet a third way:

f:
        vmovq   %xmm0, -8(%rsp)
        vmovss  -4(%rsp), %xmm2
        vmovss  -8(%rsp), %xmm0
        vmulss  %xmm2, %xmm2, %xmm1
        vfmsub231ss     %xmm0, %xmm0, %xmm1
        vmulss  %xmm2, %xmm0, %xmm0
        vmovss  %xmm1, -16(%rsp)
        vaddss  %xmm0, %xmm0, %xmm0
        vmovss  %xmm0, -12(%rsp)
        vmovq   -16(%rsp), %xmm0
        ret

Without -ffast-math, both compilers generate substantially different code that does appear to check for NaN, at least.

I conclude from this that Intel's compiler is not generating a fully IEEE-compliant complex multiplication even with -fp-model strict. There may be some other command-line switch that makes it generate fully IEEE-compliant code.

Whether or not this qualifies as a violation of C99 depends on whether Intel's compiler is documented to conform to Annex F and G (which specify what it means for an implementation of C to provide IEEE-compliant real and complex arithmetic), and if it is, what command-line options you have to give to get the conforming mode.

zwol
  • 135,547
  • 38
  • 252
  • 361
  • I thought the strict option I used was meant to make it conforming but it would be great if an expert could say more about whether it actually does. – Simd Feb 04 '17 at 21:11