Why is __muldc3 called, when two std::complex are multiplied?

Question

I naively assumed, that the complex number multiplication would be inlined by the compiler, for example for this function:

#include <complex>

void mult(std::complex<double> &a, std::complex<double> &b){
    a*=b;
}

However, when complied by gcc (with -O2), the resulting assembler is surprising (at least for me):

mult(std::complex<double>&, std::complex<double>&):
        pushq   %rbx
        movsd   8(%rdi), %xmm3
        movsd   (%rdi), %xmm2
        movq    %rdi, %rbx
        movsd   8(%rsi), %xmm1
        movsd   (%rsi), %xmm0
        call    __muldc3
        movsd   %xmm0, (%rbx)
        movsd   %xmm1, 8(%rbx)
        popq    %rbx
        ret

There is a call to this function __multdc3, which somehow replaced the call to the operator*= (its mangled name would be _ZNSt7complexIdEmLIdEERS0_RKS_IT_E and the complex number would be passed per reference).

However, there seems to be nothing special in the implementation of the operator*= which would explain the magic:

// 26.2.5/13
  // XXX: This is a grammar school implementation.
  template<typename _Tp>
    template<typename _Up>
    complex<_Tp>&
    complex<_Tp>::operator*=(const complex<_Up>& __z)
    {
      const _Tp __r = _M_real * __z.real() - _M_imag * __z.imag();
      _M_imag = _M_real * __z.imag() + _M_imag * __z.real();
      _M_real = __r;
      return *this;
}

I must be missing something, thus my question: What is the reason for the resulting assembler?

score 8 · Accepted Answer · answered Mar 22 '18 at 21:13

You should note that it is, strictly speaking, "wrong" to implement the complex floating point multiplication by the formula

(a+i*b)*(c + i*d) = a*c - b*d + i*(b*c + a*d)

I write wrong in quotes, because the C++ standard does not actually require correct implementation. C does specify it in the some appendix.

The simple implementation does not give correct results with Inf and/or NaN in the input.

consider (Inf + 0*i)*(Inf + 0*i): Clearly, for consistent behavior, the result should be the same as for real floating point, namely Inf, or (Inf + 0*i), respectively. However, the formula above gives Inf + i*NaN.

So I could imagine that __muldc3 is a longer function that handles these cases correctly.

When the call goes away with -ffast-math then this is most likely the explanation.

score 3 · Answer 2 · answered Mar 22 '18 at 21:08

3

At least for plain C, gcc follows the somewhat crazy C99 annex f (?) rules for complex multiply/divide; perhaps for c++ as well. Try -fast-math or -fcx-fortran-rules.

answered Mar 22 '18 at 21:08

janneb

36,249
2
81
97

ead · Answer 3 · 2023-06-21T06:12:33.623

The answer of Andreas H. gives the direction, in this answer I just try to connect the dots.

First, I was wrong about the definition of the operator*=: there is a specialization for floating types, for example for doubles this operator defined as:

  template<typename _Tp>
    complex&
    operator*=(const complex<_Tp>& __z)
{
  _ComplexT __t;
  __real__ __t = __z.real();
  __imag__ __t = __z.imag();
  _M_value *= __t;
  return *this;
}

With _ComplexT defined as

typedef __complex__ double _ComplexT;

and _M_value defined as

private:
   _ComplexT _M_value;

That means, the school-formula is used for such types as std::complex<int> but for the floating types it boils down to C-multiplication (__complex__ is a gcc-extension), which is a part of the standard since C99, appendix G (most important parts are at the end of the answer).

One of the problematic cases would be:

(0.0+1.0*j)*(inf+inf*j) = (0.0*inf-1*inf)+(0.0*inf+1.0*inf)j
                        =  nan + nan*j

Due to the convention (G.3.1) the first factor is a nonzero, finite and the second is infinite, thus due to (G.5.1.4) the result must be finite, which is not the case for nan+nan*j.

The strategy of the calculation in __multdc3 is simple: use the school formula, and if this doesn't work (both are nan) a more complicated approach will be used - which is just too complex to be inlined. (By the way, clang inlines the school formula and calls __multdc3 if this formula didn't work).

Another problem could be that one product overflow/underflow but the sum of two products doesn't. This is probably covered by the formulation "may raise spurious floating-point exceptions", but I'm not 100% sure.

The section G.3.1 states:

A complex or imaginary value with at least one infinite part is regarded as an infinity (even if its other part is a NaN).

A complex or imaginary value is a finite number if each of its parts is a finite number (neither infinite nor NaN).

A complex or imaginary value is a zero if each of its parts is a zero.

Section G.5 is concerned with binary operators and states:

For most operand types, the value of the result of a binary operator with an imaginary or complex operand is completely determined, with reference to real arithmetic, by the usual mathematical formula. For some operand types, the usual mathematical formula is problematic because of its treatment of infinities and because of undue overflow or underflow; in these cases the result satisfies certain properties (specified in G.5.1), but is not completely determined.

The section G.5.1 is concerned with multiplicative operators and states:

The * and / operators satisfy the following infinity properties for all real, imaginary,and complex operands:

—if one operand is an infinity and the other operand is a nonzero finite number or an infinity,then the result of the operator is an infinity;

...

If both operands of the * operator are complex or if the second operand of the / operator is complex, the operator raises floating-point exceptions if appropriate for the calculation of the parts of the result, and may raise spurious floating-point exceptions.

score 0 · Answer 4 · answered Mar 22 '18 at 20:57

0

Possibly because software floating point is on by default. As per the man page for gcc:

-msoft-float
This option ignored; it is provided for compatibility purposes only.    
software floating point code is emitted by default,
and this default can overridden by FPX options; mspfp, mspfp-compact, 
or mspfp-fast for single precision, and mdpfp, mdpfp-compact, or mdpfp-fast 
for double precision.

Try compiling with -mdpfp or any of the other alternatives.

answered Mar 22 '18 at 20:57

selbie

100,020
15
103
173

I don't understand, how it explains the observed behavior. I also don't have a problem with the way it is - I just would like to know, why. I guess there are good reasons, why __multdc3 is used. – ead Mar 22 '18 at 21:01
Because `__multdc3` is part of [GCC's software floating point library](https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html). It's going to get used whenever hardware floating point is not used by the compiler, which is the default unless you use one of the other command line options listed above. – selbie Mar 22 '18 at 21:06
I also suspect that your compiler's default arch might be `i386`. In which case it may not generate native FP instructions because there were variations of 80386 that didn't have floating point (remember the 386sx ?) I suspect, assuming your on intel, if you added `-march pentium` to your compiler command line, you might get the native floating point. All of this is just a guess by the way. – selbie Mar 22 '18 at 21:09
I'm on x86-64, and I think my version of gcc doesn't even know -mdpfp & Co. – ead Mar 22 '18 at 21:13
`__muldc3` is defined in `libgcc2.c` and it's used in both hardware and soft float modes. – Karol S Oct 20 '21 at 16:53

Why is __muldc3 called, when two std::complex are multiplied?

4 Answers4

Linked