inline float sqrt2(float sqr)
{
    float root = 0;

    __asm
    {
    sqrtss xmm0, sqr
    movss root, xmm0
    }

    return root;
}

This is MSVC inline assembly that I want to compile with GCC on x86. I know that GCC inline assembly is written as asm("asm here");, but I have no idea how to pass a parameter into it. All I know is that the result is obtained with "=r".

I think it should look something like this:

asm("sqrtss xmm0, %1\n\t"
        "movss %0, xmm0"
        : "=r" (root)
        : "r" (sqr));
  • Do you really need this? The standard library version should automatically be turned into this by the compiler. – Mgetz Apr 26 '18 at 12:07
  • This is the simplest example I have. Of course I can change it to the standard sqrt, but this method is faster. I have more assembly to convert, so I want to learn on this one. – kermit esea Apr 26 '18 at 12:21
  • Before trying to outsmart the compiler with assembly code, you might want to read this example: [More efficient assembly code?](https://stackoverflow.com/questions/43509578/more-efficient-assembly-code) – Bo Persson Apr 26 '18 at 12:32
  • @kermitesea I'd be almost willing to bet the [compiler can beat your assembly](https://stackoverflow.com/q/9601427/332733) in almost all cases at -O2. In the cases where it can't, I would strongly suggest profiling first before resorting to assembly. – Mgetz Apr 26 '18 at 13:06

1 Answer


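The r constraint is for general-purpose registers; x is for xmm registers. Consult the manual for more details. Also, if you are using a mov in inline asm, you are likely doing it wrong: let the constraints tell the compiler where the operands live. Keep in mind that gcc uses AT&T syntax by default, so the source operand comes first and the destination last.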

inline float sqrt2(float sqr)
{
    float root = 0;

    __asm__("sqrtss %1, %0" : "=x" (root) : "x" (sqr));

    return root;
}

Note that gcc is entirely capable of generating a sqrtss instruction from a sqrtf library function call. You may use -fno-math-errno to get rid of some minor error-checking overhead.

Jester
  • Thanks so much for the very helpful information. I'll definitely check the manual, ty :) – kermit esea Apr 26 '18 at 12:29
  • Note that if you want to use the native FMA instructions in GCC or clang, which arrived alongside AVX2, you need to pass the `-mfma` flag. Then the compiler will gladly replace `std::fma` with the appropriate instruction. – Mgetz Apr 26 '18 at 13:44