
I'm getting compile errors with Clang. GCC compiles the program fine. The program uses indexed addressing.

The errors are:

$ clang++ -g2 -O1 test.cxx -c
test.cxx:19:10: error: invalid operand for instruction
        "movq     (%[idx],%[in]), %[x]   ;\n"
         ^
<inline asm>:5:23: note: instantiated into assembly here
movq     (%rbx,%rsi), -8(%rsp)   ;
                      ^~~~~~~~
test.cxx:20:10: error: invalid operand for instruction
        "movq     (%[idx],%[out]), %[y]  ;\n"
         ^
<inline asm>:6:23: note: instantiated into assembly here
movq     (%rbx,%rdi), -16(%rsp)  ;
                      ^~~~~~~~~
test.cxx:21:10: error: invalid operand for instruction
        "cmovnzq  %[x], %[y]             ;\n"  // copy in to out if NZ
         ^
<inline asm>:7:20: note: instantiated into assembly here
cmovnzq  -8(%rsp), -16(%rsp)             ;
                   ^~~~~~~~~
test.cxx:22:10: error: invalid operand for instruction
        "movq     %[y], (%[idx],%[out])  ;\n"
         ^
<inline asm>:8:21: note: instantiated into assembly here
movq     -16(%rsp), (%rbx,%rdi)  ;
                    ^~~~~~~~~~~
4 errors generated.

How do I fix the problem? (Or how do I tell Clang to stop defining __GNUC__ to keep it out of GCC code paths).


$ cat test.cxx
#include <iostream>
#include <algorithm>
#include <cstring>
#include <cstdint>

void test_cmov(uint8_t in[96], uint8_t out[96], uint64_t flag)
{
#if defined(__GNUC__)
    const uint32_t iter = 96/sizeof(uint64_t);
    uint64_t* optr = reinterpret_cast<uint64_t*>(out);
    uint64_t* iptr = reinterpret_cast<uint64_t*>(in);
    uint64_t idx=0, x, y;

    __asm__ __volatile__ (
        ".att_syntax                     ;\n"
        "cmpq     $0, %[flag]            ;\n"  // compare, set ZERO flag
        "movq     %[iter], %%rcx         ;\n"  // load iteration count
        "1:                              ;\n"
        "movq     (%[idx],%[in]), %[x]   ;\n"
        "movq     (%[idx],%[out]), %[y]  ;\n"
        "cmovnzq  %[x], %[y]             ;\n"  // copy in to out if NZ
        "movq     %[y], (%[idx],%[out])  ;\n"
        "leaq     8(%[idx]), %[idx]      ;\n"  // increment index
        "loopnz   1b                     ;\n"  // does not affect flags
        : [out] "+D" (optr), [in] "+S" (iptr), [idx] "+b" (idx),
          [x] "=g" (x), [y] "=g" (y)
        : [flag] "g" (flag), [iter] "I" (iter)
        : "rcx", "cc"
    );
#else
    if (flag)
        std::memcpy(out, in, 96);
#endif
}

int main(int argc, char*argv[])
{
    uint8_t in[96], out[96];
    uint64_t flag = (argc >=2 && argv[1][0] == 'y');

    std::memset(in, 0x00, 96);
    std::memset(out, 0x00, 96);

    std::memcpy(in, argv[0], std::min(96ul, std::strlen(argv[0])));

    test_cmov(in, out, flag);
    std::cout << (const char*)out << std::endl;

    return 0;
}

$ gcc --version
gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2)
...

$ clang --version
clang version 7.0.1 (Fedora 7.0.1-6.fc29)
Target: x86_64-unknown-linux-gnu
...

$ lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: Fedora
Description:    Fedora release 29 (Twenty Nine)
Release:        29
Codename:       TwentyNine
jww
  • Use `&& !defined(__clang__)` if you want to exclude clang. But seriously, why would you ever expect this inefficient loop to be faster than `if(flag)std::memcpy`? Possibly not too much slower on AMD CPUs where `loop` is not slow, but for Intel see [Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?](//stackoverflow.com/q/35742570) **And beware that it introduces a non-atomic read/rewrite of `out` if `flag` is false**, unlike the pure C version. This isn't an optimization a compiler would be allowed to do in the general case because it invents writes. – Peter Cordes Apr 30 '19 at 17:20
  • *"why would you ever expect this inefficient loop to be faster than if(flag)std::memcpy"* - I don't. It is not my goal for this code. – jww Apr 30 '19 at 17:23
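The guard suggested in the first comment can be sketched as follows. Clang defines both `__GNUC__` and `__clang__`, so testing for `__clang__` is how the two are told apart. The function name `asm_path` and the returned strings are hypothetical, chosen only to show which branch the preprocessor selects.

```cpp
#include <string>

// Hypothetical helper: reports which preprocessor branch was compiled in.
std::string asm_path()
{
#if defined(__GNUC__) && !defined(__clang__)
    // Real GCC (or another compiler that defines __GNUC__ but not __clang__).
    return "gcc-only inline asm path";
#elif defined(__clang__)
    // Clang also defines __GNUC__, so it must be checked for explicitly.
    return "clang path";
#else
    return "portable fallback";
#endif
}
```

In `test_cmov`, the same condition would replace `#if defined(__GNUC__)` so that Clang takes the `std::memcpy` branch.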

1 Answer


You used "=g" constraints, which let the compiler pick a memory operand for %[x] and %[y].

Use "=r" instead.

Your template uses movq (%[idx],%[in]), %[x], which fails to assemble if %[x] is memory, because no x86 instruction accepts two explicit memory operands.

Clang differs from GCC here in that it likes to pick memory operands when given the choice. (This is an optimizer bug, IMO, but not a correctness problem. Your inline asm is buggy and only happens to work with GCC because it picks a register for [x] "=g" (x).)

This is obvious if you read the error messages:

<inline asm>:5:23: note: instantiated into assembly here
movq     (%rbx,%rsi), -8(%rsp)

...

cmovnzq  -8(%rsp), -16(%rsp)

Obviously these are not valid instructions.
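A minimal sketch of the question's function with register-only constraints for the temporaries might look like the following. The name `test_cmov_fixed` is hypothetical. Beyond the "=g" to "=r" change, the sketch makes a few assumptions of its own that you may or may not want: the early-clobber modifier (`&`) on the two temporaries, "r" instead of "g" for `flag` so it can never be substituted as an immediate, a "memory" clobber because the asm reads and writes through the pointers, and a `std::memcpy` fallback for targets that are not x86-64.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical name; same signature and behavior as test_cmov in the question.
void test_cmov_fixed(uint8_t in[96], uint8_t out[96], uint64_t flag)
{
    assert(in != nullptr && out != nullptr);
#if defined(__GNUC__) && defined(__x86_64__)
    const uint32_t iter = 96/sizeof(uint64_t);
    uint64_t* optr = reinterpret_cast<uint64_t*>(out);
    uint64_t* iptr = reinterpret_cast<uint64_t*>(in);
    uint64_t idx=0, x, y;

    __asm__ __volatile__ (
        ".att_syntax                     ;\n"
        "cmpq     $0, %[flag]            ;\n"  // compare, set ZERO flag
        "movq     %[iter], %%rcx         ;\n"  // load iteration count
        "1:                              ;\n"
        "movq     (%[idx],%[in]), %[x]   ;\n"  // x must be a register here
        "movq     (%[idx],%[out]), %[y]  ;\n"  // y must be a register here
        "cmovnzq  %[x], %[y]             ;\n"  // copy in to out if NZ
        "movq     %[y], (%[idx],%[out])  ;\n"
        "leaq     8(%[idx]), %[idx]      ;\n"  // increment index, flags untouched
        "loopnz   1b                     ;\n"  // does not affect flags
        : [out] "+D" (optr), [in] "+S" (iptr), [idx] "+b" (idx),
          [x] "=&r" (x), [y] "=&r" (y)         // registers only, early-clobber
        : [flag] "r" (flag), [iter] "I" (iter)
        : "rcx", "cc", "memory"                // asm reads and writes memory
    );
#else
    // Portable fallback for non-GNU compilers or non-x86-64 targets.
    if (flag)
        std::memcpy(out, in, 96);
#endif
}
```

With these constraints the compiler has to put `x` and `y` in registers, so every instruction in the template has at most one memory operand regardless of whether GCC or Clang does the register allocation.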


If you care about clang, generally avoid giving it the choice of a memory operand for inline asm unless it will definitely help in the normal case.

When you write a whole loop in inline asm, you definitely want to make the compiler spill something else to free up registers if necessary for some loop temporaries. The same goes for any time you use an operand multiple times in the same inline asm block. GCC doesn't look at this, and won't know the cost of choosing memory. (And clang is just dumb and chooses memory even when there are lots of free regs.)

Peter Cordes
  • Thanks. The manual says we are supposed to give GCC the most choices so it has the freedom to do what it wants. I often use `g` to ensure GCC has its choices. The compiler should only need to select one memory operand and one register. – jww Apr 30 '19 at 17:33
  • @jww: It should be obvious that you shouldn't give the compiler freedom to choose something that's incompatible with your template. Other than that, for a single instruction that's good advice, but **when you write a whole loop you definitely want to make the compiler spill something else to free up registers if necessary** for some loop temporaries or other operands that you use multiple times in the same inline asm block. The compiler doesn't know this, and won't know the cost of choosing memory. – Peter Cordes Apr 30 '19 at 17:37