
As mentioned in the title, I'm wondering whether there is any way to compile Microsoft-style inline-assembly code (as shown below) on a Linux OS (e.g. Ubuntu).

_asm{
    mov edi, A;
    ....
    EMMS;
}

The sample code is part of an inline-assembly block which compiles successfully on Windows 10 with the cl.exe compiler. Is there any way to compile it on Linux? Do I have to rewrite it in GNU C/C++ style (i.e. __asm__{;;;})?

Peter Cordes
K.Xu
    Is it an entire function, or a lot of C/C++ code mixed in the same function? If the whole function is assembly, it may be easier to move it out of line (to an assembly code source file) than to rewrite to gcc inline assembly style (which is unique) – Ben Voigt Jul 24 '19 at 15:51
  • First of all, thank you for your suggestion. It is a *.cpp source file full of functions containing pure Microsoft-style inline-assembly code (i.e. _asm{}). I have very little knowledge of assembly, so I'm not sure I got your point. Do you mean that I need to move this inline assembly code to a *.s source file and assemble it with a Linux assembler (e.g. GAS or NASM) to get a *.o file that can be linked with gcc/g++? – K.Xu Jul 25 '19 at 02:37
    @K.Xu: Ben was asking if your inline asm function had an asm block inside a function that also used some pure C, or if the entire body of the C function was just one big `_asm{ ... }` block. If the latter, turning it into a stand-alone asm function would be possible (perhaps with GAS `.intel_syntax noprefix` which is MASM-like), but then you have to deal with the calling convention and name-mangling for any C++ global variables. Like I said in my answer, if you do enough work to *understand* the asm, replacing it with intrinsics is probably your best bet. – Peter Cordes Jul 25 '19 at 02:52
  • @Peter Cordes: I think the latter case applies to me: the only content in my C function is one big _asm{...} block. I was trying to find a way to port it from Windows to Linux, but now it seems that I'd better rewrite it. – K.Xu Jul 25 '19 at 03:07

3 Answers


First of all, you should usually replace inline asm (with intrinsics or pure C) instead of porting it. https://gcc.gnu.org/wiki/DontUseInlineAsm


clang -fasm-blocks is mostly compatible with MSVC's inefficient inline asm syntax. But it doesn't support returning a value by leaving it in EAX and then falling off the end of a non-void function.

So you have to write inline asm that puts the value in a named C variable and return that, typically leading to an extra store/reload making MSVC syntax even worse. (Pretty bad unless you're writing a whole loop in asm that amortizes that store/reload overhead of getting data into / out of the asm block). See What is the difference between 'asm', '__asm' and '__asm__'? for a comparison of how inefficient MSVC inline-asm is when wrapping a single instruction. It's less dumb inside functions with stack args when those functions don't inline, but that only happens if you're already making things inefficient (e.g. using legacy 32-bit calling conventions and not using link-time optimization to inline small functions).

MSVC can substitute A with an immediate 1 when inlining into a caller, but clang can't. Both defeat constant-propagation but MSVC at least avoids bouncing constant inputs through a store/reload. (As long as you only use it with instructions that can support an immediate source operand.)

Clang accepts __asm, asm, or __asm__ to introduce an asm-block. MSVC accepts __asm (2 underscores like clang) or _asm (more commonly used, but clang doesn't accept it).

So for existing MSVC code you probably want #define _asm __asm so your code can compile with both MSVC and clang, unless you need to make separate versions anyway. Or use clang -D_asm=asm to set a CPP macro on the command line.

Example: compile with MSVC or with clang -fasm-blocks

(Don't forget to enable optimization: clang -fasm-blocks -O3 -march=native -flto -Wall. Omit or modify -march=native if you want a binary that can run on earlier/other CPUs than your compile host.)

int a_global;

inline
long foo(int A, int B, int *arr) {
    int out;
    // You can't assume A will be in RDI: after inlining it prob. won't be
    __asm {
        mov   ecx, A                   // comment syntax
        add   dword ptr [a_global], 1
        mov   out, ecx
    }
    return out;
}

Compiling with x86-64 Linux clang 8.0 on Godbolt shows that clang can inline the wrapper function containing the inline-asm, and how much store/reload MSVC syntax entails (vs. GNU C inline asm which can take inputs and outputs in registers).

I'm using clang in Intel-syntax asm output mode, but it also compiles Intel-syntax asm blocks when it's outputting in AT&T syntax mode. (Normally clang compiles straight to machine-code anyway, which it also does correctly.)

## The x86-64 System V ABI passes args in rdi, rsi, rdx, ...
# clang -O3 -fasm-blocks -Wall
foo(int, int, int*):
        mov     dword ptr [rsp - 4], edi        # compiler-generated store of register arg to the stack

        mov     ecx, dword ptr [rsp - 4]        # start of inline asm
        add     dword ptr [rip + a_global], 1
        mov     dword ptr [rsp - 8], ecx        # end of inline asm

        movsxd  rax, dword ptr [rsp - 8]        # reload `out` with sign-extension to long (64-bit) : compiler-generated
        ret

Notice how the compiler substituted [rsp - 4] and [rsp - 8] for the C local variables A and out in the asm source block, and that a variable in static storage gets RIP-relative addressing. GNU C inline asm doesn't do this; you need to declare %[name] operands and tell the compiler where to put them.

We can even see clang inline that function twice into one caller, and optimize away the sign-extension to 64-bit because caller() only returns int.

int caller() {
    return foo(1, 2, nullptr) + foo(1, 2, nullptr);
}
caller():                             # @caller()
        mov     dword ptr [rsp - 4], 1

        mov     ecx, dword ptr [rsp - 4]      # first inline asm
        add     dword ptr [rip + a_global], 1
        mov     dword ptr [rsp - 8], ecx

        mov     eax, dword ptr [rsp - 8]     # compiler-generated reload
        mov     dword ptr [rsp - 4], 1       # and store of A=1 again

        mov     ecx, dword ptr [rsp - 4]      # second inline asm
        add     dword ptr [rip + a_global], 1
        mov     dword ptr [rsp - 8], ecx

        add     eax, dword ptr [rsp - 8]     # compiler-generated reload
        ret

So we can see that just reading A from inline asm creates a missed-optimization: the compiler stores a 1 again even though the asm only read that input without modifying it.
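For comparison, here is a rough sketch of how the same foo() might be written in GNU C extended asm instead (x86-64, GCC or clang, default AT&T syntax assumed; the name foo_gnu is hypothetical). A reaches the asm in a register via a matching constraint, and a_global is a "+m" memory operand, so there is no store/reload of A or out and no mov in the template at all.

```cpp
// x86-64 only. Hypothetical GNU C counterpart of foo(): a_global += 1, return A.
int a_global;

inline long foo_gnu(int A, int /*B*/, int * /*arr*/) {
    int out;
    asm("incl %[g]"                  // a_global += 1; the compiler fills in the addressing mode
        : [out] "=r"(out),           // operand 0: output in a register of the compiler's choice...
          [g] "+m"(a_global)         // read-modify-write memory operand
        : "0"(A)                     // ...pre-loaded with A (matching constraint), so no mov needed
        : "cc");                     // incl writes FLAGS
    return out;
}
```

After inlining, the compiler can keep A in whatever register it likes, and it knows a_global was modified by the asm statement.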

I haven't done tests like assigning to or reading a_global before/between/after the asm statements to make sure the compiler "knows" that variable is modified by the asm statement.

I also haven't tested passing a pointer into an asm block and looping over the pointed-to array, to see if it's like a "memory" clobber in GNU C inline asm. I'd assume it is.
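For reference, in GNU C inline asm a pointer in a register does not by itself tell the compiler that the pointed-to memory is read or written; you need a "memory" clobber (or dummy memory operands). A minimal sketch of a loop over an array inside the asm, assuming x86-64 GCC or clang; bump_all is a hypothetical name.

```cpp
#include <cstddef>

// Hypothetical example: add 1 to each of n ints at arr, looping inside the asm.
inline void bump_all(int *arr, std::size_t n) {
    if (n == 0) return;                // the asm loop below assumes n >= 1
    asm volatile(                      // volatile: the outputs are unused, so don't let it be deleted
        "1:\n\t"
        "incl (%[p])\n\t"              // *p += 1
        "addq $4, %[p]\n\t"            // advance p by one int
        "decq %[n]\n\t"                // --n
        "jnz 1b"                       // loop while n != 0
        : [p] "+r"(arr), [n] "+r"(n)   // both registers are modified by the loop
        :
        : "memory", "cc");             // reads/writes memory not described by any operand
}
```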

My Godbolt link also includes an example of falling off the end of a non-void function with a value in EAX. MSVC supports that, but in clang it's UB as usual, and it breaks when the function is inlined into a caller (strangely with no warning, even at -Wall). You can see how x86 MSVC compiles it on my Godbolt link above.


https://gcc.gnu.org/wiki/DontUseInlineAsm

Porting MSVC asm to GNU C inline asm is almost certainly the wrong choice. Compiler support for optimizing intrinsics is very good, so you can usually get the compiler to generate good-quality efficient asm for you.

If you're going to do anything to existing hand-written asm, replacing it with pure C is usually the most efficient path forward, and certainly the most future-proof one. Code that can auto-vectorize to wider vectors in the future is always good. But if you do need to manually vectorize for some tricky shuffling, then intrinsics are the way to go unless the compiler makes a mess of it somehow.
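As an illustration of the pure-C route, here is a hypothetical plain C++ replacement for what might have been an MMX paddusb (unsigned saturating byte add) loop. This is a sketch, not code from the question; whether GCC or clang actually turn it into SIMD depends on the compiler version and flags like -O3 -march=..., so check the generated asm.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical plain C++ version of an unsigned saturating byte add (what MMX paddusb does).
void add_sat_u8(std::uint8_t *dst, const std::uint8_t *a, const std::uint8_t *b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        unsigned sum = unsigned(a[i]) + b[i];                       // at most 510, no overflow
        dst[i] = static_cast<std::uint8_t>(sum > 255 ? 255 : sum);  // clamp to 255
    }
}
```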

Look at the compiler-generated asm you get from intrinsics to make sure it's as good or better than the original.

If you're using MMX EMMS, now is probably a good time to replace your MMX code with SSE2 intrinsics. SSE2 is baseline for x86-64, and few Linux systems are running obsolete 32-bit kernels.
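A minimal sketch of what that replacement might look like, assuming the MMX code was a wrapping int16 add (paddw); the function name add_i16 is hypothetical. The SSE2 intrinsics used (_mm_loadu_si128, _mm_add_epi16, _mm_storeu_si128) come from <emmintrin.h>, handle 8 lanes per step instead of 4, and need no EMMS afterwards.

```cpp
#include <emmintrin.h>   // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Hypothetical SSE2 replacement for an MMX paddw loop: dst[i] = a[i] + b[i], wrapping.
void add_i16(std::int16_t *dst, const std::int16_t *a, const std::int16_t *b, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {  // 8 x int16 per 128-bit vector
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i *>(a + i));  // unaligned loads
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i *>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i *>(dst + i), _mm_add_epi16(va, vb));
    }
    for (; i < n; ++i)                                    // scalar tail for leftover elements
        dst[i] = static_cast<std::int16_t>(a[i] + b[i]);
}
```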

Peter Cordes
  • Thanks so much, you saved me several hours. `clang++ -c -fasm-blocks *.cpp` works for me. All my functions containing inline-asm code are void functions. For my case, maybe `clang++` is the only way other than rewriting it. – K.Xu Jul 25 '19 at 04:20
  • @K.Xu: Don't forget to enable optimization!!! `clang++ -O3 -march=native -fasm-blocks -flto *.cpp` makes a binary optimized for the machine you compiled on. (Use `-march=` something else or omit it to make a binary that will run on other CPUs.) LTO is link-time optimization for cross-file inlining. – Peter Cordes Jul 25 '19 at 04:24
  • Porting MMX code to SSE2 might give a speedup of a factor of 2 if it's something that can benefit from wider registers. And possibly a minor speedup just from porting from MMX asm to SSE2 intrinsics, unless the MMX asm happens to be very well tuned for current microarchitectures and it's one long-running loop, so the overhead of getting data in/out is low. – Peter Cordes Jul 25 '19 at 04:30

Is there any way to compile Microsoft-style inline-assembly code on a Linux platform?

Yes, it is possible. Kind of.

For GCC you have to use both Intel and AT&T syntax. It does not work with Clang due to Issue 24232, Inline assembly operands don't work with .intel_syntax and Issue 39895, Error: unknown token in expression using inline asm.

Here is the pattern. The assembler template uses .intel_syntax. Then, at the end of your asm template, you switch back to .att_syntax mode so it's in the right mode for the rest of the compiler-generated asm.

#include <cstddef>
int main(int argc, char* argv[])
{
    size_t ret = 1, N = 0;
    asm __volatile__
    (
        ".intel_syntax   noprefix ;\n"
        "xor esi, esi    ;\n"           // zero RSI
        "neg %1          ;\n"           // %1 is replaced with the operand location chosen by the compiler, in this case RCX
        "inc %1          ;\n"
        "push %1         ;\n"           // UNSAFE: steps on the red-zone
        "pop rax         ;\n"
        ".att_syntax     prefix ;\n"
        : "=a" (ret),     // output-only operand in RAX
          "+c" (N)        // read-write operand in RCX
        :                 // no read-only inputs
        : "%rsi"          // RSI is clobbered: input and output register constraints can't pick it
    );
    return (int)ret;
}

This won't work if you use any memory operands, because the compiler will substitute AT&T syntax 4(%rsp) into the template instead of [rsp + 4], for example.

This also only works if you don't compile with gcc -masm=intel. Otherwise you'll put the assembler into AT&T mode when GCC is emitting Intel syntax. So using .intel_syntax noprefix this way ties the code to GCC's default AT&T output: it builds with -masm=att but not with -masm=intel.
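A different technique, swapped in here rather than taken from this answer, avoids that limitation: GCC's assembler-dialect alternatives. Inside a template, {AT&T text|Intel text} lets the compiler emit whichever half matches -masm=att or -masm=intel. A small sketch; bit_scan is a hypothetical name, x86-64 GCC is assumed, and clang is believed to accept this syntax too, but verify with your compiler.

```cpp
// Hypothetical helper: index of the lowest set bit. mask must be nonzero (bsf leaves
// the destination undefined for 0).
// "{att|intel}" alternatives: AT&T gives "bsfl %m, %i", Intel gives "bsf %i, %m".
inline unsigned bit_scan(unsigned mask) {
    unsigned idx;
    asm("bsf{l %[m], %[i]| %[i], %[m]}"
        : [i] "=r"(idx)
        : [m] "r"(mask)
        : "cc");          // bsf writes ZF
    return idx;
}
```

Operand substitution such as memory operands also follows the selected dialect, which is exactly what the .intel_syntax switch can't handle.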


mov edi, A;

The code I help with does not use variables in the assembler like you show. I don't know how well (poorly?) that works with Intel-style asm. I know a MASM-style grammar is not supported.

You may be able to do it using asmSymbolicNames. See the GCC Extended ASM HowTo for details.

However, to convert to something GCC can consume, you only need to use positional arguments:

__asm__ __volatile__
(
    ".intel_syntax   noprefix ;\n"
    "mov edi, %0     ;\n"            // inefficient: use a "D" constraint instead of a mov
    ...
    ".att_syntax     prefix ;\n"
    : : "r" (A) : "%edi"
);

Or better, use a "D" constraint to ask for the variable in EDI / RDI in the first place. If a GNU C inline asm statement ever starts or ends with a mov, that's usually a sign you're doing it wrong.
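For example, rep stosb needs the destination in RDI, the count in RCX and the byte in AL; register-specific constraints put the operands there directly so the template is just the one instruction. A sketch assuming x86-64 GCC or clang; fill_bytes is a hypothetical name, and it relies on the ABI guarantee that the direction flag is clear.

```cpp
#include <cstddef>

// Hypothetical memset-like helper built on rep stosb, with no mov in the template.
inline void fill_bytes(void *dst, unsigned char value, std::size_t count) {
    asm volatile("rep stosb"
                 : "+D"(dst), "+c"(count)  // RDI advances and RCX counts down, so both are read-write
                 : "a"(value)              // byte to store, in AL
                 : "memory");              // writes memory not described by any operand
}
```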


Regarding asmSymbolicNames, here is what the GCC Extended ASM HowTo has to say about them:

This code makes no use of the optional asmSymbolicName. Therefore it references the first output operand as %0 (were there a second, it would be %1, etc). The number of the first input operand is one greater than that of the last output operand. In this i386 example, that makes Mask referenced as %1:

uint32_t Mask = 1234;
uint32_t Index;

  asm ("bsfl %1, %0"
     : "=r" (Index)
     : "r" (Mask)
     : "cc");

That code overwrites the variable Index (‘=’), placing the value in a register (‘r’). Using the generic ‘r’ constraint instead of a constraint for a specific register allows the compiler to pick the register to use, which can result in more efficient code. This may not be possible if an assembler instruction requires a specific register.

The following i386 example uses the asmSymbolicName syntax. It produces the same result as the code above, but some may consider it more readable or more maintainable since reordering index numbers is not necessary when adding or removing operands. The names aIndex and aMask are only used in this example to emphasize which names get used where. It is acceptable to reuse the names Index and Mask.

uint32_t Mask = 1234;
uint32_t Index;

  asm ("bsfl %[aMask], %[aIndex]"
     : [aIndex] "=r" (Index)
     : [aMask] "r" (Mask)
     : "cc");

The sample code is part of an inline-assembly block which compiles successfully on Windows 10 with the cl.exe compiler...

Stepping back to 10,000 feet, if you are looking for something easy to use to integrate inline ASM like in Microsoft environments, then you don't have it on Linux. GCC inline ASM absolutely sucks. The GCC inline assembler is an archaic, difficult to use tool that I despise interacting with.

(And you have not experienced the incomprehensible error messages with bogus line information, yet).

Peter Cordes
jww
    Your first example is broken. You tell the compiler you have a read-only input `"c" (N)` in ECX/RCX, but your asm template modifies `%1`. You just lied to the compiler, and some future attempt to use `N` or something with the same value might or might not instead use `~N`. Also, for x86-64 Linux this code clobbers the red-zone because you used `push` without `sub $128, %rsp` first. [Using base pointer register in C++ inline asm](//stackoverflow.com/q/34520013) – Peter Cordes Jul 24 '19 at 20:29
  • Thanks Peter. Yeah, that sounds about right for this tool. – jww Jul 24 '19 at 20:41
  • I agree that GNU C inline asm is difficult to use, but I'd hardly call it archaic. It has the amount of complexity that's required to accurately describe a black-box instruction or block to the compiler so it can optimize around it. MSVC doesn't have this and is garbage for wrapping single instructions, forcing a round trip through memory for the inputs. (At least the way MSVC compiles it.) Apparently MSVC inline asm is even broken/unsafe for functions with register args!! See [Ross's comment](https://stackoverflow.com/questions/3323445/what#comment59576185_35959859) – Peter Cordes Jul 24 '19 at 20:58
  • Thanks for your suggestion, Peter's idea solved my problem. – K.Xu Jul 25 '19 at 04:28

Peter's idea solved my problem. I just added a macro to my source file, in which every function consists of a single big inline-asm block in Intel syntax. The macro is shown below:

#define _asm\
        asm(".intel_syntax noprefix\n");\
        asm\

After that i compiled it with command:

clang++ -c -fasm-blocks source.cpp

Then everything is OK.

K.Xu
  • You don't need and shouldn't ever use `asm(".intel_syntax noprefix\n");` for gcc or clang. That just breaks your code for MSVC, and probably breaks any GNU C asm *statements* in your program, as well as breaking clang's asm output if you compile with `-S`. The example in my answer compiles with both MSVC and clang because asm-blocks are implicitly in Intel syntax; that's why I didn't mention `.intel_syntax noprefix` at all. (But note the two underscores: MSVC apparently accepts both, but clang accepts `asm`, `__asm` or `__asm__`, not `_asm`.) – Peter Cordes Jul 25 '19 at 04:40
    **Use `clang++ -D_asm=asm -fasm-blocks -O3 -march=native -flto -c src.cpp`**. (Or put `#define _asm __asm` in your source file, optionally inside an `#ifdef __clang__`) – Peter Cordes Jul 25 '19 at 04:42
  • @Peter Cordes: Thanks. Following your suggestion, I found that `asm(".intel_syntax noprefix\n")` is unnecessary. But when I add the `-flto` option to the compile command, the output file (i.e. *.o) cannot be recognized and the compiler reports the error `file not recognized: File format not recognized`. I tried to use `nm` to analyse the content of this *.o file, but got the same error. Everything is OK again after removing the `-flto` option. – K.Xu Jul 25 '19 at 05:37
    Works for me on Arch Linux. Are you trying to link with GCC or something? LTO is link-time optimization: the `.o` files are really just intermediate representation ready for clang to do whole-program optimization. `file *.o` says `LLVM IR bitcode`. `-flto` only works if you compile *and* link with `clang++` ; gcc won't know what to do with it. (But that's fine, clang is a very good compiler, usually pretty much equal with gcc. Or better if its default loop unrolling helped any of your important loops.) – Peter Cordes Jul 25 '19 at 05:43
  • I made a mistake. The compiler is clang, but I forgot to add `-flto` for the other source files when compiling the other *.o files. I guess that's why the file could not be recognized in the end. – K.Xu Jul 25 '19 at 05:53
    If you're running clang++ manually, you can always just run `clang++ ... *.cpp` to compile+link in one step. (With `-flto` and other optimization options). Otherwise set CFLAGS + CXXFLAGS + LDFLAGS in a Makefile or whatever to only rebuild changed files. – Peter Cordes Jul 25 '19 at 05:57
  • Be careful of `-flto` with Clang. We found it is completely broken on all platforms we tested: [Link Time Optimization](https://www.cryptopp.com/wiki/Link_Time_Optimization). It looks like some sort of problem driving the linker. Bug report at Issue 42684, [LTO and error adding symbols: archive has no index; run ranlib to add one](https://bugs.llvm.org/show_bug.cgi?id=42684). – jww Jul 25 '19 at 06:03