2

Consider the following inline assembly loop:

#include <cstdint>
#include <iostream>

#define ADD_LOOP(i, n, v)       \
asm volatile (                  \
    "movw %1, %%cx      ;"      \
    "movq %2, %%rax     ;"      \
    "movq $0, %%rbx     ;"      \
    "for:               ;"      \
    "addq %%rax, %%rbx  ;"      \
    "decw %%cx          ;"      \
    "jnz for            ;"      \
    "movq %%rbx, %0     ;"      \
    : "=x"(v)                   \
    : "n"(i), "x"(n)            \
    : "%cx", "%rax", "%rbx"     \
);

int main() {
    uint16_t iter(10000);
    uint64_t num(5);
    uint64_t val;

    ADD_LOOP(iter, num, val)

    std::cout << val << std::endl;

    return 0;
}

Is it possible to call a C function (or its machine code output) from within a loop like the one above?

For example:

#include <wmmintrin.h>

int main() {
    __m128i x, y;

    for(int i = 0; i < 10; i++) {
        x = __builtin_ia32_aesenc128(x, y);
    }

    return 0;
}

Thanks

Peter Cordes
Abdul Ahad
  • 3
    Of course, yes. It's all binary instruction code, so it can be called, (permissions/privileges etc. allowing). – Martin James Dec 21 '17 at 14:39
  • 2
    Please elaborate a bit more on what you want to achieve. Do you want to call a C function from assembly, or do you want to invoke an intrinsic in assembly? To call a function you just need to follow the C ABI for the selected platform. Intrinsics, on the other hand, are not functions but a way to make the compiler generate some platform-specific instructions (like memory barriers, atomic instructions, various vector extensions, etc.), hence they are not to be called but rather to be replaced with the assembly itself. – tgregory Dec 21 '17 at 14:42
  • You're using C++. Maybe it is very C-style C++, but it is still C++. – klutt Dec 21 '17 at 14:46
  • I'll research C ABI, thanks. I'm not an expert on what you are referring to, but I want to call an intrinsic from inside the jnz loop without leaving the asm function, if possible – Abdul Ahad Dec 21 '17 at 14:46
  • Yeah, it's C++, you're right. I just have a habit of saying C. I'm basically just using it for std strings and whatnot – Abdul Ahad Dec 21 '17 at 14:48
  • 1
    I understand your point, but you still need a c++ compiler, which enforces c++ and some things are different. Like that c has implicit casting from void pointers. – klutt Dec 21 '17 at 15:03
  • I didn't know that, thanks – Abdul Ahad Dec 21 '17 at 15:04
  • `aesenc` is an assembly instruction (for some CPUs). Can you not just use it in your asm code? – Bo Persson Dec 21 '17 at 15:07
  • 3
    This edit turns it into a totally different question. Now it's about debugging GNU C inline asm constraints, not about how to use `__builtin` functions from inline asm. That should have been a new question. – Peter Cordes Jan 18 '18 at 04:37

2 Answers

10

No. Builtin functions aren't real functions that you can call with a call instruction. They always inline when used in C / C++.

For example, in C, int __builtin_popcount (unsigned int x) compiles to either a popcnt instruction for targets with -mpopcnt, or a byte-wise lookup table for targets that don't support the popcnt instruction. If you want that same choice from hand-written asm, you are out of luck: you will have to #ifdef it yourself and use popcnt or an alternative sequence of instructions.
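
A rough sketch of what that #ifdef could look like, assuming GCC or clang targeting x86. The helper name popcount32 is made up for illustration, and the fallback shown is a common bit-twiddling sequence, not necessarily what the compiler's builtin would emit:

```cpp
#include <cstdint>

// Hypothetical helper: count the set bits in x.
// __POPCNT__ is predefined by GCC/clang when popcnt is enabled (e.g. -mpopcnt).
static inline unsigned popcount32(uint32_t x) {
#if defined(__POPCNT__)
    unsigned result;
    // AT&T syntax: source first, destination second. popcnt writes the flags.
    asm("popcnt %1, %0" : "=r"(result) : "r"(x) : "cc");
    return result;
#else
    // Fallback: sum bits in 2-bit, 4-bit, then 8-bit groups, then add the bytes.
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (x * 0x01010101u) >> 24;
#endif
}
```

Both branches should return the same value; only the instruction sequence differs.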


The function you're talking about, __builtin_ia32_aesenc128, is just a wrapper for the aesenc assembly instruction, which you can use directly if you're writing asm.
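
As a minimal sketch, assuming GNU C inline asm in AT&T syntax and a CPU that supports AES-NI at run time, using the instruction directly could look like this. The function name aesenc_asm is hypothetical:

```cpp
#include <emmintrin.h>  // __m128i and basic SSE2 helpers

// Hypothetical helper: one AES encryption round via the aesenc instruction.
// No call is involved; the compiler just picks XMM registers for the operands.
__m128i aesenc_asm(__m128i state, __m128i round_key) {
    // "+x": state is read and written in an XMM register.
    // "x" : round_key is an input in an XMM register.
    asm("aesenc %1, %0" : "+x"(state) : "x"(round_key));
    return state;
}
```

Putting this statement inside a C++ for loop gives the same effect as the loop in the question, without any function call.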


If you're writing asm instead of using C++ intrinsics (like #include <immintrin.h>) for performance, you need to have a look at http://agner.org/optimize/ to write more efficient asm. e.g. use %ecx as a loop counter, not %cx. You're gaining nothing from using a 16-bit partial register.

You could also write more efficient inline-asm constraints, e.g. the movq %%rbx, %0 is a waste of an instruction. You could have used %0 the whole time instead of an explicit %rbx. If your inline asm starts or ends with a mov instruction to copy to/from an output/input operand, you're usually doing it wrong. Let the compiler allocate registers for you. See the tag wiki.
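
One possible way to apply both suggestions to the loop from the question, as a sketch only (the function name add_loop and the operand names are invented here, and GNU C inline asm in AT&T syntax is assumed):

```cpp
#include <cstdint>

// Hypothetical rewrite of ADD_LOOP: adds `addend` to a sum `iterations` times.
// `iterations` must be nonzero, as in the original loop.
static inline uint64_t add_loop(uint32_t iterations, uint64_t addend) {
    uint64_t sum = 0;
    asm("1:                     \n\t"  // local numeric label, safe to inline repeatedly
        "addq %[add], %[sum]    \n\t"
        "decl %[cnt]            \n\t"  // 32-bit counter instead of 16-bit %cx
        "jnz  1b                \n\t"
        : [sum] "+r"(sum), [cnt] "+r"(iterations)  // read/write, compiler picks registers
        : [add] "r"(addend)
        : "cc");                                   // add/dec modify the flags
    return sum;
}
```

There are no leading or trailing mov instructions and no hard-coded registers in the clobber list; the "+r" operands tell the compiler which values are modified.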

Or better, don't use inline asm at all: see https://gcc.gnu.org/wiki/DontUseInlineAsm. Code with intrinsics typically compiles well for x86. See Intel's intrinsics guide: #include <immintrin.h> and use __m128i _mm_aesenc_si128 (__m128i a, __m128i RoundKey). (In gcc that's just a wrapper for __builtin_ia32_aesenc128, but it makes your code portable to other x86 compilers.)
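
A small sketch of the intrinsics version of the loop from the question. The function name aes_rounds is hypothetical, and the target attribute is an assumption that lets this compile with GCC or clang without passing -maes for the whole file; a CPU with AES-NI is still needed at run time:

```cpp
#include <immintrin.h>

// Hypothetical helper: apply `rounds` AES encryption rounds with the same key.
// target("aes,sse2") enables the AES-NI intrinsic for this function only.
__attribute__((target("aes,sse2")))
__m128i aes_rounds(__m128i state, __m128i round_key, int rounds) {
    for (int i = 0; i < rounds; ++i) {
        // Expected to compile to a single aesenc instruction per iteration.
        state = _mm_aesenc_si128(state, round_key);
    }
    return state;
}
```

For example, aes_rounds(x, y, 10) corresponds to the ten-iteration loop in the question.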

Peter Cordes
  • thank you. I'll read the links you sent about optimization. I didn't know that about %0, and I'll look at it, but I don't think it would be a problem in my case because there are millions of instructions in the loop. You're probably right that intrinsics compile well, but I don't mind writing the assembly and performing an actual measurement of execution time versus the C. It probably won't be any faster but I'd like to see the measurement comparison. – Abdul Ahad Dec 21 '17 at 15:26
  • 1
    @AbdulAhad: Usually C++ with intrinsics is more maintainable. Unless you're an asm performance expert, you should probably do that. Or if you're just experimenting with asm for fun, and want to see if you can beat the compiler, then have a look at [Why is this C++ code faster than my hand-written assembly for testing the Collatz conjecture?](https://stackoverflow.com/questions/40354978/why-is-this-c-code-faster-than-my-hand-written-assembly-for-testing-the-collat). (Hint: start with optimized compiler output and improve on that.) – Peter Cordes Dec 21 '17 at 15:29
  • Note that this answer was written before the edit that changed the question into a different one about getting the constraints + other stuff right so the OP's inline asm will compile and assemble. – Peter Cordes Jan 18 '18 at 07:24
3

The answer to your question can be split into two parts.

It is definitely possible to call a C function from assembly. To do so you need to follow a calling convention (described in the platform's ABI documents), which specifies how to pass arguments and get return values. Remember that you have registers, stack and memory to move data around.
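
To make the calling convention concrete, here is a sketch that assumes the x86-64 System V ABI on Linux/ELF with a GNU-compatible assembler. It is written as a stand-alone assembly routine rather than as an inline-asm statement inside a C++ function, and the names add_five and call_n_times are made up for illustration:

```cpp
#include <cstdint>

// C function to be called from assembly. extern "C" turns off C++ name
// mangling so the assembly can refer to the symbol as plain add_five.
extern "C" uint64_t add_five(uint64_t x) { return x + 5; }

// Implemented in the assembly below: calls add_five n times, feeding each
// result back in as the next argument. n must be nonzero.
extern "C" uint64_t call_n_times(uint64_t start, uint32_t n);

asm(R"(
    .pushsection .text
    .globl  call_n_times
    .type   call_n_times, @function
call_n_times:
    pushq   %rbx            # save call-preserved rbx; also aligns rsp to 16 bytes
    movl    %esi, %ebx      # 2nd argument (n) -> loop counter that survives calls
    movq    %rdi, %rax      # 1st argument (start) -> running value
1:
    movq    %rax, %rdi      # 1st integer argument is passed in rdi
    call    add_five        # integer return value comes back in rax
    decl    %ebx
    jnz     1b
    popq    %rbx            # restore rbx before returning
    ret
    .size   call_n_times, .-call_n_times
    .popsection
)");
```

The loop counter lives in rbx because the ABI lets the callee overwrite rcx, rax and the argument registers, while rbx must be preserved across the call.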

Intrinsics, however, are not functions, even though they look like C functions. You may look at C as a somewhat high-level assembly that works on a wide variety of architectures. In some cases you want to take advantage of your specific architecture's instruction set, so the compiler provides a way to do so by means of intrinsics. Each intrinsic maps to some architecture-specific assembly instructions. So at the end of the day you do not need to call them from assembly but rather need to find the instruction itself; for instance, I expect __builtin_ia32_aesenc128 to be replaced with the AESENC instruction.

tgregory
  • 3
    It's non-trivial to make function calls from an inline-asm statement. For example, in the x86-64 System V ABI, there's a 128-byte red-zone below `rsp` which the compiler assumes you don't clobber. So you [can't safely use `push` or `call` in 64-bit code](https://stackoverflow.com/questions/34520013/using-base-pointer-register-in-c-inline-asm/34522750). Also https://stackoverflow.com/questions/6380992/inline-assembly-that-clobbers-the-red-zone. – Peter Cordes Dec 21 '17 at 15:24
  • 1
    @PeterCordes Good point. Splitting the code into pure assembly and pure C across the function call boundary may be the way to go. – tgregory Dec 21 '17 at 15:55