
As an exercise, I want to calculate the sum of the elements of an array using GCC inline assembly, and for that I need to access the individual elements. I tried this code:

```c
#include <stdio.h>
#include <stdlib.h>


int main(int argc) {
    unsigned n = 4;
    unsigned* a = malloc(sizeof(unsigned) * n);
    unsigned s;

    a[0] = 4;
    a[1] = 1;
    a[2] = 5;
    a[3] = 7;

    __asm__ (
        ".text;"
        "   mov %[n], %%ecx;"
        "   mov $0, %%eax;"
        "   mov $0, %%ebx;"
        "l1:;"
        "   add %[a][%%ebx], %%eax;"
        "   add $1, %%ebx;"
        "   loop l1;"
        "   mov %%eax, %[s];"
        : [s] "=r" (s)
        : [a] "r" (a), [n] "r" (n)
    );

    printf("%u\n", s);

    free(a);

    return 0;
}
```

It gives the error:

```
main.c: Assembler messages:
main.c:15: Error: junk `[%ebx]' after register
```

Obviously the line `add %[a][%%ebx], %%eax;` is wrong. How should I modify it?

I would also appreciate any recommendations on optimizing this code.

Mad Physicist
Fomalhaut
  • `a` is a pointer to the array, not the array itself. `add [a]...` adds the pointer (a memory address) to eax. You first need to load the pointer into a spare register; for example, you can make the inline assembly start with the value of `a` in register `ebx` (mark it as an input), then use `add (%%ebx),%%eax` to add a value from the array and `add $4, %%ebx` to advance by one element. It's a memory address, not an index, so `add $1,..` is not enough. I'm not giving exact source changes because I don't know the GCC inline asm rules. – Ped7g Aug 17 '18 at 16:21
  • Optimization: don't use inline asm for this; write it in pure C so the compiler can evaluate it at compile time or, for non-constant inputs, vectorize it with SIMD. https://gcc.gnu.org/wiki/DontUseInlineAsm. If you insist on asm, definitely avoid [the slow `loop` instruction](https://stackoverflow.com/questions/35742570/why-is-the-loop-instruction-slow-couldnt-intel-have-implemented-it-efficiently). Also, use a local array like `int a[4];` instead of calling `malloc`. – Peter Cordes Aug 18 '18 at 03:33
  • Also, you clobber EAX, ECX, and EBX without telling the compiler. Just use `%[s]` as your accumulator, since you already asked for a register for it. There are many missed optimizations here; look at the compiler output for a function like `int sum(int *p, int len){...}` (so it can't optimize at compile time). Use `gcc -O2`, or `-O3 -fno-tree-vectorize` to get scalar loops. [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116). – Peter Cordes Aug 18 '18 at 03:36
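Putting the comments' suggestions together, a corrected loop might look like the sketch below. It assumes an x86 or x86-64 target, GCC or Clang, and the default AT&T syntax (not `-masm=intel`); the helper name `asm_sum` is hypothetical. Instead of advancing a pointer by 4 as in the first comment, it keeps an index and lets the addressing mode scale it: the faulty `add %[a][%%ebx], %%eax` becomes `add (%[a],%[i],4), %[s]`, where 4 is `sizeof(unsigned)`. `%[s]` is the accumulator, `cmp`/`jb` replaces the slow `loop` instruction, and every register the asm touches is an operand, so no register clobbers are needed.

```c
#include <stddef.h>

/* Hypothetical helper: sums n unsigned values with GCC extended asm.
 * Assumes an x86 or x86-64 target and the default AT&T syntax. */
static unsigned asm_sum(const unsigned *a, size_t n)
{
    unsigned s = 0;   /* accumulator, kept in whatever register GCC picks for %[s] */
    size_t i = 0;     /* index; size_t matches the pointer width on x86 and x86-64 */

    if (n == 0)       /* the asm loop body runs at least once, so handle n == 0 in C */
        return 0;

    __asm__ (
        "1:\n\t"
        "add (%[a],%[i],4), %[s]\n\t"  /* s += a[i]; scale 4 = sizeof(unsigned) */
        "inc %[i]\n\t"                 /* i++ */
        "cmp %[n], %[i]\n\t"           /* compare i with n */
        "jb 1b"                        /* repeat while i < n (unsigned compare) */
        : [s] "+r" (s), [i] "+r" (i)   /* read-write operands */
        : [a] "r" (a), [n] "r" (n)
        : "cc", "memory"               /* flags change; the asm reads a[] through a */
    );
    return s;
}
```

With the question's values {4, 1, 5, 7}, `asm_sum(a, 4)` returns 17. The `"memory"` clobber is the simple, conservative way to tell GCC that the asm reads the array through the pointer, so stores to `a[]` are completed before it runs. The numeric local label `1:`/`1b` is used instead of `l1:` so the asm still assembles if the compiler emits it more than once, for example after inlining.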
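Following the comments' advice to write it in plain C, here is a sketch of a `sum` function like the one suggested. The element and length types are adapted from the comment's `int sum(int *p, int len)` to `unsigned` and `size_t` so they match the question's data; they are an assumption, not part of the comment.

```c
#include <stddef.h>

/* Plain C version: the compiler is free to unroll or vectorize this loop. */
unsigned sum(const unsigned *p, size_t len)
{
    unsigned total = 0;  /* unsigned overflow wraps, like the 32-bit add in the asm */
    for (size_t i = 0; i < len; i++)
        total += p[i];
    return total;
}
```

Compiling this with `gcc -O2 -S`, or with `-O3 -fno-tree-vectorize` to keep the loop scalar, shows the loop the compiler generates, which can be compared with the hand-written asm. When the array is a local array of constants, as suggested in the comments, GCC can often fold the whole sum at compile time.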
