1

I'm trying to write a function in Assembly that will return the sum of the even elements in an array. This is my code

       int sum(int *a, int n)
       {
          int S = 0;
          _asm {

          mov eax, [ebp + 8]
          mov edx, 0
          mov ebx, [ebp + 12]
          mov ecx, 0
          for1: cmp ecx, ebx
          jge endfor
          and [eax + ecx * 4], 1
          jz even
          inc ecx
          jmp for1

          even: add edx, [eax + ecx * 4]            
          inc ecx
          jmp for1
          endfor: mov S, edx
          }
      return S;
      }

But it's not working. Does anybody know what it's wrong and how can I fix it? Thanks.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Andreea
  • 67
  • 9

2 Answers2

4

Just some guesswork (especially as I don't know what compiler you are using, and you weren't clear about what you meant by "not working"):

      mov eax, [ebp + 8]
      mov edx, 0
      mov ebx, [ebp + 12]
      mov ecx, 0

I assume this is supposed to load the two parameters a and n into registers. How do you know the offsets are correct? Is it possible to simply refer to the names directly?

      and [eax + ecx * 4], 1

This destroys the element in the input array (setting it to 1 if odd or 0 if even), which you probably don't want. You should probably use the test instruction (which is non-destructive) instead of and.

      even: add edx, [eax + ecx * 4]

This will add 0, since you have set [eax + ecx * 4] to 0 via the and instruction which I mention above. Based on this, I would expect your function to always return 0.

davmac
  • 20,150
  • 1
  • 40
  • 68
  • It loads correctly the parameters into registers.The part that is not working is the even test part. I'm a beginner so I don't really now that much of Assembly yet. Can you please tell me how can I check if a number is even and so add them to return the sum of all even elements in the array/? Thank you – Andreea Jan 13 '16 at 22:58
  • @Andreea An odd number will have the least significant bit set to `1`. Therefore, an even number will have the least significant bit cleared to `0`. You can use a bitwise AND with `1` to check the parity of a number. But it seems like you already know that, looking at your code... – Levi Jan 14 '16 at 02:26
  • @Andreea the `and` instruction does check if a number is even or odd. The problem is that it also destroys the original value of the first operand, which in your code, is the array element. You need to pull the element value out of the array before you apply the `and` instruction to it. (This is _all already mentioned_ in my answer... if there's something in it that you don't understand, then please ask about that explicitly rather than just rephrasing the original question). – davmac Jan 14 '16 at 10:42
  • 1
    @davmac **"You should probably load the element into a register or a local variable and test that instead."** You do know the X86 has the `test` instruction to do just that? – Sep Roland Feb 28 '16 at 22:27
  • 1
    @user3144770 tbh it had slipped my mind. The actual point I was making, that `AND` destroys the value, still stands. I'll edit the answer to suggest using `TEST`. – davmac Feb 29 '16 at 11:20
  • I was going to leave a comment somewhere about how clunky the OP's asm was, and ended up writing a whole answer. But yes, as @user3144770 suggests, `test` is the way to go to non-destructively test things. `bt` is also an option, to test bit zero (setting or clearing `CF`), but `test` can macro-fuse with a `jcc`. – Peter Cordes Feb 29 '16 at 12:00
  • BTW, `and [eax + ecx * 4], 1` won't even assemble, since the operand-size is ambiguous. But yes, even with `dword ptr` it wouldn't be what you want. – Peter Cordes May 16 '22 at 13:23
0

Why even use inline asm at all if you want to grab the args off the stack yourself? Either let the compiler give you the args (so it still works after inlining), or write pure asm in a separate file.

Here's a version of your function written nicely. Note the indenting, and the more efficient code (branch structure, and loading into a register instead of comparing in memory and then using as a memory operand for add. If the condition was expected to be very unlikely, a test or cmp with a memory operand would make sense, though.)

int sum(const int *a, int n)
{
  int S;

  _asm {
      mov    ecx, n
      xor    eax,eax              // Sum = 0
      test   ecx,ecx
      jz   .early_out             // n==0 case

      mov    esi, a
      lea    edi, [esi + 4*ecx]   // end pointer = &a[n]
      xor    ecx,ecx              // zeroed register for CMOV
  sum_loop:                   // do{
      // Assume even/odd is unpredictable, so do it branchlessly:
      mov    edx, [esi]
      test   edx, 1               // ZF cleared for odd numbers only
      cmovnz edx, ecx             // zero out EDX for odd numbers only
      add    eax, edx             // add zero or a[i]

      add    esi, 4
      cmp    esi, edi         // while(++pointer < end_pointer)
      jb   sum_loop

  early_out:
      mov S, eax     // MSVC does actually support leaving a value in EAX and falling off the end of a non-void function (without return),
                     // but clang -fasm-blocks doesn't.  And there's no way to explicitly tell the compiler where the output(s) are in this dumb syntax, unlike GNU C inline asm
  }

  return S;
}

With a branch, the simple way is

.sum_loop:
    mov   edx, [esi]
    test  edx, 1
    jnz .odd                    // conditionally skip the add
    add   eax, edx
.odd:
    add    esi, 4
    cmp    esi, edi             // pointer < end pointer
    jb  .sum_loop

Your original branch structure (duplicating the tail of the loop for the odd/even branches) is a useful technique for some cases, but not here where they're identical again after the add to the total.

In asm, write loops with the conditional branch at the bottom, do{}while() style. Check for conditions that imply zero iterations before entering the loop. You don't want a not-taken cmp/jcc and a taken unconditional branch.

lodsd is just as efficient as mov eax, [esi] / add esi, 4 on Intel starting with Haswell. It's 3 uops on earlier Intel, and on AMD, so it saves code size at the expense of speed. In this case that would mean we couldn't have the sum in EAX, although that isn't important because the safe way to get data out of an asm block is to mov-store it, not leave it in the return-value register. MSVC does seem to actually support the semantics of EAX being a return-value register even after inlining a function containing an asm{} block, but I don't know if that's documented. And clang -fasm-blocks certainly doesn't.

test dl, 1 would be slightly more efficient in the current code, fewer instruction bytes (3 instead of 6): there is no test r/m32, sign_extend_imm8, only with a 32-bit immediate. So if you want to test a bit in the low byte, use the low-8-bit partial register. For simplicity I tested the same register we loaded into.

Loading into EAX would actually gain efficiency, allowing test al,1 as a 2-byte instruction instead of any other byte register being 3 bytes. Smaller code-size is typically better, all else equal, unless it happens to cause an unlucky alignment for some later code, e.g. the JCC erratum on Skylake if a macro-fused cmp/jcc touches the end of a 32-byte block.

See the tag wiki for guides and stuff, also https://stackoverflow.com/tags/inline-assembly/info

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847