Assembly 8086 (IA32) trouble with adding elements of two arrays

Question

So I'm learning some basic IA-32 assembly and I'm having some trouble understanding how registers actually store stuff. The following program is supposed to add the elements of two given arrays by index position (a1[i] + a2[i]) and store the result in the second register. The arrays have the same length. I am using gdb so I know the looping part works.

EXIT = 1
WRITE = 4
LINUX_SYSCALL =0x80

.data
array1: .int -1, 5, 1, 1, 4  # a vector of integers
array2: .int 1, -3, 1, -5, 4 # vector that ends up holding the sum of the two

.text

.global _start
_start: 
             movl   $array1,            %eax
             movl   $array2,            %ebx

    ifThen:

             jz     fim
             jmp    soma    

    soma:
             add    %eax,               %ebx
             jmp    next_pos

    next_pos:
             inc    %ecx
             add        $4,                 %eax
             add        $4,                 %ebx
             jmp    ifThen

    fim:
             movl   $EXIT,              %eax
             int    $LINUX_SYSCALL

My original idea was to check through gdb whether the values in the ebx register were being added correctly (hence no write syscalls). Instead, I keep seeing big numbers in the registers, which I assume to be addresses, and not the sums of the array elements. However, if I remove the dollar sign in the movl instruction (movl array1, %eax), I get the numbers I expect but can't move to the next position of the arrays, since the add instruction actually adds 4 to the value in the register instead of moving the register pointer to the next 4 bytes.

Any help appreciated, thanks in advance!

You don't have any memory references in your code; you're only adding register values. You can't use `add` with two memory operands, so you'd have to `mov (%eax), %edx` / `add (%ebx), %edx` or something. Also, you don't set flags before `jz` on the first iteration when you fall into the loop. And you don't compare the loop counter against anything, or initialize it to `-5` and count upwards or anything. I'd recommend looking at compiler output for a simple C loop that does the same thing. See [How to remove "noise" from GCC/clang assembly output?](//stackoverflow.com/q/38552116) — Peter Cordes, May 03 '18 at 16:09

score 1 · Accepted Answer · answered May 03 '18 at 16:25

You observed the behaviour well, and your conclusions about it are (mostly) correct.

movl $array1, %eax vs movl array1, %eax: yes, the first one loads eax with the memory address of array1, and the second one loads eax with the 32-bit value stored in memory at that address.
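
As a rough C analogy (a sketch, not the asker's code; the function names are made up for illustration, and `int` is assumed to be 4 bytes as on IA-32 Linux), the two forms differ the way a pointer differs from the value it points at:

```c
/* Same data as array1 in the question. */
static int array1[5] = {-1, 5, 1, 1, 4};

/* Like `movl $array1, %eax`: yields the address of the first element. */
int *load_address(void) {
    return array1;
}

/* Like `movl array1, %eax`: yields the 32-bit value stored at that address. */
int load_value(void) {
    return array1[0];
}

/* Like `addl $4, %eax` then `movl (%eax), %edx` with an address in eax:
   p + 1 moves the pointer forward by sizeof(int) bytes. */
int load_next(const int *p) {
    return *(p + 1);
}
```

Adding 4 to the result of `load_value()` just gives 3, which is the effect described in the question when the dollar sign is removed. Stepping the pointer, as `load_next` does, is what reaches the next element.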

> I'm having some trouble understanding how registers actually store stuff.

The general-purpose registers such as eax are 32-bit registers (on a modern x86 CPU with 64-bit support, eax is the low 32 bits of the 64-bit register rax). That means the register holds 32 bits, each either 0 or 1. Nothing else. Debuggers, unless you switch them to a different interpretation, usually display the value as a 32-bit unsigned hexadecimal integer, because from output like hexadecimal 1234ABCD you can read the bit pattern in your head (each hexadecimal digit is exactly 4 bits, e.g. B = 11 = 8+2+1 = 1011 binary). That doesn't mean the register contains a hexadecimal value; the register is only 32 bits, and you (or the code) can interpret them any way you wish.
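
To make the "same bits, different interpretation" point concrete, here is a small C sketch. The helper names are hypothetical and only illustrate the idea; they are not what gdb does internally.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read the 32 bits as a signed two's-complement number. */
int32_t as_signed(uint32_t bits) {
    int32_t value;
    memcpy(&value, &bits, sizeof value);  /* same bits, different type */
    return value;
}

/* Render the 32 bits as 8 hexadecimal digits, like a hex register view. */
void as_hex(uint32_t bits, char out[9]) {
    snprintf(out, 9, "%08" PRIX32, bits);
}

/* Extract hex digit number `digit` (0 = rightmost); each digit is 4 bits. */
unsigned hex_digit(uint32_t bits, unsigned digit) {
    return (unsigned)((bits >> (4 * digit)) & 0xFu);
}
```

For example, the pattern FFFFFFFF reads as 4294967295 when taken as unsigned but as -1 when taken as signed, so the -1 from array1 shows up as FFFFFFFF in a hexadecimal register dump.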

To access array elements with index i you can pick from several techniques. For your task of summing arrays I would probably stay with your original approach of keeping memory addresses that point directly at the elements, but then you need one more register to load the actual value, i.e.:

    # loop initialization
    movl $array1, %eax   # eax = array1 pointer
    movl $array2, %ebx   # ebx = array2 pointer
    movl $5, %ecx        # ecx = elements left (assumption: 5, as in the question's arrays)
loop_body:
    # array1[i] += array2[i];
    movl (%ebx), %edx    # load value array2[i] from memory into edx
    addl %edx, (%eax)    # add edx to array1[i] (value in memory at address eax)
    # advance array1 and array2 pointers (like ++i;)
    addl $4, %eax
    addl $4, %ebx
    # one possible termination: count down and repeat while elements remain
    decl %ecx
    jnz  loop_body

This keeps the loop body simple and lets the same summing code be reused with different arrays.
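
For comparison, the same pointer-walking idea can be sketched in C. This is only an illustration: the function name `add_into` is made up here, and the end-pointer test is one assumed way to terminate the loop.

```c
#include <stddef.h>

/* dst[i] += src[i] for count elements, walking both pointers. */
void add_into(int *dst, const int *src, size_t count) {
    const int *end = src + count;   /* one past the last source element */
    while (src != end) {
        *dst += *src;               /* load *src, add it to memory at dst */
        ++dst;                      /* each ++ advances by sizeof(int) bytes */
        ++src;
    }
}
```

Called with the question's data as `add_into(array1, array2, 5)`, this leaves 0, 2, 2, -4, 8 in the first array. Because the arrays come in as pointers, the same function works for any pair of arrays, which is the reuse benefit mentioned above.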

Other options

You can avoid needing a register for the memory address by encoding the address directly into the memory-accessing instructions, like:

    # loop initialization
    xorl %ecx, %ecx      # ecx = 0 (index + counter)
loop_body:
    # array1[i] += array2[i];
    movl array2(,%ecx,4), %eax  # load value array2[i] from memory into eax
    addl %eax, array1(,%ecx,4)  # add eax to the array1[i]
    incl %ecx                   # ++i
    # one possible termination (assumption: 5 elements, as in the question)
    cmpl $5, %ecx               # compare i with the element count
    jb   loop_body              # loop while i < 5

But this code can't be redirected to different arrays.

Or you can keep the array addresses in registers but avoid modifying them, by using indexed addressing:

    # loop initialization
    movl $array1, %eax   # eax = array1 pointer
    movl $array2, %ebx   # ebx = array2 pointer
    xorl %ecx, %ecx      # ecx = 0 (index + counter)
loop_body:
    # array1[i] += array2[i];
    movl (%ebx,%ecx,4), %edx  # load value array2[i] from memory into edx
    addl %edx, (%eax,%ecx,4)  # add edx to the array1[i]
    incl %ecx                 # ++i
    # one possible termination (assumption: 5 elements, as in the question)
    cmpl $5, %ecx             # compare i with the element count
    jb   loop_body            # loop while i < 5

This may make sense if you need the plain index value i anyway, or if you plan to use the array addresses again later, so leaving them unmodified is handy.

There are other ways to access values in memory, but the ones above are the most straightforward for somebody learning x86 assembly.

Keep in mind that in assembly there are no variables, arrays, etc. The computer memory is like one huge unnamed array with indices from 0 to N-1 (N = size of physical memory), and each index holds a single byte (8 bits of information).
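
A toy model of that view, sketched in C: memory is a plain byte array, and a 32-bit value is just 4 consecutive bytes starting at some index. The helpers `store32` and `load32` are hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit value into 4 consecutive bytes starting at `address`. */
void store32(unsigned char *memory, size_t address, int32_t value) {
    memcpy(memory + address, &value, sizeof value);
}

/* Read 4 consecutive bytes starting at `address` back as a 32-bit value. */
int32_t load32(const unsigned char *memory, size_t address) {
    int32_t value;
    memcpy(&value, memory + address, sizeof value);
    return value;
}
```

If the five elements of an array are stored at byte indices 0, 4, 8, 12 and 16, then `load32(memory, 4)` returns the second element. That is why the loop in the answer advances its pointers by 4 rather than by 1.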

Registers are 8/16/32/64 bits of information available directly on the CPU chip, so the CPU doesn't need an address to find them (the name "eax" acts like the address) and doesn't need to contact the memory chip for the value (so registers are faster than memory).

To access memory in AT&T syntax you write an operand of the form displacement(base_reg, index_reg, scale), which addresses base_reg + index_reg*scale + displacement. For details, see the question: A couple of questions about [base + index*scale + disp]
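
The address arithmetic behind that operand form can be sketched in C. The function name and the example addresses are made up for illustration; on real hardware the scale can only be 1, 2, 4 or 8.

```c
#include <stdint.h>

/* Address selected by displacement(base_reg, index_reg, scale) in 32-bit mode.
   Omitted parts count as 0; the sum wraps around at 32 bits. */
uint32_t effective_address(uint32_t displacement, uint32_t base,
                           uint32_t index, uint32_t scale) {
    return (uint32_t)(displacement + base + index * scale);
}
```

With a hypothetical array at address 0x1000 held in ebx and ecx = 3, the operand `(%ebx,%ecx,4)` corresponds to `effective_address(0, 0x1000, 3, 4)`, i.e. 0x100C. An operand like `array2(,%ecx,4)` has no base register, so the array's address is passed as the displacement instead.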

Great explanation! Following your logic ,after correcting the sum and doing it I moved the first value of %ebx to edx to see if it was correct, and it is. Thank you very much! — NGSBNC, May 03 '18 at 17:43
