
I've begun to study assembly and I'm having some difficulty with a sample program.

I wrote a macro that would find the minimum in an array:

%macro min 3    
    mov ecx, dword[%2]
    mov r12, 0
    lea rbx, [%1]
    movsx eax, word[rbx+r12*4] ; initialize the minimum with the first element of the array

%%minLoop:
    cmp eax, [rbx+r12*4]
    jl %%notNewMin
    movsx eax, word[rbx+r12*4]

    %%notNewMin:
        inc r12

    loop %%minLoop
    mov [%3], eax

%endmacro

section .data
EXIT_SUCCESS equ 0
SYS_exit     equ 60

list1 dd 4, 5, 2, -3, 1
len1 dd 5
min1 dd 0

section .text
global _start

_start:
    min list1, len1, min1
last:
    mov rax, SYS_exit ; exit
    mov rdi, EXIT_SUCCESS ; success
    syscall

This program assembles successfully, but when I debug it (with DDD), the eax register shows the hex value 0xFFFFFFFD and the decimal value 4294967293.

However, if I check with a calculator, 0xFFFFFFFD really is -3, which is the correct value.

In your opinion, is my program correct?

Thanks in advance for your answers.

DarkSkull

2 Answers


It's not correct, though testing it with small values would hide the bug.

The array elements are not treated as the same type throughout. They were defined with dd, and the address calculation is consistent with that (using 4*index). cmp eax, [rbx+r12*4] is also consistent with that. But movsx eax, word[rbx+r12*4] is not: it reads only the low 16 bits of the element and sign-extends them, so the upper 16 bits of the element are ignored.
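To make this concrete, here is a minimal sketch with an element that does not fit in 16 bits. The label `big` and the value 70000 are made up for illustration and are not from the question.

```nasm
; Sketch: what a 16-bit load does to a 32-bit element.
; `big` and the value 70000 are illustrative assumptions.
section .data
big dd 70000                  ; stored as 0x00011170

section .text
global _start

_start:
    movsx ebx, word [big]     ; reads only the low 16 bits (0x1170): ebx = 4464
    mov   edx, dword [big]    ; reads all 32 bits:                   edx = 70000

    mov   eax, 60             ; SYS_exit
    xor   edi, edi            ; exit status 0
    syscall
```

With the values in the question (4, 5, 2, -3, 1), the low 16 bits sign-extend back to the original number, which is why the bug stays hidden there.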

This can be fixed very easily by writing mov eax, [rbx+r12*4] instead.

By the way, you should usually avoid the loop instruction; it's quite slow on most modern processors.
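Putting both suggestions together, the program could look roughly like this. It is a sketch, not the only way to write it: it keeps the question's labels and structure, uses full 32-bit loads, and replaces loop with dec/jnz. It assumes the length is at least 1, as the original does.

```nasm
; Sketch: the question's macro with full 32-bit loads and dec/jnz.
; %1 = array label, %2 = length label, %3 = result label
%macro min 3
    mov  ecx, dword [%2]       ; element count (assumed to be at least 1)
    xor  r12d, r12d            ; index = 0
    lea  rbx, [%1]
    mov  eax, [rbx+r12*4]      ; start with the first element as the minimum

%%minLoop:
    cmp  eax, [rbx+r12*4]
    jl   %%notNewMin           ; eax < element (signed): keep the current minimum
    mov  eax, [rbx+r12*4]      ; otherwise this element becomes the minimum

%%notNewMin:
    inc  r12
    dec  ecx                   ; one element done
    jnz  %%minLoop             ; repeat until the count reaches zero
    mov  [%3], eax
%endmacro

section .data
EXIT_SUCCESS equ 0
SYS_exit     equ 60

list1 dd 4, 5, 2, -3, 1
len1  dd 5
min1  dd 0

section .text
global _start

_start:
    min list1, len1, min1      ; min1 ends up holding -3 (0xFFFFFFFD)
last:
    mov rax, SYS_exit
    mov rdi, EXIT_SUCCESS
    syscall
```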

harold
  • OK, but I used `movsx` because that instruction carries the sign over. Why is it right to use only `mov`? P.S. I used the `word` size specifier because otherwise the `movsx` doesn't work. – DarkSkull Aug 09 '18 at 17:30
  • @DarkSkull `movsx` is for sign *extension*, used when converting a narrow thing to a wider thing while keeping the signed value the same. A normal `mov` of the right size just copies the entire thing, there are no separate signed and unsigned copies, a copy is just a copy. – harold Aug 09 '18 at 17:37
  • Thank you for the explanation. – DarkSkull Aug 09 '18 at 17:43

0xFFFFFFFD is the 32-bit value 1111_1111_1111_1111_1111_1111_1111_1101, which is probably the closest metaphor for what the CPU physically holds (32 cells whose electrical level or magnetic polarity encodes a logical 0 or 1).

Whether you interpret that as -3, as 4294967293, or as something completely different (say, 32 independent true/false values) is up to the code that uses the value.

Negative integers are usually encoded in two's complement, which is what you are observing with your -3 value.

The debugger doesn't know whether you intend the value as signed or unsigned (unless you tell it with a format parameter), so it picks one format. In your case it shows an unsigned 32-bit value, so you see 4294967293 instead of -3. Bitwise the two are identical, and arithmetic instructions like add/sub/cmp/test/... treat them identically as well. Only the way the following code interprets the results (and flags) decides whether the value was "signed" or "unsigned".

The sign itself is not a separate piece of the encoded information. The top bit is sometimes called the "sign" bit, because all negative values have it set, and that is why a signed 8-bit value can store only -128..+127 while an unsigned 8-bit value can store 0..+255. Both interpretations cover exactly 256 different values, because 8 bits can produce 256 different combinations of 0/1 patterns. The signed interpretation "starts" at 0x80 = -128, while the unsigned interpretation "starts" at 0x00 = 0, and there 0x80 is already interpreted as +128. Either way, both interpretations work with the same 8 bits; there is no additional information, such as some kind of type, attached to the value.
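The two readings of 0x80 become visible when the byte is widened to 32 bits, because widening is the one place where the code has to pick an interpretation. A minimal sketch (the label `b` is made up for illustration):

```nasm
; Sketch: one byte, two interpretations, chosen by the widening instruction.
; The label `b` is an illustrative assumption.
section .data
b db 0x80

section .text
global _start

_start:
    movsx ebx, byte [b]    ; sign-extend: ebx = 0xFFFFFF80 (-128 as signed)
    movzx edx, byte [b]    ; zero-extend: edx = 0x00000080 (128)

    mov   eax, 60          ; SYS_exit
    xor   edi, edi         ; exit status 0
    syscall
```

The byte in memory is the same in both loads; only the instruction chosen by the programmer differs.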

For example

cmp    eax, ebx   ; check if eax is bigger than ebx
; now if the values were meant as unsigned, then use "ja" branch
ja     eax_is_bigger_as_unsigned
; but if you meant the values as signed, then you should use "jg" (testing different flags)
jg     eax_is_bigger_as_signed

So the cmp itself doesn't care how you interpret that bit pattern, it will set enough flags in the EFLAGS register to make the later conditional branching possible for both cases.
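As a complete program, this can be checked with the value from the question. The sketch below compares 0xFFFFFFFD with 1 and records in the exit status which of the two conditions held; the local labels and the use of the exit status as a result are choices made for this illustration.

```nasm
; Sketch: the same cmp, read once as unsigned and once as signed.
section .text
global _start

_start:
    mov  eax, 0xFFFFFFFD     ; -3 as signed, 4294967293 as unsigned
    mov  ebx, 1
    xor  edi, edi            ; exit status, built up bit by bit

    cmp  eax, ebx
    jna  .not_above          ; unsigned "not above": skip
    or   edi, 1              ; bit 0: eax > ebx as unsigned
.not_above:
    cmp  eax, ebx            ; compare again, since `or` changed the flags
    jng  .not_greater        ; signed "not greater": skip
    or   edi, 2              ; bit 1: eax > ebx as signed
.not_greater:
    mov  eax, 60             ; SYS_exit
    syscall
```

Assembled and linked with, for example, `nasm -f elf64 flags.asm && ld -o flags flags.o` (the file name is hypothetical), the program exits with status 1: the value is above 1 as unsigned, but not greater than 1 as signed.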

Ped7g