Getting confused about usage of labels in assembly for X86_64 linux: Why should we write mov [digit], al, but not mov digit, al?

Question

Here's my code:

section .data
    digit db 0,10

section .text
    global  _start
_start:

    call _printRAXDigit

    mov rax, 60         ; sys_exit
    mov rdi, 0          ; exit status is passed in rdi
    syscall


_printRAXDigit:
    add rax, 48
    mov [digit], al

    mov rax, 1
    mov rdi, 1
    mov rsi, digit
    mov rdx, 2
    syscall
    ret

I have a question about the difference between [digit] and digit.

I have learned that labels (like digit in the code) represent the memory address of the data, and that the "[]" operator dereferences that pointer, so it loads the value the label points at into the destination.

For instance, mov al, [digit] will put 0 into the al register because digit points at the first byte of the data (in this case, 0).

However, my code works when I write mov [digit], al, which means "store the value in al at the memory address digit", and I have no idea why we need "[]" here. The first operand of mov must be a destination (a register or a memory address), so I think it should be mov digit, al rather than mov [digit], al. It doesn't make sense to me that we dereference the label to name the destination instead of using the memory address directly.

That's my whole question. Please tell me where my thinking goes wrong, or correct my understanding of labels.

`digit` *is* an address. But it's *also* a number. You need to tell the assembler how to interpret it. – Ignacio Vazquez-Abrams Oct 21 '17 at 08:39
What do you mean by "digit is an address. But it's also a number."? Can you give a specific example so that I can understand better? Thanks. – BooAA Oct 21 '17 at 08:48
0 is an address. And it's a number. 3 is an address. And it's a number. 27 is an address. And it's a number. 252 is an address. And it's a number. – Ignacio Vazquez-Abrams Oct 21 '17 at 08:48
OK, so what do you mean by "tell the assembler how to interpret it"? What happens when we put brackets around "digit", compared with not using brackets? – BooAA Oct 21 '17 at 08:56
@李亮節 Addresses are numbers. If you write `mov digit,al` the assembler generates an instruction that moves the value of the symbol `digit` (i.e. its address) to `al`. Since `al` is an 8 bit register, the address is probably too large and you are going to get errors at link time. – fuz Oct 21 '17 at 08:56
@Ignacio Vazquez-Abrams Oh, I got it. If we use brackets, it is treated as an address, so it can act as a destination (just like a general-purpose register); if we don't use brackets, "digit" is just treated as a "value" (I mean the memory address is treated as a number, like any other constant). Am I right? – BooAA Oct 21 '17 at 09:07
Consider `mov bx,ax` vs `mov [bx],ax`. The first copies the value of `ax` into `bx`, entirely on the CPU, without touching memory. The second stores the value of `ax` into memory, at the address held in `bx`. In that style `mov some_address,ax` makes no sense, as you can't overwrite a constant, but `mov [some_address],ax` reads as storing the value of `ax` into memory. I prefer this consistent style, where memory access is always marked by brackets (except `lea` of course, which does not access memory but uses the same syntax as a memory operand for `mov`). – Ped7g Oct 21 '17 at 10:43

Martin Rosenau · Answer 1 · 2017-10-22T05:45:03.593

In NASM syntax (there are assemblers which use different notation, e.g. MASM/TASM use a different flavor of Intel syntax, and gas uses AT&T syntax) the following x86 instructions ...

mov esi, someAddress
mov esi, [someAddress]
mov [someAddress], esi
mov someAddress, esi   ; see below

... (would) have the following meaning:

mov esi, someAddress

Write the number that represents the address where someAddress is stored to the register esi. So if someAddress is stored at address 1234 the value 1234 is written to esi.

mov esi, [someAddress]

Write the content of the memory to esi. So if someAddress is stored at address 1234 and the value stored at address 1234 is 5678 the value 5678 is written to esi.

You might also say: The value of the variable someAddress (a variable normally is nothing but the content of the memory at a certain address) is written to the esi register.

mov [someAddress], esi

Write the content of esi to the memory at address someAddress.

You might also say: Write the value of esi to the variable someAddress.

mov someAddress, esi

Would mean: Change the constant number which represents the address someAddress to esi.

So if someAddress is located at address 1234 and esi contains the value 5678 the instruction would mean:

Change the mathematical constant 1234 in a way that 1234 = 5678 after that change.

This is of course meaningless because the constants 1234 and 5678 will never be equal. For this reason x86 has no such instruction: an immediate value can never be the destination operand of mov, so the assembler rejects the line.

(There are CPUs having similar instructions. On the SPARC CPUs for example instructions assigning a value to the zero register (which means: "assign a value to the constant zero") are used if you only want to have the instruction's side effects - like setting the flags - but you are not interested in the result itself.)
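
As a rough illustration of the three valid forms above, here is a minimal sketch for x86-64 Linux in NASM syntax. It reuses the `digit` label from the question; the file name `demo.asm` and the build commands in the first comment are assumptions, not something taken from the question. The fourth form is left as a comment, since NASM refuses to assemble it (it reports an invalid combination of opcode and operands).

```nasm
; Assumed build steps: nasm -f elf64 demo.asm && ld -o demo demo.o
section .data
    digit db 0, 10          ; one byte to overwrite, followed by a newline

section .text
    global _start
_start:
    mov rsi, digit          ; no brackets: rsi = the address of digit (an immediate)
    mov al, '7'             ; al = ASCII code of the character 7
    mov [digit], al         ; brackets: store al into the byte at that address
    movzx ecx, byte [rsi]   ; brackets around a register: load that same byte back
                            ; (ecx is not used afterwards; this only shows the load)
    ; mov digit, al         ; not encodable: a constant cannot be a destination

    mov eax, 1              ; sys_write
    mov edi, 1              ; file descriptor 1 = stdout
    mov edx, 2              ; length: the digit plus the newline
    syscall                 ; rsi still holds the buffer address

    mov eax, 60             ; sys_exit
    xor edi, edi            ; exit status 0
    syscall
```

If assembled and linked as in the comment, running it should print 7 followed by a newline. The brackets are what select a memory access; without them the label is just a number.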

edited Oct 22 '17 at 05:45

answered Oct 21 '17 at 08:59

Martin Rosenau

The sane syntax you call "normal" is NASM-style. The one where `mov esi, [digit]` and `mov esi, digit` are both loads and `mov esi, offset digit` is a mov-immediate is MASM/TASM style. [MASM even ignores brackets in `mov eax, [constant]` for an `=` constant (like equ)!](https://stackoverflow.com/a/25130189/224132). Apparently some people think that's sane / normal, so you might want to clarify. (I tagged the question `[nasm]` based on the Linux tag and the text in the question describing what some things did.) – Peter Cordes Oct 21 '17 at 09:16
@PeterCordes I used the word "normal" because I normally use AT&T style assembler (`movl $someAddress, %esi`). I was not aware that there is a difference between NASM and MASM. – Martin Rosenau Oct 21 '17 at 11:39
Ah, then the term you're looking for is "Intel syntax". NASM and MASM are both flavours of Intel syntax. I updated the [intel-syntax tag wiki](https://stackoverflow.com/tags/intel-syntax/info) recently with some stuff about NASM vs. MASM syntax. Note that GAS `.intel_syntax noprefix` is MASM-like, not NASM-like. (e.g. `mov eax, symbol` is a load.) – Peter Cordes Oct 21 '17 at 11:43
I really don't like the way you're explaining the zero register. It's not "assigning to a constant", it's discarding the write by sending it to read-only storage (like `/dev/zero`). There's nothing mathematical about it, but at least you didn't put that word back in. You're making it sound weirder than it is. – Peter Cordes Oct 22 '17 at 06:34
@PeterCordes: I wanted to find an instruction on any CPU which is similar to `mov 1234, eax` (in NASM syntax; `mov %eax, $1234` in AT&T syntax). Writing to the zero register is at least similar to such an instruction: You perform a write operation although the destination is per definition constant and therefore cannot be written. – Martin Rosenau Oct 22 '17 at 08:34
No CPUs ever have a field in an instruction that's interpreted as a constant and used as a destination to throw away writes. You only find that with zero-registers where one bit-pattern discards the write, other bit patterns are for destinations that will hold it. (Hmm, might be interesting to look at VAX, which allows arbitrary addressing modes for both operands, including memory,memory. It probably still doesn't allow immediate mode for the destination, though.) Anyway, IMO there's a significant difference between `mov 1234, eax` and `mov r0, r1`. There's a reason only the latter exists – Peter Cordes Oct 22 '17 at 08:41
