
For example, if I have a variable named `test` declared like this:

```nasm
test db 0x01      ;suppose the address is 0x00000052
```

If I do something like:

```nasm
mov rax, test     ;rax = 0x00000052
mov rax, [test]   ;rax = 0x01
```

But when I try to store into it, following the same pattern I would expect:

```nasm
mov test, 0x01    ;address 0x00000052 = 0x01
mov [test], 0x01  ;address 0x01 = 0x01
```

But it actually is:

```nasm
mov [test], 0x01  ;address 0x00000052 = 0x01
```

So why do the square brackets behave differently depending on whether they are on the first or the second operand?

  • `mov test, 0x01` would mean `0x00000052 = 0x01`, i.e. number = other_number, which doesn't make sense. Your comment *";address 0x00000052 = 0x01"* somehow assumes the value 0x52 is a memory address, but there's no reason to assume that. BTW, `test` is not a variable; it is a symbolic label for a certain memory address, `0x52`. You can create a label just with `test:`; you don't need to follow it with a `db` directive to reserve any space (although you should, if you want to overwrite the bytes following that label). My quarrel is about how you think about it: there are no variables in asm. – Ped7g Mar 28 '18 at 12:41
  • and `mov [test], 0x01 ;address 0x01 = 0x01` has a weird comment too... it's `mov [0x52],1` = store the value `1` into memory at address `0x52`, and it's ambiguous, as the assembler can't tell from that source whether you want to store an 8/16/32/64-bit value `1`. NASM should either fail or at least emit a warning on that line. In an ambiguous case you should specify the size explicitly, like `mov byte [test],1` -> to write only a single byte into memory. (BTW, "why": because Intel syntax marks memory access with square brackets, and NASM's creators decided to follow that rigorously.) – Ped7g Mar 28 '18 at 12:43
  • Because [NASM Requires Square Brackets For Memory References](http://www.nasm.us/xdoc/2.11.08/html/nasmdoc2.html#section-2.2.2) – Jester Mar 28 '18 at 12:51
  • `mov rax, test ;rax = 0x00000052` shows you're probably looking at disassembly of a `.o` you haven't linked. It's 0x52 bytes from the start of the file or something. `mov rax, test` is a [`mov r64, sign_extended_imm32`](http://felixcloutier.com/x86/MOV.html) of the address. – Peter Cordes Mar 28 '18 at 13:16
  • Thanks for the insightful answers! About the "variables in assembly": I've already programmed plenty of assembly on HCS12, but it's a microcontroller with only A and B registers, and referencing memory is only "$". That's why I was so confused about why `mov rax, [test]` is different from `mov [test], rax`. – Pedro Palhari Mar 28 '18 at 13:31
  • In C, with `int *a;`, compare `x = a` vs. `y = *a`: the latter is written with brackets in this asm syntax, the former without. – old_timer Mar 28 '18 at 14:18
  • In x86 asm, the destination is always the first operand. `mov rax, [test]` is a load, the other order is a store (different opcode but same mnemonic). On load/store architectures with separate mnemonics like `lw` and `sw`, it's typical for them not to follow the pattern of which operand is the destination for ALU instructions. e.g. MIPS `lw $t0, ($a0)` and `sw $t0, ($a0)`, not `sw ($a0), $t0`. But on x86, almost all instructions can have a memory source or a memory destination, so they always respect the operand ordering. – Peter Cordes Mar 28 '18 at 16:56
  • @PedroPalhari I see... x86 is a lot more versatile, so you can write both `mov eax,0x52` and `mov eax,[0x52]`. The first one loads the value 0x52 itself into `eax`; the second uses `0x52` as a memory address and loads a 32-bit value from memory (the size is deduced from the target register: eax = 32 bits). When you flip the arguments, source and destination are flipped, which makes sense with `mov [0x52],eax` (storing the 32-bit value of `eax` into memory), but not with `mov 0x52,eax` (an immediate constant is not something you can write into). NASM is consistent in the style "[] = memory access". – Ped7g Mar 28 '18 at 19:54

2 Answers


In most assemblers, using square brackets dereferences a memory location. You are treating the value as a memory address.

Take this instruction, for example:

```nasm
mov ax, [0x1000]
```

This loads the value stored at address 0x1000 into AX. If you remove the square brackets (`mov ax, 0x1000`), you only move the number 0x1000 itself into AX.

If you move a value into a bracketed number, as in `mov [0x1000], ax`, you store it at that memory location.

If you are a C developer, here's an example problem.

Don't let this example annoy you if you've been bullied into learning C by others, calling you a 'troll'.

You can ignore this if you want, but if you know C you have probably used scanf().

```c
int a = 10;
scanf("%d", a);   /* bug: passes the value 10 where scanf expects an address */
```

Now, this is a very common mistake: we are not passing the memory address of the variable. Instead, we are using its value as the address. The scanf() function requires you to give it the address.

If we do this instead,

```c
scanf("%d", &a);
```

we would have the address of the variable a.

Community
Steve Woods
  • The point is that MASM is the weird / inconsistent one, by making `mov eax, symbol` a load even though it *doesn't* have brackets. To figure out if it's a load or a mov-immediate, you have to go look at whether it's defined as an `equ` or `=` constant or as a label. NASM forces you to use syntax that matches how you define names, so you can always tell what kind of instruction it is. – Peter Cordes Mar 28 '18 at 16:59

Steve Woods' post gave me the impression he thinks `&` is a dereference operator. `&` is C's reference (address-of) operator; `*` is C's dereference operator. The OP has a valid concern: `[]` can seem to function as both, depending on the context. It is neither a dereference nor a reference operator. It is the "This is a memory address!!!" operator.

https://nasm.us/doc/nasmdoc3.html#section-3.3

An effective address is any operand to an instruction which references memory. Effective addresses, in NASM, have a very simple syntax: they consist of an expression evaluating to the desired address, enclosed in square brackets.

```nasm
; assume wordvar was used as a label, and the linker gave it address 6291668
; or mostly equivalently, you used   wordvar equ 6291668
; also assume the dword currently stored at that address is 12

mov eax,wordvar         ; eax = 6291668. Move value 6291668 to eax.
mov eax,[wordvar]       ; eax = 12. Move contents of address 6291668 to eax.

mov eax,13
; mov wordvar,eax       ; Move eax to value 6291668. syntax error.
mov [wordvar],eax       ; mem(6291668) = 13. Move eax to address 6291668.
```

When an operand is a memory address, it has to be enclosed in square brackets to tell NASM that this is the case. It's not dereferencing anything; it's just letting NASM know what's up. If it were equivalent to the dereference operator,

```nasm
mov [wordvar], eax
```

would set memory location 12 to 13.

It's not the dereference operator. It's the "this is a memory address" operator. It appears to be both dereferencing and referencing in different cases because x86 and x86_64 instructions behave differently depending on whether their operands are memory locations or values. I am teaching myself assembly, and I had to explain this to figure it out myself.

Peter Cordes
Justin
  • Your first paragraph isn't quite right. C assumes that all variables have an address, and using their bare names gives you the contents of that memory, not the address. So if you want the address in C, you need to add a `&`, vs. in NASM you leave out the `[]`. **In NASM, bare names are like numeric constants, or pointer constants**, like C `static char *const a = some_static_address;` where you do need `*a` to reference memory. Or for EQU constants, `static char *const a = (char *)12345;` – Peter Cordes Feb 09 '19 at 18:40
  • So anyway, `[]` always dereferences the symbol value to access memory at that. (**Technically symbol values *are* their address**, not the pointed to memory. Putting a label somewhere is *very* similar to `foo equ 0x401000`, as far as what happens when you use that token later inside or outside of `[]`). And since we know that x86 doesn't have memory-indirect addressing, `[foo]` couldn't have been syntax for loading the address from memory and then dereferencing it. Unlike C, there's no compiler that can turn expressions into multiple instructions if they're not encodable as one. – Peter Cordes Feb 09 '19 at 18:47
  • And BTW, in x86 terminology, a "word" is 16 bits. EAX is a dword register, so you might want to adjust your variable name. – Peter Cordes Feb 09 '19 at 18:51
  • Update to my first comment: `extern char foo[];` is a better C analogy for a symbol defined by a label, and what you'd actually use if you want to declare a C var for something where you don't want to access bytes there, just use the address, like `end_data` (end of the .data section). There is no pointer object to get optimized away, just the name attached to an address. – Peter Cordes Aug 13 '23 at 16:06