Distinguishing 'mov byte ptr' and 'mov word ptr' in Assembly: Clarification Needed?

Question

I read some earlier posts about
'What does ptr do?'
'What does [ ] do?'
but found nothing that helped me understand the question below.

Title : Program failed to comprehend
.model small
.stack 100h
.data
    Msg db 10,13, 'Zahid. $'
.code
.startup
    ; Initialising data segment
    mov ax, @data
    mov ds, ax
    ; Before operation, displaying message
    mov dx, offset msg
    mov ah, 09h
    int 21h

    mov msg, 'A'                  ; Writing to memory specified by msg, that's OK -> A
    mov msg+1, 'R'                ; Confusion: writing to memory specified by msg, then adding 1 (not writing to the next 8-bit address?) -> R
    mov [msg]+2, 'I'              ; Confusion: writing to memory specified by msg, then adding 2 to that address and writing -> I
    mov byte ptr [msg+3], 'F'     ; OK with me, writing to byte memory specified by msg+3 -> F
    mov word ptr [msg + 4], '.'   ; Confused again: writing to word memory specified by msg+4; will it write the ASCII '.' on two bytes? -> .
    mov msg[5], '$'               ; Not confused by this one.

    ; Print var
    mov dx, offset msg
    mov ah, 09h
    int 21h

    ; Exit code
    mov ah, 04ch
    xor al, al
    int 21h

end

Output : Zahid. ARIF.

Please explain the operation, as I believe it should not print 'ARIF'.

What are you expecting it to print? It's quite clear just from looking at the code and comments that it should be printing something like `'Arif'` (*Although I would expect this to be uppercase*). The code even contains the characters `ARIF.` — Spencer Wieczorek, Oct 30 '17 at 14:11
The output should be `\n\rZahid. ARIF.` You either didn't post the exact output, or the exact code. EDIT: you actually have the DOS newline bytes 13,10 in the wrong order, so the output will start with `\n\r`, not `\r\n`. What is your question exactly? — Ped7g, Oct 30 '17 at 14:12
Why didn't `mov [msg]+2, 'I'` produce an error? Since `[msg]` returns the address of the variable, incrementing it by 2 and then initialising that address with some value should not produce 'I' on execution. — Zahid Khan, Oct 30 '17 at 14:28
emu8086 and MASM (and TASM in MASM emulation mode) have quite relaxed syntax; all of these are equivalent in MASM: `mov msg+2,'I'`, `mov [msg+2],'I'`, `mov [msg]+2,'I'`, `mov byte ptr [msg+2],'I'`, `mov [2+msg],'I'`, `mov msg[2],'I'` (not sure about this one and I don't have emu8086 to verify) ... basically, as long as one side is a memory operand, it's a memory operand. Check the Intel instruction reference guide to see which memory operands are valid; especially in 16-bit real mode they are very limited. (The `byte ptr` is not needed, because `msg` was defined by `db`, and MASM recognizes that.) — Ped7g, Oct 30 '17 at 14:34
But the "correct" Intel syntax is `mov [msg+2],'I'`, and you have to specify the datum size, in NASM for example `mov byte [msg+2], 'I'` or `mov [msg+2], byte 'I'`. As you can see, each assembler may have its own subtle differences/dialect. If you are not experienced enough to recognize all those quirks, you may want to produce a listing file (I think it's a new feature of the latest emu8086 version, and a normal feature of stand-alone assemblers), where you can check after assembly which instruction was produced and its opcode. — Ped7g, Oct 30 '17 at 14:38
Related: https://stackoverflow.com/questions/25129743/confusing-brackets-in-masm32. "square brackets `[]` mean pretty much nothing to MASM when you're just using symbols. They only mean something when you use them with registers." (MASM / TASM / emu8086 are all the same on this, AFAIK). — Peter Cordes, Oct 30 '17 at 14:40
It's not clear what you think each instruction should do, i.e. how the syntax in each line should translate to machine code. Please edit your question to say what you think the syntax means. What kind of answer are you looking for? Something like Ross's answer on the question I linked, explaining that `[]` is optional with symbol names? This is nearly a duplicate of that. — Peter Cordes, Oct 30 '17 at 14:44

Ped7g · Accepted Answer · 2017-10-30T20:00:56.157

In assembly, the syntax depends on the particular assembler. Emu8086 mostly follows the MASM dialect, which has quite relaxed rules and allows several different spellings (with the same output).

If you are used to a high-level programming language, this may feel confusing: why isn't the syntax set in stone, and how do you live with this mess in asm?

But for an asm programmer this is rarely an issue, because in assembly you don't build runtime expressions out of operators and values. Each instruction in the source usually maps 1:1 to one CPU instruction, with the exact operands and options of a particular instruction that exists in the CPU.

The MOV instruction on x86 is a bit of a mess itself, as the single mnemonic "MOV" is used for many different instruction opcodes. In your example only two of them are used: MOV r/m8,imm8 with opcode C6 for storing byte values, and MOV r/m16,imm16 with opcode C7 for storing word values. In all cases the r/m part is a memory reference by absolute offset, which is calculated at assembly time.
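To make the two opcodes concrete, here is a small Python sketch (Python only because the DOS program itself isn't easy to run standalone) that encodes MOV [offset16],imm for both sizes. It assumes the plain direct-address form only: ModR/M byte 06 (mod=00, reg=/0, r/m=110) means "a 16-bit displacement follows" in 16-bit addressing. It accepts unsigned immediates only, and the function name encode_mov_mem_imm is hypothetical, not part of any assembler.

```python
# Toy encoder for the two MOV forms used in the question, limited to a
# direct 16-bit address operand in 16-bit real mode.
# ModR/M 0x06 = mod 00, reg 000 (the /0 extension), r/m 110 -> [disp16].
MODRM_DISP16 = 0x06


def encode_mov_mem_imm(offset: int, value: int, size: int) -> bytes:
    """Encode MOV [offset16], imm: size 1 -> C6 /0 ib, size 2 -> C7 /0 iw."""
    if size == 1:
        opcode = 0xC6
    elif size == 2:
        opcode = 0xC7
    else:
        raise ValueError("only byte (1) or word (2) stores are modelled")
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("offset must fit in 16 bits")
    if not 0 <= value < (1 << (8 * size)):
        raise ValueError("immediate does not fit in the operand size")
    # opcode, ModR/M, then the offset and the immediate, both little-endian
    return (bytes([opcode, MODRM_DISP16])
            + offset.to_bytes(2, "little")
            + value.to_bytes(size, "little"))


if __name__ == "__main__":
    print(encode_mov_mem_imm(0x1000, ord("A"), 1).hex().upper())  # C606001041
    print(encode_mov_mem_imm(0x1004, ord("."), 2).hex().upper())  # C70604102E00
```

The only difference between the byte and word forms is the opcode and the number of immediate bytes appended; the address part is identical.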

So if msg is the symbol for memory address 0x1000, then the lines in your question assemble to:

; machine code  | disassembled instruction from machine code

C606001041        mov byte [0x1000],0x41

Store byte value 0x41 ('A') into memory at address ds:0x1000. C6 is the MOV r/m8,imm8 opcode and 06 is the ModR/M byte selecting a plain [offset16] memory operand; the bytes 00 10 are the 0x1000 offset itself (little-endian), and finally 41 is the imm8 value 0x41. Segment ds is used by default to calculate the full physical memory address, because there's no segment override prefix in front of the instruction.

C606011052        mov byte [0x1001],0x52
C606021049        mov byte [0x1002],0x49
C606031046        mov byte [0x1003],0x46
C70604102E00      mov word [0x1004],0x2e
C606051024        mov byte [0x1005],0x24
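A toy decoder can read that listing back, which is roughly what a disassembler does for these two encodings. This Python sketch understands only C6 06 / C7 06 followed by a 16-bit offset and the immediate, and it also shows the real-mode rule for turning segment:offset into a physical address (segment × 16 + offset). The names decode_mov_mem_imm and physical_address, and any DS value you plug in (for example 0x0700), are hypothetical.

```python
# Toy decoder for the listing: recognises only C6 06 / C7 06
# (MOV [disp16], imm8 / imm16) and rejects everything else.


def decode_mov_mem_imm(code: bytes) -> tuple[str, int, int]:
    """Return (size keyword, offset, immediate) for a C6 06 / C7 06 encoding."""
    if len(code) < 2 or code[1] != 0x06:
        raise ValueError("not a MOV [disp16], imm encoding")
    sizes = {0xC6: ("byte", 1), 0xC7: ("word", 2)}
    if code[0] not in sizes:
        raise ValueError("unsupported opcode")
    name, size = sizes[code[0]]
    if len(code) != 4 + size:
        raise ValueError("wrong length for this encoding")
    offset = int.from_bytes(code[2:4], "little")
    imm = int.from_bytes(code[4:4 + size], "little")
    return name, offset, imm


def physical_address(segment: int, offset: int) -> int:
    """Real-mode address: segment * 16 + offset, wrapped to 20 bits as on an 8086."""
    return ((segment << 4) + offset) & 0xFFFFF


LISTING = ["C606001041", "C606011052", "C606021049",
           "C606031046", "C70604102E00", "C606051024"]

if __name__ == "__main__":
    for line in LISTING:
        name, off, imm = decode_mov_mem_imm(bytes.fromhex(line))
        print(f"{line:<14} mov {name} [0x{off:04x}],0x{imm:x}")
```

Each printed line reproduces the disassembly column of the listing, e.g. `mov word [0x1004],0x2e` for the C7 line.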

The remaining lines are the same story: they write byte values at specific memory offsets, going byte by byte through memory and overwriting each one.

The one subtle difference is mov word ptr [msg + 4], '.', which targets memory address ds:0x1004 just like the other lines, but the value stored is an imm16, i.e. a word value equal to 0x002E ('.'). So the different opcode C7 is used, and the immediate value needs two bytes, 2E 00. This instruction overwrites memory at address ds:0x1004 with byte 2E and ds:0x1005 with byte 00.

So if the memory at address msg (ds:0x1000 in my examples) initially contained:

0x1000: 0A 0D 5A 61 68 69 64 2E 20 24  ; "\n\rZahid. $"

it changes like this after each MOV executes:

0x1000: 41 0D 5A 61 68 69 64 2E 20 24  ; "A\rZahid. $"
0x1000: 41 52 5A 61 68 69 64 2E 20 24  ; "ARZahid. $"
0x1000: 41 52 49 61 68 69 64 2E 20 24  ; "ARIahid. $"
0x1000: 41 52 49 46 68 69 64 2E 20 24  ; "ARIFhid. $"
0x1000: 41 52 49 46 2E 00 64 2E 20 24  ; "ARIF.\0d. $"

That word store overwrote two bytes: 'h' (with the dot) and 'i' (with zero).

0x1000: 41 52 49 46 2E 24 64 2E 20 24  ; "ARIF.$d. $"

Then that zero is overwritten once more, with the dollar sign (the string terminator for the DOS int 21h service ah=9).
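The whole memory trace can be replayed in a few lines of Python, as a sketch: the bytearray stands in for the bytes at msg (index 0 = offset msg), each store is written little-endian with its size, and dos_print mimics int 21h/AH=09h by stopping at the first '$'. The helper names are hypothetical, and this models only the data, not the CPU.

```python
# Replays the six stores from the question on the bytes of
# Msg db 10,13,'Zahid. $'. Index 0 stands for offset msg.
msg = bytearray([10, 13]) + bytearray(b"Zahid. $")

STORES = [  # (offset from msg, value, size in bytes)
    (0, ord("A"), 1),
    (1, ord("R"), 1),
    (2, ord("I"), 1),
    (3, ord("F"), 1),
    (4, ord("."), 2),  # word ptr: writes 2E, then 00 into the next byte
    (5, ord("$"), 1),
]


def store(mem: bytearray, offset: int, value: int, size: int) -> None:
    """Little-endian store of `size` bytes, like MOV r/m8 or r/m16, imm."""
    mem[offset:offset + size] = value.to_bytes(size, "little")


def dos_print(mem: bytes) -> bytes:
    """What int 21h / AH=09h would output: everything before the first '$'."""
    return bytes(mem[:mem.index(ord("$"))])


if __name__ == "__main__":
    for off, val, size in STORES:
        store(msg, off, val, size)
        print(" ".join(f"{b:02X}" for b in msg))
    print(dos_print(msg))
```

Running it prints one hex row per store, matching the dumps above, and finally b'ARIF.', which is exactly the second part of the program's output.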

Generally, the relaxed syntax is not a problem, because you can't invent your own instruction: the assembler picks whichever existing instruction fits and assembles whatever expression you wrote into it. There's no x86 instruction like mov [address1] and [address2], value that stores the same value at two different memory locations, or mov [address]+2 that would add two to the memory value at address (that is done with add [address], 2, which is one of the add r/m,imm variants, depending on the data size).
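As a rough model of why brackets around a plain symbol change nothing in MASM/emu8086, this Python sketch resolves a register-free memory operand to a single offset: [] are ignored and sym[n] is read as sym + n. It assumes msg sits at 0x1000 as in the listing, supports only decimal constants and +, and is not how MASM actually parses operands; direct_offset and SYMBOLS are hypothetical names.

```python
import re

# Hypothetical symbol table: msg at offset 0x1000, as in the listing.
SYMBOLS = {"msg": 0x1000}


def direct_offset(operand: str, symbols: dict[str, int]) -> int:
    """Resolve a register-free memory operand to one absolute offset."""
    text = operand.lower().replace(" ", "")
    # sym[2] -> sym+2 and [x] -> +x: without a register, [] adds nothing.
    text = text.replace("[", "+").replace("]", "")
    total = 0
    for term in filter(None, text.split("+")):
        if term in symbols:
            total += symbols[term]
        elif re.fullmatch(r"[0-9]+", term):
            total += int(term)
        else:
            raise ValueError(f"unsupported term: {term!r}")
    return total


if __name__ == "__main__":
    for spelling in ["msg+2", "[msg+2]", "[msg]+2", "[2+msg]", "msg[2]"]:
        print(f"{spelling:<8} -> 0x{direct_offset(spelling, SYMBOLS):04x}")
```

Every spelling collapses to the same offset 0x1002, which is why the assembler has only one instruction it can emit for all of them.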

So mov msg+1,... can only mean memory address msg + 1; there's no other meaningful possibility in the x86 instruction set. The data size (byte) is deduced from the db directive used after the label msg. This is a speciality of the MASM and emu8086 assemblers: most other assemblers don't associate a defined label (symbol) with the directive used after it, i.e. symbols have no "types" in common assemblers. In those, mov msg+1,'R' may end with an error, not because the left side is problematic, but because the assembler doesn't know how big the 'R' value should be (how many bytes).

My personal favourite, NASM, would report another error on it, because it requires brackets around memory accesses, so in NASM only mov [msg+2],... would be valid (with a size modifier like MASM's "byte ptr" allowed, but written without "ptr": mov byte [msg+2],...). In MASM/emu8086, though, all the variants you used are valid syntax with the same meaning, producing a memory reference by 16-bit offset.

The assembler also won't produce two instructions instead of one (an exception may be special "pseudo-instructions" in some assemblers, which are expanded into several native instructions, but that is not common in x86 assembly).

Once you know the target CPU instruction set, i.e. which instructions actually exist, you will easily be able to tell from the vague syntax which target instruction will be produced.

Or you can easily check it in the debugger's disassembly window: the disassembler uses a single syntax for each particular instruction and is unaware of any source formatting ambiguities.

 mov word ptr [msg + 4], '.'
   ; Confused again as writing on word memory specified by msg+4,
   ; '.' Will it write ascii on the two bytes-> .

It will write two bytes; that's what WORD PTR specifies in MASM. The value is only '.' = 0x2E, but 0x2E is perfectly valid as a 16-bit value too, simply zero-extended to 0x002E, and that's the value the assembler uses for this line.

In future, if you are not sure how a particular thing assembles and what it will do to the CPU/memory state, just use the emu8086 debugger. In this case you would see in the disassembly window that all those variants of msg+x compiled to memory addresses going byte by byte over the original msg memory. And if you opened a memory view (I hope emu8086 has one, I don't use it) at the msg address, you could watch each write to memory, see how it changes the original values, and see how that WORD PTR works, since you were not sure. Watching in a debugger is usually a lot easier than reading these long answers on Stack Overflow...

About what PTR does: In assembly, what does `PTR` stand for? ... doesn't explain it well, because it's hard to explain well. The whole "BYTE PTR" is the term used by MASM: it doesn't parse it as BYTE and then apply some PTR operation to the BYTE, it parses "BYTE PTR" as a whole and goes "okay, he wants to address a byte". It's like a single keyword, just with a space in it.
