
I'm currently doing assembly programming (16-bit) on DOSBox using MASM.

var1 dd 'abcd'

For the above code MASM is generating the error:

A2010: syntax error

What is wrong with the syntax? I am simply storing 4 characters in a doubleword.

I am doing 16-bit assembly, so is that the problem? Can I only use db and dw, because the other types are larger than 16 bits?

Fifoernik
Hassaan Raza
  • You are storing 4 bytes, so try `var1 db 'abcd'`. In memory, it's the same. – zx485 May 23 '19 at 19:04
  • I know this, but I'd like to know how to use dd, or: what is the purpose of having variables other than db? – Hassaan Raza May 23 '19 at 19:09
  • Early versions of MASM and TASM don't support placing a string of characters in a type other than `db`. Later versions of some assemblers (and compatible products like WASM and JWASM) do support what you are doing. This is important: even for assemblers that support it, types larger than a single byte (dw, dd, dq, etc.) that have a character array loaded into them will appear in reverse order when viewed in memory (because x86 is a little-endian processor). So if `var1 dd 'abcd'` did work, the bytes would appear as `d` `c` `b` `a` when viewed in memory. – Michael Petch May 23 '19 at 19:33

1 Answer

var1 db 'abcd' (not dd) puts the 4 bytes you want into memory in source order.

what is the purpose of having variables other than db?

Convenience in writing the initializer: dd 1234h is more convenient than db 34h, 12h, 0, 0, but both assemble identical data into the output file. Also, MASM treats a symbol declared with dw or dd as implying an operand size when you use it.
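To check that `dd 1234h` and `db 34h, 12h, 0, 0` really produce the same bytes, here is a minimal Python model using the standard `struct` module. The helpers `dd` and `db` are hypothetical stand-ins for the assembler directives. This models only the bytes placed in the output file; it is not MASM itself.

```python
import struct

def dd(value: int) -> bytes:
    """Hypothetical model of `dd <integer>`: 4 bytes, little-endian (x86 order)."""
    return struct.pack("<I", value)

def db(*values: int) -> bytes:
    """Hypothetical model of `db a, b, ...`: one byte per value, in source order."""
    return bytes(values)

dd_bytes = dd(0x1234)
db_bytes = db(0x34, 0x12, 0, 0)
print(dd_bytes.hex(" "))       # 34 12 00 00
print(dd_bytes == db_bytes)    # True
```

The low byte (34h) lands at the lowest address. That is why the hand-written `db` version has to list the bytes "backwards" compared to how the number is written.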

Later versions of MASM do accept dd 'abcd', but they endian-flip it instead of assembling the bytes into memory in source order like NASM does. See @RossRidge's answer for MASM details.

NASM will accept mov eax, 'abcd' or dd 'abcd' just fine: multi-character literals are just another form of integer literal, with the first byte first in memory (the least significant), because x86 is little-endian. That is, in NASM, multi-character integer literals have a memory order that matches their source order.

But MASM reverses them when used with dd or dw, so the first character becomes the most significant byte of an integer, and memory order is the reverse of source order. It may be a good idea to avoid it even in MASM versions that support the syntax, and use hex ASCII codes plus a comment.
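The difference between the two assemblers can be modelled in a few lines of Python. The assumption is that a multi-character literal is converted to an integer first (first character least significant in NASM, most significant in MASM), and the dword is then stored little-endian as on x86. The function names are hypothetical, and the model only covers ASCII literals of up to 4 characters.

```python
import struct

def nasm_literal(chars: str) -> int:
    """NASM model: the first character is the least significant byte."""
    return int.from_bytes(chars.encode("ascii"), "little")

def masm_literal(chars: str) -> int:
    """MASM model: the first character is the most significant byte."""
    return int.from_bytes(chars.encode("ascii"), "big")

def emit_dd(value: int) -> bytes:
    """Store a dword the way x86 does: little-endian."""
    return struct.pack("<I", value)

for name, to_int in (("NASM", nasm_literal), ("MASM", masm_literal)):
    value = to_int("abcd")
    print(f"{name}: value={value:08X}h bytes={emit_dd(value).hex(' ')}")
# NASM: value=64636261h bytes=61 62 63 64
# MASM: value=61626364h bytes=64 63 62 61

# The portable workaround: write the hex ASCII codes yourself.
print(emit_dd(0x64636261))  # b'abcd'
```

The MASM line matches the `64 63 62 61` bytes reported from a real MASM build in the comments. Spelling out `64636261h` sidesteps the difference entirely, because a plain hex integer means the same thing to both assemblers.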


In MASM, var1 dd vs. db also sets a default operand-size for accessing the data, if you declare it as a variable instead of a label.

Using var1 db ... means you'll have to use an explicit dword ptr any time you want to access all 4 bytes with mov eax, [var1]. Without dword ptr [var1], MASM will complain about operand-size mismatch.

But if you declare it as just a plain label, not tied to any db or dd directives that assemble bytes into memory, I think you can freely use it with any size.

(Update: apparently a label with a : is an error in MASM outside of code sections. I'm not sure if there is a way to declare just a data label that isn't a MASM "variable". See discussion in comments.)

;; I'm not sure this is correct, I'm making this up from memory
;; and I've never actually used MASM.  I know the syntax from SO answers.
.data
    ; NOTE: per the comments below, "label1:" with a colon is an error in a
    ; MASM/TASM data section; it is kept only to illustrate the idea.
    label1:         ; "Just" a label, no data
      db 'abcd'

    ; little-endian 'abcd'
    var1  dd 64636261h        ; no : so the symbol becomes a variable with a size from the dd

.code
func:
    mov  eax, [label1]                ; legal I think
    mov  al, [label1]                 ; also legal
    mov  eax, dword ptr [label1]      ; always works
    movzx  eax,  byte ptr [label1+2]  ; zero extend the 'c' into EAX

    inc  [label1]                  ; ERROR: ambiguous operand-size

    mov  eax, [var1]               ; fine, both operands are dwords
    mov  al, [var1]                ; ERROR: operand-size mismatch
    mov  al, byte ptr [var1]       ; load the low byte of the dword

    inc  [var1]                   ; legal: the "variable" implies dword operand size
    inc  dword ptr [var1]         ; same as above
    and  byte ptr [var1], NOT 20h ; upper-case just the first character, 'abcd' into 'Abcd' (MASM uses NOT, not ~)
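The byte-level effects of the `byte ptr` and `dword ptr` accesses above can be sketched in Python, with the 4 bytes of `dd 64636261h` modelled as a `bytearray`. This is only a model of the memory contents (little-endian, as on x86), not of MASM's type checking. The variable names `memory`, `al` and `eax` are illustrative.

```python
import struct

# Model of the 4 bytes assembled by `dd 64636261h` (x86 is little-endian).
memory = bytearray(struct.pack("<I", 0x64636261))
print(memory)        # bytearray(b'abcd')

# mov al, byte ptr [var1]: the lowest-addressed byte is the low byte of the dword.
al = memory[0]
print(chr(al))       # a

# movzx eax, byte ptr [label1+2]: one byte, zero-extended to 32 bits.
eax = memory[2]
print(hex(eax))      # 0x63  ('c')

# and byte ptr [var1], ... : clearing bit 5 (20h) of the first byte only.
memory[0] &= ~0x20 & 0xFF
print(memory)        # bytearray(b'Abcd')

# mov eax, [var1]: read the whole dword back as a little-endian integer.
eax = struct.unpack("<I", memory)[0]
print(f"{eax:08X}h") # 64636241h
```

Only the lowest byte changes when it is accessed through `byte ptr`, and that change shows up in the low 8 bits of the dword.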

Note that mov eax, var1 is equivalent to mov eax, [var1] in MASM syntax, but I prefer making the memory reference explicit by using [].

Peter Cordes
  • You may wish to see my comment. Whether this is supported in MASM depends on how old the product is. Old versions of MASM didn't support it, but newer versions do (not sure where the cutoff was on that change). – Michael Petch May 23 '19 at 19:34
  • @MichaelPetch: thanks, yeah was already updating :) If I have this right, `'abcd'` in MASM = `'dcba'` in NASM, because MASM versions that support multi-character integer literals reverse them in memory vs. their source order? – Peter Cordes May 23 '19 at 19:39
  • I can confirm on the latest version of MASM that `dd 'abcd'` outputs these bytes (and this was the disassembly of a 32-bit program): `00DB85D4 64 63 62 61 arpl word ptr fs:[edx+61h],sp`. So MASM does in fact output them in reverse. So do JWASM and WASM (WASM = Watcom assembler). – Michael Petch May 23 '19 at 19:43
  • I should point out it isn't correct to use a colon after a label inside a data section. Assemblers will choke on that. The colon is more for scoping as it pertains to code sections. In a non-code section `label1: db 'abcd'` should yield an error in MASM/TASM. In a code section, if you want a label with a colon, the label actually has to appear on a line by itself. It would have to be `label1:` and, on another line, `dd 'abcd'`. `label1: dd 'abcd'` should produce an error. – Michael Petch May 23 '19 at 20:09
  • @MichaelPetch: thanks. Did this edit help? You can always put a label on a line by itself, right? (Ah, your edit to your comment says yes.) – Peter Cordes May 23 '19 at 20:12
  • This may surprise you, but in a data section a label with a colon (`label1:`) on a line by itself is an error. Effectively the only place a label should appear with a colon is in a code section. The rule of thumb is that it is basically wrong to use a colon after a label in non-code sections. This is a rule of thumb for the older assemblers (TASM/MASM). That behaviour changed in later versions lol. But if someone is using older TASM/MASM I wouldn't rely on it actually assembling without error. If you want it to work on various versions of assemblers it should be `label1 db 'abcd'` (without `:`). – Michael Petch May 23 '19 at 20:22
  • The scoping rule differences between MASM and TASM didn't help, despite claims that TASM was compatible with MASM. Certain things had to be done a specific way if you wanted the code to be compatible between both. – Michael Petch May 23 '19 at 20:25
  • @MichaelPetch: Thanks. I anticipated this answer might need corrections, but I didn't expect a showstopper. :/ Is there a way to declare a data label that isn't a "variable"? Would `label1` on a line by itself work (at all?), and would that still tie it to the first `db` or `dd` on a later line? – Peter Cordes May 23 '19 at 20:43
  • `label1` (without a colon) on a line by itself isn't correct if you are supporting varying versions of assemblers. Effectively it comes down to this: the most compatible way of defining data in a non-code segment is to never use a `:`, and the label (without a colon) always has to be on a line where you declare data. – Michael Petch May 23 '19 at 20:48