Using 32-bit MASM with the MASM version 11 SDK, I hit an error while assembling. The error pointed to the line where I declared a variable with double-word (dd) size. The message said the variable was too small for the string I tried to assign to it. When I defined the variable as a byte instead (db), the program assembled with no error. This seemed to imply that a variable declared with the db directive can hold more than one declared with dd. Below is the double-word declaration that the error message pointed to:

.data
msg_run dd "Ran a function.", 0

I changed the data size of msg_run to a byte:

.data
msg_run db "Ran a function.", 0

When I assembled with the second declaration, the program built and ran with no problems. Why did the error imply that a variable declared to be byte-sized has more capacity than a variable declared to be double-word-sized? Does the trailing ", 0" have any effect?

Sources I reviewed:

https://www.cs.virginia.edu/~evans/cs216/guides/x86.html
https://www.shsu.edu/~csc_tjm/fall2003/cs272/intro_to_asm.html

Joachim Rives
  • A "string" is really just an array of characters terminated by a zero. Each character is a *single byte* (for narrow characters, `char` in C). With `dd` you make each element of the array a double word, i.e. each element is 32 bits, which isn't really correct. – Some programmer dude Aug 01 '19 at 10:51
  • MASM treats strings (things between the quotes) in a special way when you use `db`. `db` is a single character (byte) so MASM will take each character and store it in a byte. This type of processing doesn't occur the same way with types larger than a byte (dw and dd). In those situations MASM tries to stuff your string into a single DWORD (32-bit value). Look what happens if you use `dd` and make your string <=4 characters in length. The error should disappear but the characters are placed in memory in reverse order. – Michael Petch Aug 01 '19 at 11:00
  • Related: [When using the MOV mnemonic to load/copy a string to a memory register in MASM, are the characters stored in reverse order?](https://stackoverflow.com/q/57427904) / [How are dw and dd different from db directives for strings?](https://stackoverflow.com/q/38860174) (NASM and MASM are very different.) – Peter Cordes Nov 09 '22 at 09:58

1 Answer

Having a strict data definition syntax that requires the programmer to write each element separated by a comma would make declaring a string tedious:

myString db 'M', 'y', ' ', 's', 't', 'r', 'i', 'n', 'g', 0

so MASM (and all other mainstream assemblers) relaxes the syntax to

myString db "My string", 0

Note that I used single quotes ' for characters (i.e. numbers) and double quotes " for strings. I don't know the exact syntax MASM uses, and it may well convert a 1-char string to a char.
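As a rough illustration of the equivalence above (written in Python rather than MASM, purely to model the bytes that end up in memory; the variable names are made up for this sketch):

```python
# Model of the two db forms above: one byte per character, then a 0 terminator.
# This only mimics the resulting byte sequence; it is not MASM output.
spelled_out = bytes([ord(c) for c in "My string"] + [0])
shorthand = "My string".encode("ascii") + b"\x00"

print(spelled_out == shorthand)  # True
print(len(shorthand))            # 10
```

Either way the data occupies ten bytes: nine characters plus the terminating zero.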

What you saw with the dd case looks very similar to the shorthand above but it is not a syntax to declare strings, in fact, it creates numbers.

When a string like "ABCD" is used where a number is expected (like in a dd or as an immediate) MASM converts it to 0x44434241. These are the values of the characters D, C, B, A, from the high-order byte down to the low-order byte.
The reversing is done because the syntax is mostly used for instruction immediates, like in mov eax, "ABCD" or cmp eax, "ABCD".
This way, storing eax to memory will create the string "ABCD" (in the correct order) thanks to the x86 endianness.
This also works great with checking the signatures of tables since these signatures are designed to spell correctly in memory but, of course, reversed once loaded in a register.
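The endianness argument can be checked with a small Python sketch. It does not settle which 32-bit value MASM actually assigns to "ABCD" (the comments below question that); it only shows how each candidate value is laid out in little-endian memory. The helper name `dword_to_memory` is hypothetical.

```python
import struct

def dword_to_memory(value):
    """Return the 4 bytes a little-endian CPU such as x86 stores for a 32-bit value."""
    return struct.pack("<I", value)

# Low-order byte is 'A' (0x41), so 'A' lands at the lowest address.
print(dword_to_memory(0x44434241))  # b'ABCD'

# High-order byte is 'A', so the text appears reversed in memory.
print(dword_to_memory(0x41424344))  # b'DCBA'
```

Whichever value the assembler picks, the lowest-order byte of the number is the one stored first.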

In NASM you can even piss everybody off with things like mov eax, ("ABCD" + "EFGH") / 2, reinforcing the view of these strings as numbers. This should also apply to MASM.

I don't remember a case where I've used myVar dd "ABCD" but it may be useful when a structure has a fixed string that is spelled reversed in memory.


Michael Petch recapped MASM behaviour in a comment:

MASM treats strings (things between the quotes) in a special way when you use db. db is a single character (byte) so MASM will take each character and store it in a byte. This type of processing doesn't occur the same way with types larger than a byte (dw and dd). In those situations MASM tries to stuff your string into a single DWORD (32-bit value). Look what happens if you use dd and make your string <=4 characters in length. The error should disappear but the characters are placed in memory in reverse order.
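The size argument in that comment can be sketched in Python. This is only a model of the reasoning, not of MASM's real checks or error messages; `fits_in_dword` and `db_size` are hypothetical helpers.

```python
DWORD_BYTES = 4  # a double word is 32 bits

def fits_in_dword(text):
    """True if every character of text can be packed into one 32-bit value."""
    return len(text.encode("ascii")) <= DWORD_BYTES

def db_size(text):
    """Bytes emitted when text is declared with db plus a trailing 0."""
    return len(text.encode("ascii")) + 1

msg = "Ran a function."
print(len(msg))            # 15
print(fits_in_dword(msg))  # False
print(fits_in_dword("tst"))  # True
print(db_size(msg))        # 16
```

Under this model the question's 15-character string cannot fit in a single dword, while the db form simply grows to 16 bytes, one per character plus the terminator.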

Michael Petch
Margaret Bloom
  • I think you mean `myString db "My string", 0` instead of `myString "My string", 0` – Michael Petch Aug 01 '19 at 17:30
  • Yes, thank you @MichaelPetch. It even took me some time to see the difference in the strings in your comment :D – Margaret Bloom Aug 01 '19 at 17:34
  • Are these statements correct? (1) Defining `dd` makes a 32-bit variable and sets unused bits to 0; (2) Defining a variable using `db` places each letter's value in 1 byte each. MASM creates one byte for every letter. – Joachim Rives Aug 02 '19 at 01:27
  • @MichaelPetch Does defining using `msg dd "tst" ` give msg 3 32-bit values i.e. double-words, with each double-word holding a character, zero-filled unused registers and 0 as a terminator? How much space can each variable hold? – Joachim Rives Aug 02 '19 at 01:35
  • @JoachimRives : No since you aren't using `db` MASM will attempt to store `tst` into the dword but it will store the characters little endian (backwards). If you define `msg` as `msg dd "abc"` it should emit the bytes in reverse order `bca` instead of `abc` . I recommend not emitting strings with anything but `db` as there are very few reasons to do so. – Michael Petch Aug 02 '19 at 01:43
  • @PeterCordes : Mixing in NASM in this answer only confuses things more IMHO. It is difficult enough to understand MASM let alone tossing other assemblers in here. The question was specifically targeting MASM and IMHO, it might be better to keep it that way. – Michael Petch Aug 02 '19 at 01:46
  • @MichaelPetch: Agreed. Deleted my comment. I'll just say that NASM is very different in how it treats strings in `dd`, and in byte-order. Future readers should look it up if they're looking at NASM instead of MASM code. – Peter Cordes Aug 02 '19 at 01:47
  • @MargaretBloom I saw this comment on my question: "MASM treats strings (things between the quotes) in a special way when you use db. db is a single character (byte) so MASM will take each character and store it in a byte. This type of processing doesn't occur the same way with types larger than a byte (dw and dd)." Is that correct? If so, could you add it to your answer? – Joachim Rives Aug 02 '19 at 06:38
  • @JoachimRives Yes, it's correct. No problem, I'm adding it (with references), in its full length :) – Margaret Bloom Aug 02 '19 at 08:37
  • "When a string like "ABCD" is used where a number is expected (like in a dd or as an immediate) MASM converts it to 0x44434241." I think MASM actually uses the first string character for the high-order byte, ie the result is equal to 41424344h. RBIL lists signatures that way. – ecm Aug 02 '19 at 16:21
  • @ecm Uhm, that would be a bit odd but possible. I don't have a MASM at hand right now, feel free to edit as this is a community wiki answer. :) As soon as I find a spare hour I'll check it. I thought MASM would use the sensible choice with strings as immediates but I may have guessed wrong. – Margaret Bloom Aug 02 '19 at 17:37