Let's say I want to define an initialized string variable before running my assembly program (in section .data). The variable I chose to create is called Digits, and it is a string that contains all the hexadecimal symbols.

Digits: db "0123456789ABCDEF"

I defined the variable with db, which means define byte. Does this mean that the Digits variable is 8 bits long? This doesn't make sense to me because:

Each character in the string is an ASCII character, therefore I will need 2 bytes for each character. In total, I would need 32 bytes for the whole string!

So what does it mean when I define the variable as a byte? A word? A double word? I don't see the difference. Because of my misunderstanding, it seems redundant to specify the type of data you need for the string.

P.S.: This question didn't help me understand.

Pichi Wuana
  • @JoseManuelAbarcaRodríguez I know that *I define a byte variable*. What if I *define a word variable*? Will it be 16 words long, i.e. 32 bytes long? This doesn't make sense to me... I'm missing something. Everything is built of bytes, so shouldn't everything be a *byte variable*? – Pichi Wuana Aug 09 '16 at 20:54
  • Example : `my_array DW 1,2,3,4` , this variable contains 4 values, each value is 2 bytes long, so the variable is 8 bytes. – Jose Manuel Abarca Rodríguez Aug 09 '16 at 20:55
  • @JoseManuelAbarcaRodríguez But what about strings? That's where my misunderstanding is: when I define strings. – Pichi Wuana Aug 09 '16 at 20:56
  • With the types DB, DW or DD you are telling how much memory to reserve, but memory is always a bunch of bytes. – Jose Manuel Abarca Rodríguez Aug 09 '16 at 20:59
  • Pichi Wuana : what compiler are you using? In my compiler the data is arranged depending on the type : for type DB the bytes preserve the given order, for DW every two bytes change their order ('ab' stores as 'ba'), and for DD every 4 bytes change their order ('abcd' stores as 'dcba'). This order changes because the compiler stores the first byte in the lowest 8 bits, the second byte in the next 8 bits, and so on. Maybe this is what confuses you. – Jose Manuel Abarca Rodríguez Aug 09 '16 at 21:51
  • Don't think of it as defining variables; you are defining labels to memory locations/allocations. – David Hoelzer Aug 10 '16 at 01:00
  • ASCII is an 8-bit encoding (actually classic ASCII is only 7-bit; codes 0x80 and above are platform-specific, with ISO-Latin1 encoding often used nowadays). So `'0123456789ABCDEF'` is 16 bytes, not 32. `Digits: db` is roughly equivalent to writing `Digits: db '0'` and then, on a new line, `db '1', '2', '3', ...'F'` (so the `Digits` label has the address of the byte containing '0'). The `'string'` syntax is a shortcut for defining the values of multiple bytes. – Ped7g Aug 10 '16 at 10:56

3 Answers

NASM answer, MASM is totally different

One of the answers on the linked question has a quote from the NASM manual's examples which does answer your question. As requested, I'll expand on it for all three cases (and correct the lower-case vs. upper-case ASCII encoding error!):

db   'ABCDE'     ; 0x41 0x42 0x43 0x44 0x45                (5 bytes)
dw   'ABCDE'     ; 0x41 0x42 0x43 0x44 0x45 0x00           (6 bytes, 3 words)
dd   'ABCDE'     ; 0x41 0x42 0x43 0x44 0x45 0x00 0x00 0x00 (8 bytes, 2 doublewords)
dq   'ABCDE'     ; 0x41 0x42 0x43 0x44 0x45 0x00 0x00 0x00 (8 bytes, 1 quadword)

So the difference is that it pads out to a multiple of the element size with zeros when you use dd or dw instead of db.
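The padding rule can be checked with a small model. This is only a Python sketch of the layout described above, not NASM itself; the helper name `pad_string` is made up for illustration, and it assumes a pure-ASCII string.

```python
def pad_string(text: str, elem_size: int) -> bytes:
    """Model the bytes emitted for one quoted string under db/dw/dd/dq."""
    raw = text.encode("ascii")  # one byte per ASCII character, in source order
    remainder = len(raw) % elem_size
    if remainder:
        # Zero-pad the final, partially filled element.
        raw += b"\x00" * (elem_size - remainder)
    return raw


# Element sizes in bytes: db=1, dw=2, dd=4, dq=8
for name, size in (("db", 1), ("dw", 2), ("dd", 4), ("dq", 8)):
    data = pad_string("ABCDE", size)
    print(name, data.hex(" "), f"({len(data)} bytes)")
```

With the 16-character `"0123456789ABCDEF"` from the question, all four sizes give the same 16 bytes, because 16 is already a multiple of 1, 2, 4 and 8.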

According to @Jose's comment, some assemblers may use a different byte order for dd or dw string constants. In NASM syntax, the string is always stored in memory in the same order it appears in the quoted constant.

You can assemble this with NASM (e.g. into the default flat binary output) and use hexdump -C or something to confirm the byte ordering and amount of padding.


Note that this padding to the element size applies to each comma-separated element. So the seemingly-innocent dd '%lf', 10, 0 actually assembles like this:

;dd   '%lf',    10,        0
db    '%lf',0,  10,0,0,0,  0,0,0,0        ;; equivalent with db

Note the 0 before the newline; if you pass a pointer to this to printf, the C string is just "%lf", terminated by the first 0 byte.

(A write system call or the fwrite function with an explicit length would print the whole thing, including the 0 bytes, because those functions work on binary data, not C implicit-length strings.)
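To see the per-element padding and its effect on a C string, here is a rough Python model of the rule. The function `assemble` is a hypothetical helper, not a NASM API; it assumes ASCII strings and non-negative integers that fit in the element size.

```python
def assemble(elem_size: int, *items) -> bytes:
    """Model db/dw/dd output: each comma-separated item is padded separately."""
    out = b""
    for item in items:
        if isinstance(item, str):
            raw = item.encode("ascii")
            # Pad this string up to a multiple of the element size.
            raw += b"\x00" * (-len(raw) % elem_size)
        else:
            # Integers occupy exactly one element, little-endian (x86).
            raw = item.to_bytes(elem_size, "little")
        out += raw
    return out


blob = assemble(4, "%lf", 10, 0)        # models: dd '%lf', 10, 0
c_string = blob.split(b"\x00", 1)[0]    # what a C string function would see
print(len(blob), c_string)
```

The model gives 12 bytes for the `dd` line, and the C-string view stops at the padding byte after `%lf`, so the newline is never reached. With an element size of 1 (`db`), no padding is added and the newline directly follows the format string.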


Also note that in NASM, you can do stuff like mov dword [rdi], "abc" to store "abc\0" to memory. i.e. multi-character literals work as numeric literals in any context in NASM.
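As a sanity check on the numeric value of such a literal, the following Python sketch computes the little-endian integer for "abc" and the four bytes a dword store of it would write. The helper `char_literal_value` is an illustrative name, not part of NASM.

```python
import struct


def char_literal_value(text: str) -> int:
    """Integer value of a multi-character literal: first character in the low byte."""
    return int.from_bytes(text.encode("ascii"), "little")


value = char_literal_value("abc")
stored = struct.pack("<I", value)  # bytes written by a 32-bit little-endian store
print(hex(value))  # 0x636261
print(stored)      # b'abc\x00'
```

The first character lands at the lowest address, which is why the bytes in memory read in source order.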


MASM is very different

See When using the MOV mnemonic to load/copy a string to a memory register in MASM, are the characters stored in reverse order? for more. Even in a dd "abcd", MASM breaks your strings, reversing the byte order inside chunks compared to source order.

Peter Cordes
  • Can you show an example for each of the other two cases, as you showed in a comment how it is with `dw`? – Pichi Wuana Aug 10 '16 at 16:44
  • In conclusion, the mnemonic doesn't actually change anything about the variable, except when it's a table? – Pichi Wuana Aug 10 '16 at 18:06
  • @PichiWuana: That's not how I'd describe it. `db` with a separate ALIGN directive would make more sense if you're going to access it as separate bytes. Also, `db` just assembles bytes into the output file. Variables are a higher level concept that doesn't have to map 1:1 with directives or labels. – Peter Cordes Aug 10 '16 at 18:22
  • Relating to assemblers which reverse the bytes in a string, compare [this NASM source](https://hg.ulukai.org/ecm/ldebug/file/69e00a339763/source/msg.asm#l1681) with [this A86 source](https://hg.ulukai.org/ecm/fddebug/file/b237abb3a356/DEBUG.A86#l6365): For A86, the two-byte `dw` string items are reversed by the assembler, so the source has to contain the "backwards" strings. – ecm Apr 15 '21 at 07:30

I want to clarify something:

example: db 'ABCDE';

This reserves 5 bytes in total, each containing a letter.

ex2: db 1 ;

reserves a byte that contains 1

ex3: db "cool";

reserves 4 bytes and each byte contains a letter

ex4: db "cool", 1, 3;

reserves 3 bytes?

Answer: ex4 is 6 bytes (4 for the letters of "cool", plus one byte each for 1 and 3).

opobtdfs
  • Put this in a file, assemble it, and hexdump it to see what your assembler put there. Your last example is wrong, `"cool", 1, 3` is 6 bytes total. – Peter Cordes Jun 04 '20 at 02:58
  • Also, this isn't really an answer. It's kind of phrased as a question, although the first 3 examples are correct answers. – Peter Cordes Jun 04 '20 at 05:45
  • @PeterCordes Yeah, I figured out the answer. Thanks. – opobtdfs Jun 04 '20 at 12:29
  • Then please [edit] your answer to correct the info you're leaving for future readers. Or delete it, if you don't actually want to take out the questions and turn this into just a correct answer. – Peter Cordes Jun 04 '20 at 12:46

For each character in the string "0123456789ABCDEF" you need just one byte. So the string will occupy 16 bytes in memory.

With this declaration:

vark db 1

you can do this:

mov [vark],128

but not this:

mov [vark],1024

but in this case:

vark dw 1

you can.
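The size limits behind this can be checked with a short Python sketch. The helper `fits` is hypothetical, and it assumes the usual convention that an N-byte operand accepts any value representable as either signed or unsigned; whether a given assembler rejects or only warns about an out-of-range value is not modelled here.

```python
def fits(value: int, size_bytes: int) -> bool:
    """True if value is representable in size_bytes as signed or unsigned."""
    bits = 8 * size_bytes
    return -(1 << (bits - 1)) <= value < (1 << bits)


print(fits(128, 1))   # True: 128 fits in one byte (0..255 unsigned)
print(fits(1024, 1))  # False: 1024 needs more than 8 bits
print(fits(1024, 2))  # True: a 16-bit word holds up to 65535
```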

skaa
  • The OP didn't specify MASM, where the directives after a label magically affect the operand-size of instructions referencing it. Also, this doesn't answer the question at all, since this question is specifically about strings, not integer constants like in [the Q the OP linked](http://stackoverflow.com/questions/10168743/x86-assembly-which-variable-size-to-use-db-dw-dd) and said it didn't answer his question. – Peter Cordes Aug 09 '16 at 22:07