
I am learning assembly, and I have seen two examples of defining a string:

msg db 'Hello, world!',0xa

  • What does the 0xa mean here?

message DB 'I am loving it!', 0

  • Why do we have a 0 here?
  • Is it a trailing null character?
  • Why do we have 0xa in the above example but 0 here? (They don't seem to be related to the string length.)

If the above examples are two ways of defining an assembly string, how could the program differentiate them?

Thanks in advance for any help :)

DuoD
  • It depends on what assembler you're using. They all have different syntax requirements. – Marc B Oct 22 '13 at 19:34
  • For NASM, [Using db to declare a string in assembly NASM](https://stackoverflow.com/q/41647537) has a good answer with links to docs – Peter Cordes Dec 17 '21 at 02:41

3 Answers


Different assemblers have different syntax, but for the db directive they are fairly consistent.

db is an assembler directive that emits bytes with the given values at the point in the source where the directive appears. Optionally, a label can be attached to the directive.

The common syntax is:

[label]  db  n1, n2, n3, ..., nk

where n1..nk are byte-sized numbers (0..0xff) or string constants.

Since an ASCII string is just a sequence of bytes, the directive places those bytes in memory one after another, exactly like the other numbers in the directive.

Example:

db 1, 2, 3, 4

will allocate 4 bytes and fill them with the numbers 1, 2, 3 and 4.

string  db 'Assembly', 0, 1, 2, 3

will be compiled to:

string:  41h, 73h, 73h, 65h, 6Dh, 62h, 6Ch, 79h, 00h, 01h, 02h, 03h

The character with ASCII code 0Ah (0xa) is LF (line feed), which Linux and other Unix-like systems use as the newline character on the console.

The character with ASCII code 00h (0) is the NUL character, which C-like languages use as an end-of-string mark. (Many OS API calls expect it as well, probably because most OSes are written in C.)

Appendix 1: There are several other assembly directives similar to DB that define data in memory, but with other element sizes. The most common are DW (define word), DD (define double word) and DQ (define quadruple word), for 16-, 32- and 64-bit data. In many assemblers these accept only numbers, not strings. NASM is an exception: it also accepts a string constant there and pads it with zero bytes to a multiple of the element size.

johnfound
  • Can you explain what the "string" data item above will be compiled to if DD is used instead of DB? This is the modified code: "string dd 'Assembly', 0, 1, 2, 3" – Kaustav Apr 25 '16 at 17:55
  • @Kaustav Most assemblers will give you an error with "DD" instead of "DB" directive. – johnfound Apr 26 '16 at 05:18
  • I am new to asm, but I think if I use DD the assembler will allocate 16 bits for each item. Also, how can you store a value larger than 255 in 8-bit storage using DB? – Kaustav Apr 26 '16 at 16:21
  • @Kaustav DD allocates 32 bits (4 bytes) per item. Values longer than 1 byte are stored in a byte array byte after byte (of course). – johnfound Apr 27 '16 at 06:53
  • It's good style for labels to include a `:`, like `string: db 'Assembly', 0, 1, 2, 3`. Prevents ambiguity if a label name happens to be an instruction mnemonic or directive. (Works in NASM; MASM may only allow "variable" declarations in data sections, not plain labels.) – Peter Cordes Dec 17 '21 at 02:45
  • @PeterCordes, not always. It depends on the assembler syntax. For example, in FASM the labels with : are "untyped" labels for use with code, while data-definition labels (without a colon and followed by data directives) are typed, and the compiler will raise errors when they are used with the wrong type. – johnfound Dec 19 '21 at 11:37

0 is a trailing null, yes. 0xa is a newline. They don’t define the same string, so that’s how you would differentiate them.

Ry-

0xa stands for the hexadecimal value "A", which is 10 in decimal. The line feed control character has ASCII code 10 (carriage return is D hexadecimal, or 13 decimal).

Strings are commonly terminated by a nul character to indicate their end.

Axel Kemper