This is the normal way to declare a variable of type byte in assembly:

msg0 BYTE "string_1 in upper case: ",0

What's the need to manually specify ,0? It probably marks the end of the string.

But isn't the end of the string obvious once we close the double quotes?

Sep Roland
rohit sharma
  • The quotes are not part of the string; they won't be in memory. The zero byte will be. Note that not all strings need to be zero-terminated, which is one of the reasons why the assembler doesn't add the zero automatically. Some assemblers have special directives for zero-terminated strings, e.g. the GNU assembler has `.string` and `.asciz`, which append the zero for you without having to type it out. – Jester Apr 26 '21 at 13:53
  • The double quotes don't exist in memory. The memory layout of this string `"FOO", 0` will be `F, O, O, \0`. The `0` isn't for you, it's for functions that use the string to know when the string ends. – mediocrevegetable1 Apr 26 '21 at 13:54
  • Okay, thanks. So in what cases would we use a character other than 0? – rohit sharma Apr 26 '21 at 14:02
  • @rohitsharma well, maybe there could be some functions that expect another byte to signify the end of a string, but I've never seen it be like that. `0` is pretty much always used. Note that (I think) in DOS strings end with `$` instead, but that's the only exception I know. – mediocrevegetable1 Apr 26 '21 at 14:35
  • @mediocrevegetable1: DOS interrupt 21h service 09h uses dollar-sign-terminated strings. It is popular for Hello World type programs because it is the simplest string output function that DOS provides. (The alternative is to use service 40h, `bx` = 1 (stdout), `cx` = length, `ds:dx` -> string.) It is a holdover from CP/M, I believe. – ecm Apr 26 '21 at 18:03
  • Can we say that there is no such thing as a string in assembly, and that it's simply a byte array? And that the quotes are a way of specifying the elements of the array? Also, instead of providing that 0 at the end, is it possible to include it within the quotes themselves? – rohit sharma Apr 26 '21 at 18:07
  • @ecm thanks for the clarification, I've personally never written ASM for DOS but I occasionally see questions about it here so I based my assumption on that. – mediocrevegetable1 Apr 26 '21 at 18:11
  • @rohitsharma you would be right, I believe. The quotes are just a short form for specifying each individual character. To include the 0 within the quotes, your assembler would probably have to support escape characters (NASM, for example, allows this with backtick string literals; you seem to be using MASM/TASM judging from the `BYTE`, and I don't know much about those). Or as Jester said, some assemblers like GAS have specific directives for 0-terminated strings. – mediocrevegetable1 Apr 26 '21 at 18:15
  • There's no implicit terminating zero appended by close-quotes because you don't always want that. e.g. for passing to a `write` system call that takes a length, you just want the ASCII bytes and a length (explicit-length string), not an implicit-length 0-terminated C string. – Peter Cordes Apr 27 '21 at 03:02

1 Answer

There's no implicit terminating zero appended by the closing quote because you don't always want one. For example, when passing a string to a `write` system call that takes a length, you just want the ASCII bytes and a length (an explicit-length string), not an implicit-length 0-terminated C string.

e.g.

msg  db "hello"
msglen = $ - msg

Or the string may be part of a struct, effectively defining a fixed-width `char buf[4]` where every use reads all 4 bytes rather than searching for a terminating 0.
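
For readers more at home in C, here is a rough sketch of the same two cases. It is only an illustration under my own assumptions: `msgz`, `struct record`, `print_explicit` and `tag_equals` are hypothetical names made up for this example, and `fwrite` stands in for a length-taking `write` call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Explicit-length string: exactly 5 ASCII bytes, no terminator.
   Roughly what `msg db "hello"` plus `msglen = $ - msg` gives you. */
static const char msg[5] = {'h', 'e', 'l', 'l', 'o'};
static const size_t msglen = sizeof msg;

/* Implicit-length string: the C compiler appends the 0 byte itself,
   roughly what `db "hello", 0` spells out by hand. */
static const char msgz[] = "hello";

/* Hypothetical record with a fixed-width 4-byte tag and no terminator. */
struct record {
    char tag[4];
    unsigned value;
};

/* Output that takes a length, so it never looks for a 0 byte.
   Returns the number of bytes written. */
static size_t print_explicit(FILE *out) {
    assert(out != NULL);
    return fwrite(msg, 1, msglen, out);
}

/* Compares all 4 tag bytes; a 0 byte inside the tag is just data. */
static int tag_equals(const struct record *r, const char *tag) {
    assert(r != NULL && tag != NULL);
    return memcmp(r->tag, tag, 4) == 0;
}
```

Calling `print_explicit(stdout)` writes the five bytes of `msg` and nothing else, whereas anything that scans for a terminator (such as `strlen`) is only safe on `msgz`.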

Peter Cordes
  • This is a slightly different question from the one I posted, but I am looking for a different IDE than Microsoft Visual Studio. Any recommendations? How is VS Code? – rohit sharma Apr 27 '21 at 06:47
  • @rohitsharma: I use emacs (or sometimes vim for quick edits) + a command line, on GNU/Linux. Works fine for me (especially running `perf stat` on small test programs / microbenchmarks), never bothered with any bloated IDEs, especially years ago when RAM was tighter than on my current computer. – Peter Cordes Apr 27 '21 at 07:05
  • Is there an online reference for the x86 instruction set? For example, if I want to see the exact format of the `mov` instruction, which documentation is reliable and official? – rohit sharma Apr 27 '21 at 09:32
  • Yeah, Intel and AMD both have PDF manuals. There are lots of HTML scrapes of Intel's, like https://www.felixcloutier.com/x86/. On [the entry for `mov`](https://www.felixcloutier.com/x86/mov), you can see there are forms with `mov reg, r/m`, `mov r/m, reg`, and `mov r/m, immediate`. It doesn't have examples, or any assembler-specific details about syntax for addressing modes, of course, because that's the ISA documentation (for the machine code / opcodes), and syntax details depend on the tool you use to create the machine code you want (the assembler). – Peter Cordes Apr 27 '21 at 09:38
  • I checked `https://www.felixcloutier.com/x86/mov`. Interestingly, it doesn't mention the instruction `mov byte ptr [edx], 0`. – rohit sharma Apr 27 '21 at 10:15
  • It's the `MOV r/m8, imm8` form of MOV (with an m8 memory operand in this case). I told you it doesn't go into details about assembler-specific syntax for addressing modes and so on, and of course it's not going to enumerate every possible option for the operands for every instruction; that would take forever. It tells you the destination can be an r/m ModRM operand, so it can be anything that encoding allows, and `ib` can be any 8-bit constant. Did you also want something like [Referencing the contents of a memory location. (x86 addressing modes)](https://stackoverflow.com/q/34058101)? – Peter Cordes Apr 27 '21 at 10:18
  • Of course Intel's PDF manual describes all those components, but separately from each instruction. See the intro chapters and appendices in https://software.intel.com/content/www/us/en/develop/articles/intel-sdm.html. – Peter Cordes Apr 27 '21 at 10:21