I am studying assembly language using this NASM tutorial. Here is the code that prints a string:

SECTION .data
msg     db      'Hello!', 0Ah

SECTION .text
global  _start

_start:

    mov     ebx, msg
    mov     eax, ebx

    ; calculate number of bytes in string
nextchar:
    cmp     byte [eax], 0
    jz      finished
    inc     eax
    jmp     nextchar

finished:
    sub     eax, ebx    ; number of bytes in eax now 

    mov     edx, eax    ; number of bytes to write - one for each letter plus 0Ah (line feed character)
    mov     ecx, ebx    ; move the memory address of our message string into ecx
    mov     ebx, 1      ; write to the STDOUT file
    mov     eax, 4      ; invoke sys_write (kernel opcode 4)
    int     80h

    mov     ebx, 0      ; no errors
    mov     eax, 1      ; invoke sys_exit (kernel opcode 1)
    int     80h

It works and prints "Hello!\n" to STDOUT. One thing I don't understand: the loop searches for a \0 byte in msg, but we never defined one. I would expect the correct message definition to be

msg     db      'Hello!', 0Ah, 0h

How does the loop find a zero byte at the end of the string?

A similar case appears in exercise 7:

; String printing with line feed function
sprintLF:
    call    sprint
 
    push    eax         ; push eax onto the stack to preserve it while we use the eax register in this function
    mov     eax, 0Ah    ; move 0Ah into eax - 0Ah is the ascii character for a linefeed
    push    eax         ; push the linefeed onto the stack so we can get the address
    mov     eax, esp    ; move the address of the current stack pointer into eax for sprint
    call    sprint      ; call our sprint function
    pop     eax         ; remove our linefeed character from the stack
    pop     eax         ; restore the original value of eax before our function was called
    ret                 ; return to our program

It puts just one byte, 0Ah, into eax without a terminating 0h, yet the string length is calculated correctly inside sprint. Why?

user4035
  • Try adding another string after `msg` and compare the difference. You'll find that `nasm` just pads the `.data` section with NUL bytes to 4 kB. – fuz Oct 29 '21 at 12:18
  • Yup, that first case is a bug in the tutorial, relying on a coincidence of assembler + linker behaviour and the fact that you're not linking any other files with a `.data` section. (There are some other tutorial links in https://stackoverflow.com/tags/x86/info, but I forget if any of the NASM ones listed there are any good.) – Peter Cordes Oct 29 '21 at 12:30
  • In the 2nd case, `mov eax, 0Ah` / `push eax` is a 4-byte dword push whose high 3 bytes are zeros, so the linefeed byte is followed by zero bytes on the stack. That *is* correct. (But it is an inefficient way to implement `push 0Ah`.) – Peter Cordes Oct 29 '21 at 12:34
  • @PeterCordes What is an efficient way to implement it? – user4035 Oct 29 '21 at 12:37
  • With `push 0Ah`, instead of `mov eax, 0Ah` / `push eax`. :P And since you asked about efficiency, `nextchar` at least uses a pointer increment instead of having two `inc` instructions in the loop, but can only run at 1 iter/clock on Haswell and later because it has two branches in the loop. See [Why are loops always compiled into "do...while" style (tail jump)?](https://stackoverflow.com/q/47783926) – Peter Cordes Oct 29 '21 at 12:56
  • And of course searching 1 byte at a time is garbage on modern x86, where we can easily check 16 bytes at a time with SSE2, if we have alignment to avoid crossing into an unmapped page after the end of the string. [Is it safe to read past the end of a buffer within the same page on x86 and x64?](https://stackoverflow.com/q/37800739) / [Why is this code using strlen heavily 6.5x slower with GCC optimizations enabled?](https://stackoverflow.com/q/55563598) – Peter Cordes Oct 29 '21 at 12:59
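The first case discussed in the comments can be sketched as a small experiment. This is only an illustration, assuming a 32-bit Linux target and the `nasm`/`ld` commands shown in the header comment; the `after` label and its text are hypothetical and not from the tutorial. With the explicit `0` after `0Ah`, the scan stops at the end of `msg` even though another string follows it in `.data`. The loop is also arranged with the conditional jump at the bottom, so each iteration has one branch instead of two.

```nasm
; Assumed build commands (32-bit Linux):
;   nasm -f elf32 hello.asm -o hello.o
;   ld -m elf_i386 hello.o -o hello

SECTION .data
msg     db      'Hello!', 0Ah, 0            ; explicit NUL terminator
after   db      'not part of msg', 0Ah, 0   ; hypothetical second string

SECTION .text
global  _start

_start:
    mov     ecx, msg        ; ecx = start of string (buffer for sys_write)
    mov     edx, ecx        ; edx = scan pointer

    cmp     byte [edx], 0   ; handle an empty string before entering the loop
    je      done
nextchar:
    inc     edx             ; advance to the next byte
    cmp     byte [edx], 0   ; reached the terminator?
    jne     nextchar        ; tail jump: one branch per iteration

done:
    sub     edx, ecx        ; edx = number of bytes before the terminator
    mov     ebx, 1          ; file descriptor 1 (STDOUT)
    mov     eax, 4          ; sys_write
    int     80h

    xor     ebx, ebx        ; exit status 0
    mov     eax, 1          ; sys_exit
    int     80h
```

Removing the `, 0` from the `msg` line should make the scan run on into `after`, which is the difference the first comment suggests looking for. When the length is known at assemble time, placing `msglen equ $ - msg` directly after the `msg` line avoids both the scan and the need for a terminator.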

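The second case, the linefeed pushed onto the stack, can be sketched the same way. This is a minimal illustration assuming 32-bit code, where `push 0Ah` stores a full dword; the `sprint` routine here is a simplified stand-in written for this example, not the tutorial's own implementation. Because x86 is little-endian, the byte at `[esp]` is `0Ah` and the next three bytes are zero, so the scan inside `sprint` finds a terminator right after the linefeed.

```nasm
SECTION .data
msg     db      'Hello!', 0     ; no linefeed here; sprintLF adds it

SECTION .text
global  _start

; sprint: write the NUL-terminated string at eax to STDOUT.
; Simplified stand-in for the tutorial's routine; preserves all registers.
sprint:
    push    edx
    push    ecx
    push    ebx
    push    eax
    mov     ecx, eax        ; ecx = buffer for sys_write
    mov     edx, eax        ; edx = scan pointer
.scan:
    cmp     byte [edx], 0   ; terminator found?
    je      .write
    inc     edx
    jmp     .scan
.write:
    sub     edx, ecx        ; edx = string length in bytes
    mov     ebx, 1          ; STDOUT
    mov     eax, 4          ; sys_write
    int     80h
    pop     eax             ; restore registers in reverse order
    pop     ebx
    pop     ecx
    pop     edx
    ret

; sprintLF: print the string at eax, then a linefeed.
sprintLF:
    call    sprint          ; print the caller's string
    push    eax             ; preserve the caller's eax
    push    0Ah             ; dword push: bytes at [esp] are 0Ah, 0, 0, 0
    mov     eax, esp        ; eax -> linefeed followed by zero bytes
    call    sprint          ; prints exactly one byte
    add     esp, 4          ; discard the linefeed dword
    pop     eax             ; restore the caller's eax
    ret

_start:
    mov     eax, msg
    call    sprintLF

    xor     ebx, ebx        ; exit status 0
    mov     eax, 1          ; sys_exit
    int     80h
```

Using `add esp, 4` instead of a second `pop eax` is a design choice for this sketch: it drops the temporary dword without loading it into a register first.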