
I am trying to write a program in x86-64 assembly for an Intel 64-bit processor. The program should be assembled with gas (the GNU assembler) and run on Linux. The task is to write a program named lowercase that reads an input string and prints that string in lowercase. It should be run like this:

$> echo "STRING" | ./lowercase
   string
$>

I wrote the program, but it prints spaces infinitely. Can someone help me understand why the following code behaves like that?

.section .bss
.comm buf, 1

.section .text
.globl _start

_start:
        mov $65,        %bh
        mov $97,        %ch
        mov $0,         %dh

Loop:
        mov $0,         %rax                    # syscall number for read
        mov $0,         %rdi                    # where to read from: stdin
        mov $buf,       %rsi                    # buffer adr
        mov $1,         %rdx                    # length of the buffer in bytes
        syscall

        cmpb %dh,       buf                     # if read returns 0 (EOF) or less than 0 exit
        jle Exit
        cmpb %bh,       buf                     # if the character is less than 65 (Char A) print it
        jl Write
        cmpb %ch,       buf                     # if the character is less than 97 make it lowercase
        jl ToLowercase


Write:
        mov $1,         %rax                    # system call for write
        mov $1,         %rdi                    # file handle for stdout
        mov $buf,       %rsi                    # address of string to output
        mov $1,         %rdx                    # number of bytes
        syscall
        jmp Loop

ToLowercase:
        addb $32,       buf                     # Make the character lowercase
        jmp Write                               # And go back to output it

Exit:
        mov   $60,      %rax                    # system call for exit
        movb  $0,       %dil                    # return code
        syscall

  • Have you tried debugging it using single step and checking each instruction is doing what you expect? If not, that's the way to go here. If you have, then include that information here about which instruction fails your expectations. Debugging is an essential skill for programmers, especially for assembly. – Erik Eidt Nov 21 '21 at 14:59

1 Answer


This line:

cmpb %dh,       buf                     # if read returns 0 (EOF) or less than 0 exit

You overwrote %rdx (and thus %dh) when you loaded the count argument for the syscall. Also, syscall has no contract to preserve %rdx, so this check is invalid.

Also, the return value of a syscall (on Linux and other systems) is in %rax, but you are comparing an unrelated value (%dh) with buf. Something more like

cmp $1, %rax
jl  Exit

would test the return value of read. Then you need to check whether the character is in 'A'..'Z':

...
mov buf, %dl
cmp $'A', %dl
jl  Write
cmp $'Z', %dl
jle ToLowercase
...
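Putting these fixes together, a minimal sketch of the whole program might look like the following. It assumes a static, non-PIE executable on Linux x86-64, and the file name `lowercase.s` and the build line are assumptions. The loop condition comes from the read return value in `%rax`, the range check uses immediates instead of constants parked in `%bh`/`%ch`/`%dh` (which the argument setup and the `syscall` instruction overwrite), and unsigned `jb`/`ja` let bytes above 0x7F pass through unchanged.

```asm
# lowercase.s: copy stdin to stdout, mapping ASCII 'A'..'Z' to 'a'..'z'.
# Assumed build: as -o lowercase.o lowercase.s && ld -o lowercase lowercase.o
        .section .bss
        .comm buf, 1                    # 1-byte I/O buffer

        .section .text
        .globl _start
_start:
Loop:
        mov   $0,   %rax                # syscall number for read
        mov   $0,   %rdi                # fd 0: stdin
        lea   buf(%rip), %rsi           # buffer address
        mov   $1,   %rdx                # read at most 1 byte
        syscall                         # rax = bytes read; rcx and r11 are clobbered
        cmp   $1,   %rax                # 0 means EOF, negative means error
        jl    Exit                      # signed compare catches both cases

        movb  buf(%rip), %al            # the byte that was just read
        cmp   $65,  %al                 # 'A'
        jb    Write                     # below 'A': print unchanged
        cmp   $90,  %al                 # 'Z'
        ja    Write                     # above 'Z': print unchanged
        addb  $32,  buf(%rip)           # 'A'..'Z' -> 'a'..'z'

Write:
        mov   $1,   %rax                # syscall number for write
        mov   $1,   %rdi                # fd 1: stdout
        lea   buf(%rip), %rsi           # buffer address
        mov   $1,   %rdx                # write 1 byte
        syscall
        jmp   Loop

Exit:
        mov   $60,  %rax                # syscall number for exit
        xor   %edi, %edi                # exit status 0
        syscall
```

Built that way, `echo "STRING" | ./lowercase` should print `string`, and the program stops once read returns 0 at end of input.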
mevets
  • `syscall` on Linux will preserve `rdx` and most other registers, so `rdx` will still contain 1 after the syscall (but of course that is not the `dh` value that OP wanted, which they had already overwritten). It will however overwrite `rcx` and `r11`, and OP does have important values in `rcx` (specifically `ch`). – Nate Eldredge Nov 21 '21 at 15:07
  • From my school of thought, you don't bind something to a particular implementation when you don't have to. RDX is volatile, so it shouldn't be counted on; besides, it was the wrong register to look at. Similarly, %rcx is definitely destroyed... – mevets Nov 21 '21 at 17:04
  • @mevets Thank you very much for a clear explanation. It fixed the problem! – Sargis Hovsepyan Nov 21 '21 at 17:45
  • RDX is volatile in the *function*-calling convention on Linux. It's call-preserved in the `syscall` calling convention on Linux, as documented in the appendix of the x86-64 SysV ABI (bullet-point 2 from it is quoted in earlier revisions of [What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64](https://stackoverflow.com/q/2535989) - the current version has my edit which phrases it more clearly.) Also as documented inside the kernel in its nolibc example, see https://lore.kernel.org/lkml/alpine.LSU.2.20.2110131601000.26294@wotan.suse.de/ – Peter Cordes Nov 22 '21 at 00:48
  • Don't confuse function-calling with system-calling. They're similar for convenience and efficiency, but RCX and R11 are destroyed *by the `syscall` instruction itself* before the kernel gets control, not arbitrarily by the kernel because it wants to. In fact that's *why* the arg-passing regs had to differ: RCX couldn't still be the 4th arg-passing reg so they subbed in R10. – Peter Cordes Nov 22 '21 at 00:50
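To observe the register behaviour described in these comments, a small test program could load known values into `%rdx` and `%rbx`, make a harmless syscall, and exit with their sum. This is a sketch assuming Linux x86-64 (syscall 39 is `getpid`, 60 is `exit`); the file name `regs.s` is hypothetical. If both registers survive the `syscall`, the exit status is 16. `%rcx` and `%r11` cannot be checked the same way, because the `syscall` instruction itself overwrites them with the return address and the saved RFLAGS.

```asm
# regs.s: check that rdx and rbx survive a Linux x86-64 syscall.
# Assumed build: as -o regs.o regs.s && ld -o regs regs.o
        .section .text
        .globl _start
_start:
        mov   $7,   %rdx                # the kernel restores rdx after syscall
        mov   $9,   %rbx                # ... and rbx
        mov   $39,  %rax                # syscall number for getpid (no arguments)
        syscall                         # rax = pid; rcx = return RIP, r11 = saved RFLAGS
        lea   (%rdx,%rbx), %rdi         # 7 + 9 = 16 if both were preserved
        mov   $60,  %rax                # syscall number for exit
        syscall                         # exit status = low 8 bits of rdi
```

Running `./regs; echo $?` should then print 16.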