
There's an unanswered exercise in Igor Zhirkov's book Low-Level Programming:

"Try to rewrite print_int without calling print_uint, copying its code, or using jmp. You will only need one instruction and a careful code placement.

Read about co-routines."

The supplied code for print_int and print_uint, along with the print_char and print_string helpers they call:

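print_uint:
    mov rax, rdi          ; rax = value to convert
    mov rdi, rsp          ; rdi = original rsp, the end of the digit buffer
    push 0                ; 8 zero bytes; the one at [rdi - 1] is the terminator
    sub rsp, 16           ; 24 bytes in total: room for 20 digits plus the 0

    dec rdi               ; rdi -> terminating 0 byte
    mov r8, 10            ; divisor

.loop:
    xor rdx, rdx          ; div divides rdx:rax, so clear the high half
    div r8                ; rax = quotient, rdx = remainder (0..9)
    or  dl, 0x30          ; remainder -> ASCII digit
    dec rdi
    mov [rdi], dl         ; store digits from right to left
    test rax, rax
    jnz .loop             ; repeat until the quotient is 0

    call print_string     ; rdi -> most significant digit

    add rsp, 24           ; undo push 0 + sub rsp, 16
    ret

print_int:
    test rdi, rdi
    jns print_uint        ; non-negative: tail-call print_uint directly
    push rdi              ; save the value across the call
    mov rdi, '-'
    call print_char
    pop rdi
    neg rdi               ; magnitude (INT64_MIN stays 0x8000000000000000 = 2^63 unsigned)
    jmp print_uint        ; tail call

print_char:
    push rdi              ; low byte = char, upper bytes assumed 0 -> 0-terminated string
    mov rdi, rsp
    call print_string
    pop rdi
    ret

print_string:
    push rdi              ; save the string pointer
    call string_length    ; rax = length (string_length is not shown here)
    pop rsi               ; rsi = buffer for write
    mov rdx, rax          ; rdx = byte count
    mov rax, 1            ; SYS_write
    mov rdi, 1            ; fd 1 = stdout
    syscall
    ret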
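print_string calls a string_length helper that the question doesn't include. A minimal sketch of it in NASM, assuming the contract print_string relies on: a pointer to a 0-terminated string in rdi, and the byte count (excluding the 0) returned in rax. This is a hypothetical version, not the book's.

```nasm
section .text
global string_length

; string_length (hypothetical helper, assumed contract):
;   in:  rdi = pointer to a 0-terminated string
;   out: rax = number of bytes before the terminating 0
; Clobbers only rax, so rdi still holds the pointer on return.
string_length:
    xor  eax, eax              ; rax = 0, the running count
.loop:
    cmp  byte [rdi + rax], 0   ; reached the terminator?
    je   .end
    inc  rax                   ; no: count this byte and keep scanning
    jmp  .loop
.end:
    ret
```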

What single instruction could he be talking about?

Peter Cordes
trogne
  • I see the way to get rid of `jmp print_uint`, but that "one instruction" is still a mystery – harold Apr 29 '21 at 21:35
  • Are you supposed to move `print_int` to before `print_uint` so it can *fall* into it (tailcall for free) instead of using `jmp`? Perhaps he means "*change* one instruction", as in remove the `jmp print_uint`? – Peter Cordes Apr 30 '21 at 00:54
  • BTW, `print_uint` is nicely written, pretty much the same as what I did in [How do I print an integer in Assembly Level Programming without printf from the c library?](https://stackoverflow.com/a/46301894). If this isn't trying to be portable to non-Linux, though, you could take advantage of the red-zone below RSP and leave out the `sub rsp, 16` if you inline the syscall and calculate the length, instead of calling `print_string` and making it search for a terminating `0` when you already know where it is because you pushed it. – Peter Cordes Apr 30 '21 at 00:57
  • @PeterCordes thank you. This code, however, was never tested to be the fastest version possible, nor was it written with that intent. Portability was also out of scope. Because this assignment comes fairly early in the book, the reader has not yet met the red zone, so I did not use it. – Igor Zhirkov May 04 '21 at 22:59
  • @IgorZhirkov: Yup, same reason I left `sub rsp` in my version (and for the benefit of people porting it to 32-bit code). If you were actually optimizing for speed you'd use a multiplicative inverse (like [this code-review Q&A](https://codereview.stackexchange.com/questions/142842/integer-to-ascii-algorithm-x86-assembly)), and that would save more cycles than avoiding sub/add. :P See also other links at the bottom of my linked answer for some blogs where people have experimented with `x /= 100` and splitting that up to get some ILP, and other things like that. – Peter Cordes May 05 '21 at 04:03
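The red-zone variant described in the comment above could look roughly like the following sketch. It is not from the book and assumes Linux x86-64 (the System V ABI reserves 128 bytes below rsp that signal handlers won't clobber) and NASM syntax. Because the function makes no calls, the digits can live below rsp with no sub/add, and the length is simply end minus start, so neither a terminating 0 nor a string_length call is needed.

```nasm
section .text
global print_uint

; print_uint, red-zone variant (sketch, Linux x86-64 only):
;   in:  rdi = unsigned 64-bit value; writes its decimal digits to stdout
;   out: rax = return value of write (bytes written, or -errno)
; Leaf function: it calls nothing, so the red zone below rsp is usable
; scratch space without adjusting rsp.
print_uint:
    mov  rax, rdi          ; dividend
    mov  rsi, rsp          ; rsi = one past the last digit; digits grow downward
    mov  r8, 10
.loop:
    xor  edx, edx          ; div divides rdx:rax, so clear the high half
    div  r8                ; rax = quotient, rdx = remainder (0..9)
    add  dl, '0'           ; remainder -> ASCII digit
    dec  rsi
    mov  [rsi], dl         ; at most 20 bytes below rsp, well inside the red zone
    test rax, rax
    jnz  .loop

    mov  rdx, rsp
    sub  rdx, rsi          ; length = end - start, known without scanning
    mov  eax, 1            ; SYS_write
    mov  edi, 1            ; fd 1 = stdout
    syscall                ; rsi already points at the first digit
    ret
```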

1 Answer


I am sorry, but unfortunately this is an error in the book.

The hint belongs to a different pair of functions, print_newline and print_char: print_newline can be expressed as one instruction if control then falls through into print_char. I wrote a blog post about it. The basic idea is that printing one specific character, the newline, is the same as starting the "print any character" subroutine with its argument already set to that character's code.

print_newline:
   mov rdi, 10  ; '\n' (NASM does not process escapes in single quotes); first integer argument is in rdi
print_char:
   ...

As for print_int, I am sure that on AMD64 it cannot be expressed as a single instruction that then falls through to print_uint.
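For comparison, here is a sketch (not the book's code) of what careful placement alone buys, assembled from the question's functions: print_newline is the one instruction that falls into print_char, and print_int sits directly above print_uint so its negative path falls through instead of ending in jmp print_uint. It still needs the conditional jns and the sign-handling instructions, so it is not a one-instruction version. It assumes NASM on Linux x86-64, treats the conditional jns as allowed by the exercise, and includes a hypothetical string_length helper because the question doesn't show one.

```nasm
section .text

; print_newline: one instruction, then falls through into print_char.
print_newline:
    mov  rdi, 10           ; newline character code
print_char:
    push rdi               ; low byte = char, upper bytes 0 -> 0-terminated string
    mov  rdi, rsp
    call print_string
    pop  rdi
    ret

; print_int placed directly above print_uint: the negative path falls
; through instead of ending in `jmp print_uint`.
print_int:
    test rdi, rdi
    jns  print_uint        ; non-negative: conditional branch, still needed
    push rdi               ; save the value across the call
    mov  rdi, '-'
    call print_char
    pop  rdi
    neg  rdi               ; magnitude; execution continues into print_uint
print_uint:
    mov  rax, rdi
    mov  rdi, rsp          ; end of the digit buffer
    push 0                 ; terminator lives at [rdi - 1]
    sub  rsp, 16
    dec  rdi
    mov  r8, 10
.loop:
    xor  rdx, rdx
    div  r8                ; rax = quotient, rdx = remainder
    or   dl, 0x30          ; remainder -> ASCII digit
    dec  rdi
    mov  [rdi], dl         ; digits stored right to left
    test rax, rax
    jnz  .loop
    call print_string
    add  rsp, 24
    ret

print_string:
    push rdi
    call string_length     ; rax = length
    pop  rsi               ; rsi = buffer
    mov  rdx, rax          ; rdx = byte count
    mov  rax, 1            ; SYS_write
    mov  rdi, 1            ; fd 1 = stdout
    syscall
    ret

; string_length: hypothetical helper (rdi = 0-terminated string, rax = length)
string_length:
    xor  eax, eax
.loop:
    cmp  byte [rdi + rax], 0
    je   .end
    inc  rax
    jmp  .loop
.end:
    ret
```

Calling print_int with -42 prints "-42" (the "-" via print_char, then "42" via the fall-through into print_uint), and print_newline prints a single newline.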

Igor Zhirkov