0

There's this unanswered question in the Igor Zhirkov's book Low-Level Programming :

"Try to rewrite print_newline without calling print_char or copying its code. Hint: read about tail call optimization.".

I've spent some times reading about tail call optimization, but I can't figure out what would be the proper way of doing it.

Original code (without tail-call): print_newline => print_char => print_string=>string_length=>print_string

string_length:
    xor rax, rax
.loop:
    cmp byte [rdi+rax], 0   ; if rdi =  0 (not an address), then seg fault
    je .end 
    inc rax
    jmp .loop 
.end:
    ret
print_string:
    push rdi
    call string_length
    pop rsi
    mov rdx, rax
    mov rax, 1
    mov rdi, 1
    syscall
    ret
print_char:
    push rdi
    mov rdi, rsp 
    call print_string 
    pop rdi
    ret
print_newline:
    mov rdi, 10
    jmp print_char

What could be a tail-call optimization of that ?

trogne
  • 3,402
  • 3
  • 33
  • 50
  • 2
    `print_newline` has tail call optimization applied to it. Note how it is using `jmp` instead of `ret`. So, the flow of control will be `print_newline`, `print_char`, `print_string`, back to `print_char`, back to `print_newline`'s caller (not back to `print_newline`). Without tail call optimization, `print_newline` would read `mov rdi, 10; call print_char; ret`. – Erik Eidt Apr 29 '21 at 20:55
  • Thank Erik! So there's nothing to "rewrite", and the question makes no sense ? – trogne Apr 29 '21 at 20:57
  • (Note how it is using `jmp` instead of `call` + `ret`.) – Erik Eidt Apr 29 '21 at 21:00
  • 2
    Yes, TCO has already been applied to `print_newline` in the code you posted. – Erik Eidt Apr 29 '21 at 21:01
  • Great ! Very clear explanation. Thank you. – trogne Apr 29 '21 at 21:02
  • 2
    Not really an example of TCO, but you could also just put `print_newline: mov rdi, 10` on the line immediately before `print_char:`, so that `print_newline` will fall through into `print_char` without needing a jump at all. Of course this trick only works once, e.g. you couldn't also do `print_space` this way. – Nate Eldredge Apr 29 '21 at 22:02
  • @NateEldredge , but in this case, the functions are in a library. So I can't add an orphan instruction before `print_char`. Or maybe I don't understand. What do you mean by "fall through into `print_char`" ? – trogne Apr 30 '21 at 00:31
  • If you're looking to optimize *speed* (not code-size) of `print_newline`, don't let it waste time calling string_length. Make another wrapper (or inline) `write` with a length so `print_newline`, `print_char`, and `print_uint` can use it. – Peter Cordes Apr 30 '21 at 01:02

1 Answers1

1

For the sake of reference I will provide the solution in this thread too.

  1. Without optimizations
print_newline:
    mov rdi, 10
    call print_char
    ret
print_char:
    push rdi
    mov rdi, rsp 
    call print_string 
    pop rdi
    ret
  1. A tail call optimization. A sequence of instructions call X and ret can always be substituted by jmp X.
print_newline:
    mov rdi, 10
    jmp print_char
print_char:
    push rdi
    mov rdi, rsp 
    call print_string 
    pop rdi
    ret
  1. Just falling into print_char; basically making a subroutine with two entry points.
print_newline:
    mov rdi, 10
print_char:
    push rdi
    mov rdi, rsp 
    call print_string 
    pop rdi
    ret

Here is a more in-depth explanation in my blog

Igor Zhirkov
  • 303
  • 2
  • 8
  • "A sequence of instructions `call X` and `ret` can always be substituted by `jmp X`." Not strictly true, if the X function uses eg `retn 2` to pop something off the stack. – ecm May 05 '21 at 17:11
  • You are right, when `ret` has an argument this is not applicable. – Igor Zhirkov May 05 '21 at 17:27
  • *For the sake of reference I will provide the solution in this thread too.* - if this is a duplicate of another Q&A, it's usually better to close one as a duplicate of the other. Or if the questions are different enough to justify similar but different answers, IMO it's good to link the other question (or your answer on it) which has something different to say about the same subject. I guess these examples don't really belong in [printing a signed integer optimization](https://stackoverflow.com/a/67393573), but your answer there could link here as well as your blog. – Peter Cordes May 05 '21 at 17:41