I've written a function to calculate the length of a C string (I'm trying to beat clang's optimizer at -O3). I'm running macOS.
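    _string_length1:
        push rbp              ; set up the stack frame
        mov rbp, rsp
        xor rax, rax          ; length counter = 0
    .body:
        cmp byte [rdi], 0     ; reached the terminating NUL?
        je .exit
        inc rdi               ; advance the pointer
        inc rax               ; count one more character
        jmp .body
    .exit:
        pop rbp               ; tear down the stack frame
        ret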
This is the C function I'm trying to beat:
    size_t string_length2(const char *str) {
        size_t ret = 0;
        while (str[ret]) {
            ret++;
        }
        return ret;
    }
And it disassembles to this:
    string_length2:
        push rbp
        mov rbp, rsp
        mov rax, -1
    LBB0_1:
        cmp byte ptr [rdi + rax + 1], 0
        lea rax, [rax + 1]
        jne LBB0_1
        pop rbp
        ret
Every C function clang compiles here seems to set up a stack frame using push rbp and mov rbp, rsp, and tear it down using pop rbp. But I'm not using the stack in any way here; I'm only using processor registers. My function worked without a stack frame (when I tested on x86-64), but is the frame necessary?
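For anyone who wants to reproduce the "it worked" check, here is a minimal sketch of a harness that compares a length function against the standard strlen. The strlen_fn typedef, the count_mismatches helper, and the sample strings are my own illustrative choices, not part of the original test. Only the C version is wired in here; the assembly routine could be passed the same way, assuming it is declared as extern size_t string_length1(const char *) and linked in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical typedef: any function with strlen's signature. */
typedef size_t (*strlen_fn)(const char *);

/* The C reference implementation from the question. */
size_t string_length2(const char *str) {
    size_t ret = 0;
    while (str[ret]) {
        ret++;
    }
    return ret;
}

/* Hypothetical helper: returns how many of the sample strings
   make fn disagree with the standard strlen (0 means all agree). */
size_t count_mismatches(strlen_fn fn) {
    static const char *const samples[] = {
        "", "a", "hello", "a longer test string"
    };
    size_t bad = 0;
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        if (fn(samples[i]) != strlen(samples[i])) {
            bad++;
        }
    }
    return bad;
}
```

Calling count_mismatches(string_length2) from main should return 0. The same call with the assembly version, with and without the push rbp / pop rbp pair, is one way to confirm that dropping the frame does not change the result.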