Here is a simplified singly-linked list, where each node owns the next, along with a function for destroying the list:
struct Node {
Node* next = nullptr;
~Node() {
delete next;
}
};
void Destroy(Node* head) {
delete head;
}
Clang 15.0.0 with -O3
(Compiler Explorer) gives recursive code for this that uses call
instructions, i.e. no tail call recursion through ~Node
itself, only operator delete(void*)
:
Destroy(Node*): # @Destroy(Node*)
test rdi, rdi
je .LBB0_1
push rbx
mov rbx, rdi
call Node::~Node() [base object destructor]
mov rdi, rbx
pop rbx
jmp operator delete(void*)@PLT # TAILCALL
.LBB0_1:
ret
Node::~Node() [base object destructor]: # @Node::~Node() [base object destructor]
push rbx
mov rbx, qword ptr [rdi]
test rbx, rbx
je .LBB1_1
mov rdi, rbx
call Node::~Node() [base object destructor]
mov rdi, rbx
pop rbx
jmp operator delete(void*)@PLT # TAILCALL
.LBB1_1:
pop rbx
ret
Here is an open-coded version of something similar:
struct Node2 {
Node2* next = nullptr;
};
void Destroy2(Node2* head) {
auto* const next = head->next;
delete head;
if (next) {
Destroy2(next);
}
}
Even with -01
clang turns the tail call into an efficient loop with O(1) stack frames involved for an arbitrary number of list nodes:
Destroy2(Node2*): # @Destroy2(Node2*)
push rbx
.LBB2_1: # =>This Inner Loop Header: Depth=1
mov rbx, qword ptr [rdi]
call operator delete(void*)@PLT
mov rdi, rbx
test rbx, rbx
jne .LBB2_1
pop rbx
ret
I understand that compiler optimizations aren't guaranteed, but I'm surprised clang isn't able to do something more efficient with the basic Destroy
case. It leads me to think that the key difference is in the fact that Destroy2
is able to free the memory for head
before it deals with head->next
. But it seems to me that shouldn't matter unless operator delete
is allowed to have some visible side effect.
Is there an important semantic difference between these two from the point of view of the abstract machine, preventing clang from optimizing the first case? If so, is there a way to make ~Node
more friendly to the optimizer so that I don't need to open-code a destroy function?