When compiling a loop, TurboFan seems to peel off the first iteration most of the time. For example, a loop like:
```js
function fill256(int32Array) {
  var i = 255;
  do {
    int32Array[i] = 0;
  } while (--i >= 0);
}
```
gets optimized to machine code like this:
```
# rdx is int32Array
0x13f38bcef7a 5a 488b4a2f REX.W movq rcx,[rdx+0x2f]
0x13f38bcef7e 5e 488b7a3f REX.W movq rdi,[rdx+0x3f]
0x13f38bcef82 62 4c8b4237 REX.W movq r8,[rdx+0x37]
# peeled iteration
0x13f38bcef86 66 4881f9ff000000 REX.W cmpq rcx,0xff
0x13f38bcef8d 6d 0f8614010000 jna 0x13f38bcf0a7 <+0x187>
0x13f38bcef93 73 4e8d0c07 REX.W leaq r9,[rdi+r8*1]
0x13f38bcef97 77 41c781fc03000000000000 movl [r9+0x3fc],0x0 # dword store
0x13f38bcefa2 82 41b9fe000000 movl r9,0xfe
0x13f38bcefa8 88 e906000000 jmp 0x13f38bcefb3 <+0x93>
0x13f38bcefad 8d 0f1f00 nop
# loop proper
0x13f38bcefb0 90 4d8bcb REX.W movq r9,r11
# first iteration entry point:
0x13f38bcefb3 93 493b65e0 REX.W cmpq rsp,[r13-0x20] (external value (StackGuard::address_of_jslimit()))
0x13f38bcefb7 97 0f868b000000 jna 0x13f38bcf048 <+0x128>
0x13f38bcefbd 9d 458d59ff leal r11,[r9-0x1]
0x13f38bcefc1 a1 4d63e1 REX.W movsxlq r12,r9
0x13f38bcefc4 a4 4c3be1 REX.W cmpq r12,rcx
0x13f38bcefc7 a7 0f83e6000000 jnc 0x13f38bcf0b3 <+0x193>
0x13f38bcefcd ad 4e8d0c07 REX.W leaq r9,[rdi+r8*1]
0x13f38bcefd1 b1 43c704a100000000 movl [r9+r12*4],0x0 # dword store
0x13f38bcefd9 b9 4183fb00 cmpl r11,0x0
0x13f38bcefdd bd 7dd1 jge 0x13f38bcefb0 <+0x90>
```
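To make the transformation concrete, here is a hand-written JavaScript approximation of what the peeled machine code above does. It is a sketch based on reading the listing, not actual TurboFan output, and `fill256Peeled` is a hypothetical name. The peeled iteration stores with the constant index 255 (byte offset `0x3fc` = 255 * 4). Because its loop condition (254 >= 0) is known to be true, control then jumps straight into the loop body with `i` = 254 (`movl r9,0xfe`).

```js
// The loop from the question, unchanged.
function fill256(int32Array) {
  var i = 255;
  do {
    int32Array[i] = 0;
  } while (--i >= 0);
}

// Hypothetical hand-peeled version (an approximation, not real TurboFan output).
function fill256Peeled(int32Array) {
  // Peeled first iteration: i is the constant 255, so the store uses a
  // constant index (byte offset 0x3fc in the listing).
  int32Array[255] = 0;
  // The peeled iteration's condition (--i >= 0 with i = 255, i.e. 254 >= 0)
  // is always true, so the remaining loop is entered unconditionally at i = 254.
  var i = 254;
  do {
    int32Array[i] = 0;
  } while (--i >= 0);
}
```

Both functions leave the array in the same state. The peeled one simply carries a second copy of the loop body.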
This isn't specific to this particular loop construct; it seems to happen for all loops with small bodies. The comments in the V8 source code only say that this is an optimization, but what does it actually accomplish, other than increasing code size?
I know that peeling can be beneficial if it introduces new invariants that the optimizer can exploit in the remaining iterations.
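As a generic, hand-written illustration of that kind of benefit (not a claim about what TurboFan does with `fill256`), consider a loop-invariant check that cannot simply be hoisted above the loop. When `n <= 0`, the original code never performs the check and never throws, so hoisting it would change behavior. After peeling, the first iteration performs the check, every later iteration is dominated by it, and the in-loop copy becomes redundant. The functions `scaleAll` and `scaleAllPeeled` are hypothetical examples.

```js
// Original: the typeof check is loop-invariant, but hoisting it above the loop
// would throw even when n <= 0, which the original never does.
function scaleAll(arr, n, factor) {
  for (let i = 0; i < n; i++) {
    if (typeof factor !== "number") throw new TypeError("factor must be a number");
    arr[i] *= factor;
  }
}

// Hand-peeled: the first iteration performs the check; every later iteration
// runs only after it has passed, so the check is dropped from the loop body.
function scaleAllPeeled(arr, n, factor) {
  let i = 0;
  if (i < n) {
    if (typeof factor !== "number") throw new TypeError("factor must be a number");
    arr[i] *= factor;
    for (i = 1; i < n; i++) {
      arr[i] *= factor; // check elided: already known to pass
    }
  }
}
```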