I have following function which generates the hash given a string.
uint64_t
djb2(uint8_t *ptr, uint64_t len) {
uint64_t hash = 5381;
uint64_t step = 0;
do { // I can gurantee that [len > 0].
hash = ((hash << 5) + hash) + ptr[step];
++step;
} while (step < len);
return hash;
}
this generates following asm
:
djb2:
xor edx, edx
mov eax, 5381
.L2:
mov rcx, rax
sal rcx, 5
add rax, rcx
movzx ecx, BYTE PTR [rdi+rdx]
add rdx, 1 ; this is interesting.
add rax, rcx
cmp rdx, rsi
jb .L2
ret
As you can see GCC
with O2
prefers ADD 1
instead of INC
.
Question
- Is there a reason for that? If I'm not wrong there's no performance penalty but the code size of
INC
is smaller. - If this is really a missed thing by
GCC
(which I highly doubt, probably I'm wrong here), can I somehow fix it?
Note
- You can play with this example in Godbolt.
- I also thought that it picked
ADD
because it can execute multipleADD
s in parallel, but it is hard to say (I'm not an expert here).
EDIT
- In the comments there are extremely valuable links, however I find it hard to understand the reasoning of picking
ADD
overINC
in this example, based on the given answers (in those links).