I have this C:
    #include <stddef.h>

    size_t findChar(unsigned int length, char* __attribute__((aligned(16))) restrict string) {
        for (size_t i = 0; i < length; i += 2) {
            if (string[i] == '[' || string[i] == ' ') {
                return i;
            }
        }
        return -1;
    }
It checks every other character of a string and returns the first index at which the character is '[' or ' ' (a space), or -1 if there is none. With x86-64 GCC 10.2 and -O3 -march=skylake -mtune=skylake, this is the assembly output:
    findChar:
            mov     edi, edi
            test    rdi, rdi
            je      .L4
            xor     eax, eax
    .L3:
            movzx   edx, BYTE PTR [rsi+rax]
            cmp     dl, 91
            je      .L1
            cmp     dl, 32
            je      .L1
            add     rax, 2
            cmp     rax, rdi
            jb      .L3
    .L4:
            mov     rax, -1
    .L1:
            ret
It seems like it could be optimized significantly, because I see multiple branches. How can I write my C so that the compiler optimizes it with SIMD, string instructions, and/or vectorization?
How do I write my code to signal to the compiler that this code can be optimized?
Interactive assembly output on Godbolt: https://godbolt.org/z/W19Gz8x73
Changing it to a VLA with an explicitly declared length doesn't help much: https://godbolt.org/z/bb5fzbdM1
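For readers who do not want to follow the link, here is a minimal sketch of what a VLA-style signature might look like. The name findCharVla is hypothetical and the linked code may differ. Note that C adjusts an array parameter such as `char string[length]` to a plain `char *string`, so the declared length is at most a hint to the compiler, which may be part of why it doesn't help much.

```c
#include <stddef.h>

/* Hypothetical variant: the parameter is declared as a variably modified
 * array type. In a parameter list this is adjusted to `char *string`,
 * so the function body is the same as the pointer version. */
size_t findCharVla(unsigned int length, char string[length]) {
    for (size_t i = 0; i < length; i += 2) {
        if (string[i] == '[' || string[i] == ' ') {
            return i;
        }
    }
    return (size_t)-1; /* same value as `return -1` in a size_t function */
}
```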
This version of the code is modified so that the function can return only once every 100 characters: https://godbolt.org/z/h8MjbP1cf
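The idea behind that variant can be sketched as follows. This is not the linked code: the name findCharBlocked and the BLOCK constant are hypothetical, and the sketch assumes the goal is an inner loop with no early exit, so that its trip count is fixed. Whether GCC 10.2 actually vectorizes it would need to be checked on Godbolt.

```c
#include <stddef.h>

#define BLOCK 100 /* hypothetical block size; must be even to keep the stride-2 parity */

/* Hypothetical blocked variant of findChar: checks even indices only,
 * but decides whether to return only once per BLOCK characters. */
size_t findCharBlocked(unsigned int length, const char *restrict string) {
    size_t i = 0;

    /* Full blocks: the inner loop has no data-dependent exit. */
    for (; i + BLOCK <= length; i += BLOCK) {
        int found = 0;
        for (size_t j = 0; j < BLOCK; j += 2) {
            char c = string[i + j];
            found |= (c == '[') | (c == ' ');
        }
        if (found) {
            /* Rescan just this block to recover the exact index. */
            for (size_t j = 0; j < BLOCK; j += 2) {
                if (string[i + j] == '[' || string[i + j] == ' ') {
                    return i + j;
                }
            }
        }
    }

    /* Tail: fewer than BLOCK characters remain, so scan them directly. */
    for (; i < length; i += 2) {
        if (string[i] == '[' || string[i] == ' ') {
            return i;
        }
    }
    return (size_t)-1;
}
```

The trade-off in this sketch is that a matching block is scanned twice, once to detect a hit and once to locate it, in exchange for a branch-free inner loop.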