I just tested a small example to check whether __restrict__ works in C++ on the latest compilers:
    void foo(int x, int* __restrict__ ptr1, int& v2) {
        for (int i = 0; i < x; i++) {
            if (*ptr1 == v2) {
                ++ptr1;
            } else {
                *ptr1 = *ptr1 + 1;
            }
        }
    }
When trying it on godbolt.org with the latest gcc (gcc 8.1 -O3 -std=c++14), __restrict__ works as expected: v2 is loaded only once, since it cannot alias with ptr1.
Here are the relevant assembly parts:
    .L5:
        mov eax, DWORD PTR [rsi]
        cmp eax, ecx        # <-- ecx contains v2, no load from memory
        jne .L3
        add edx, 1
        add rsi, 4
        cmp edi, edx
        jne .L5
Now the same with the latest clang (clang 6.0.0 -O3 -std=c++14). It unrolls the loop once, so the generated code is much bigger, but here is the gist:
    .LBB0_3:                        # =>This Inner Loop Header: Depth=1
        mov edi, dword ptr [rsi]
        cmp edi, dword ptr [rdx]    # <-- restrict didn't work, v2 loaded from memory in hot loop
        jne .LBB0_9
        add rsi, 4
        mov edi, dword ptr [rsi]
        cmp edi, dword ptr [rdx]    # <-- restrict didn't work, v2 loaded from memory in hot loop
        je .LBB0_12
Why is this the case? I know that __restrict__ is non-standard and the compiler is free to ignore it, but it seems to be a very fundamental technique for getting the last bit of performance out of one's code, so I doubt that clang accepts the keyword and then simply ignores it. So, what is the issue here? Am I doing anything wrong?
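To make explicit which transformation I expect from the compiler, here is a minimal hand-hoisted sketch of the same function. The name foo_hoisted and the local v are only for illustration and are not part of the tested code. It matches foo only under the assumption that ptr1 never points at v2, which is exactly what __restrict__ is supposed to promise.

```cpp
// Hypothetical variant of foo: v2 is read once into a local before the loop.
// This is the load hoisting that __restrict__ should allow the compiler to do.
void foo_hoisted(int x, int* __restrict__ ptr1, int& v2) {
    const int v = v2;  // single load; only valid if ptr1 never aliases v2
    for (int i = 0; i < x; i++) {
        if (*ptr1 == v) {
            ++ptr1;             // element already equals v2, move on
        } else {
            *ptr1 = *ptr1 + 1;  // otherwise increment the current element
        }
    }
}
```

With this version the comparison in the loop no longer depends on a memory read of v2 at the source level, so it may serve as a baseline when comparing the output of both compilers.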