The type of problem Chandler was talking about can be easily illustrated with a simplified strcpy:
char *strcpy(char * dest, const char * src);
When writing an implementation of this, you might assume that the memory pointed to by dest is completely separate from the memory pointed to by src. The compiler might want to optimize it by reading a block of characters from the string pointed to by src and writing all of them at once into dest. But if dest pointed to one byte ahead of src, the behaviour of this would differ from a simple character-by-character copy, because each byte written to dest would overwrite a byte of src that has not been read yet.
Here the aliasing problem is that src can alias dest, so the generated code must be less efficient than it could be if src were not allowed to alias dest.
The real strcpy uses an extra keyword, restrict (which is technically only part of C, not C++), that tells the compiler to assume that src and dest do not overlap, and this allows the compiler to generate more efficient code.
Here's an even simpler example where we can see a big difference in the assembly:
void my_function_1(int* a, int* b, int* c) {
    if (*a) *b = *a;
    if (*a) *c = *a;
}
void my_function_2(int* __restrict a, int* __restrict b, int* __restrict c) {
    if (*a) *b = *a;
    if (*a) *c = *a;
}
Assume that this is a simplification of a function where it actually made sense to use two if-statements rather than just if (*a) { *b=*a; *c=*a; }, but the intent is the same.
We may assume when writing this that a != b because there's some reason why it would make no sense for my_function_1 to be used like that. But the compiler can't assume that, so it conservatively stores to *b and then re-loads *a from memory before executing the second line, to cover the case where b == a:
0000000000400550 <my_function_1>:
400550: 8b 07 mov (%rdi),%eax
400552: 85 c0 test %eax,%eax <= if (*a)
400554: 74 0a je 400560 <my_function_1+0x10>
400556: 89 06 mov %eax,(%rsi)
400558: 8b 07 mov (%rdi),%eax
40055a: 85 c0 test %eax,%eax <= if (*a)
40055c: 74 02 je 400560 <my_function_1+0x10>
40055e: 89 02 mov %eax,(%rdx)
400560: f3 c3 repz retq
If we remove the potential for aliasing by adding __restrict (a compiler extension that C++ compilers such as GCC and Clang accept), the compiler generates shorter and faster code:
0000000000400570 <my_function_2>:
400570: 8b 07 mov (%rdi),%eax
400572: 85 c0 test %eax,%eax
400574: 74 04 je 40057a <_Z9my_function_2PiS_S_+0xa>
400576: 89 06 mov %eax,(%rsi)
400578: 89 02 mov %eax,(%rdx)
40057a: f3 c3 repz retq