I am trying to understand the effects of memory aliasing and how to improve my code to avoid it. I am rewriting my cache-coherent Entity Component System and I want to take memory aliasing into account.
The main source I have is Christer Ericson's talk from GDC 2003, so I would like to know whether the problems he describes have been mitigated by modern C++ compilers.
Specifically, do modern C++ compilers suffer from memory aliasing as much as Christer says, especially for member variable access (due to the implicit 'this' pointer)?
#include <stdint.h> // int64_t
#include <stdlib.h> // atoi
class TestList
{
public:
    TestList()
    {
        // atoi here to avoid compiler optimization around hardcoded 20
        count = atoi("20");
        data = new int64_t[count];
    }
    int64_t count;
    int64_t* data;
    void ClearOptimized();
    void ClearNonOptimized();
};

// Not inlined on purpose
void TestList::ClearOptimized()
{
    // According to Christer, this avoids aliasing even for
    // simple compilers, because we are aiding the compiler
    // to identify that there is no aliasing in the iteration.
    for (int64_t i = 0, size = count; i < size; ++i)
    {
        data[i] = 0;
    }
}

void TestList::ClearNonOptimized()
{
    // According to Christer the compiler doesn't know
    // if 'count' is aliasing with data... A smart compiler
    // should be able to identify it might be aliased for
    // the first element only, but all other iterations can't
    // so it will unroll the first element of the loop into a
    // separated range check.
    for (int64_t i = 0; i < count; ++i)
    {
        data[i] = 0;
    }
}

int main()
{
    TestList listA;
    listA.ClearNonOptimized();
    TestList listB;
    listB.ClearOptimized();
    return listA.data[listA.count-1] + listB.data[listB.count-1];
}
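To make the aliasing concern concrete, here is a minimal sketch of a case where the count really does live inside the buffer being cleared. The free functions `clear_reloading` and `clear_cached` and the driver `demo_aliasing` are hypothetical names introduced only for this illustration; they mirror the two loops above but take the count through a pointer so that the overlap can be set up without undefined behaviour.

```cpp
#include <cassert>
#include <cstdint>

// Re-reads *count on every iteration, like ClearNonOptimized().
int64_t clear_reloading(int64_t* data, const int64_t* count)
{
    int64_t i = 0;
    for (; i < *count; ++i)
    {
        data[i] = 0;
    }
    return i; // number of elements written
}

// Reads *count once into a local, like ClearOptimized().
int64_t clear_cached(int64_t* data, const int64_t* count)
{
    int64_t i = 0;
    for (const int64_t size = *count; i < size; ++i)
    {
        data[i] = 0;
    }
    return i; // number of elements written
}

// Hypothetical driver: element 2 of each buffer doubles as the count.
void demo_aliasing()
{
    int64_t a[4] = {4, 4, 4, 4};
    // Zeroing a[2] also zeroes the count, so the reloading loop stops early.
    assert(clear_reloading(a, &a[2]) == 3);
    assert(a[3] == 4); // the last element was never cleared

    int64_t b[4] = {4, 4, 4, 4};
    // The cached copy of the count is unaffected by the store to b[2].
    assert(clear_cached(b, &b[2]) == 4);
    assert(b[3] == 0);
}
```

The two loops give different results when the count overlaps the data, which is the situation a compiler has to assume is possible whenever it cannot prove otherwise.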
I came across a few websites indicating that modern compilers still exhibit most of those problems, although nowadays we seem to have better tools for dealing with aliasing (such as the strict-aliasing rules around type punning).
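As a rough sketch of the kind of tools I mean, here are two ways I understand one can tell the compiler that the count cannot change inside the loop. `List`, `clear_with_locals` and `clear_restrict` are hypothetical names used only for this example. Note that `__restrict` is a non-standard extension (accepted by GCC, Clang and MSVC as far as I know), and calling `clear_restrict` with overlapping pointers would be undefined behaviour.

```cpp
#include <cstdint>

// Hypothetical stand-in for TestList, reduced to its two members.
struct List
{
    int64_t count;
    int64_t* data;
};

// Option 1: copy both members into locals before the loop.
// Stores through 'data' cannot modify a local whose address is never taken.
void clear_with_locals(List& list)
{
    int64_t* const data = list.data;
    const int64_t size = list.count;
    for (int64_t i = 0; i < size; ++i)
    {
        data[i] = 0;
    }
}

// Option 2: promise the compiler that the two pointers do not overlap.
// __restrict is a compiler extension, not standard C++.
void clear_restrict(int64_t* __restrict data, const int64_t* __restrict count)
{
    for (int64_t i = 0; i < *count; ++i)
    {
        data[i] = 0;
    }
}
```

Whether either variant actually removes the per-iteration load depends on the compiler and flags, so the generated assembly would still need to be checked.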
I tried verifying this by compiling the above code in Compiler Explorer, but I find it hard to reason about the assembly. Both GCC and Clang, at the highest optimization level, seem to perform an extra memory access on every iteration of ClearNonOptimized.
Clang:
TestList::ClearOptimized():             # @TestList::ClearOptimized()
        push    rax
        mov     rdx, qword ptr [rdi]
        test    rdx, rdx
        jle     .LBB0_2
        mov     rdi, qword ptr [rdi + 8]
        shl     rdx, 3
        xor     esi, esi
        call    memset
.LBB0_2:
        pop     rax
        ret
TestList::ClearNonOptimized():          # @TestList::ClearNonOptimized()
        cmp     qword ptr [rdi], 0
        jle     .LBB1_3
        mov     rax, qword ptr [rdi + 8]
        xor     ecx, ecx
.LBB1_2:                                # =>This Inner Loop Header: Depth=1
        mov     qword ptr [rax + 8*rcx], 0
        add     rcx, 1
        cmp     rcx, qword ptr [rdi]    # <=== Is this due to memory aliasing?
        jl      .LBB1_2
.LBB1_3:
        ret
GCC:
TestList::ClearOptimized():
        mov     rdx, QWORD PTR [rdi]
        test    rdx, rdx
        jle     .L6
        mov     rdi, QWORD PTR [rdi+8]
        sal     rdx, 3
        xor     esi, esi
        jmp     memset
.L6:
        ret
TestList::ClearNonOptimized():
        cmp     QWORD PTR [rdi], 0
        jle     .L8
        mov     rdx, QWORD PTR [rdi+8]
        xor     eax, eax
.L10:
        mov     QWORD PTR [rdx+rax*8], 0
        add     rax, 1
        cmp     QWORD PTR [rdi], rax    # <=== Is this due to memory aliasing?
        jg      .L10
.L8:
        ret
Am I reading this right? Does that mean the loop re-reads 'count' from memory (i.e. from the cache) on every iteration instead of keeping it in a register?