I am trying to understand the effects of memory aliasing and how to improve my code to avoid it. I am re-writing my cache coherent Entity Component System and I want to take memory aliasing into account.

The main source I have is Christer Ericson's talk from GDC 2003, so I would like to know whether the problems he describes have been mitigated by modern C++ compilers.

Specifically, do modern C++ compilers suffer from memory aliasing as much as Christer says, especially for member variable access (due to the implicit 'this' pointer)?

#include <stdint.h>  // int64_t
#include <stdlib.h>  // atoi

class TestList
{
  public:
    TestList()
    {
       // atoi here to avoid compiler optimization around hardcoded 20
       count = atoi("20");
       data = new int64_t[count];
    }

    int64_t count;
    int64_t* data;

    void ClearOptimized();
    void ClearNonOptimized();
};

// Not inlined on purpose
void TestList::ClearOptimized()
{
    // According to Christer, this avoids aliasing even for
    // simple compilers, because we are aiding the compiler
    // to identify that there is no aliasing in the iteration.
    for (int64_t i = 0, size = count; i < size; ++i)
    {
        data[i] = 0;
    }
}
void TestList::ClearNonOptimized()
{
    // According to Christer the compiler doesn't know
    // if 'count' is aliasing with data... A smart compiler
    // should be able to identify it might be aliased for
    // the first element only, but all other iterations can't
    // so it will unroll the first element of the loop into a
    // separated range check.
    for (int64_t i = 0; i < count; ++i)
    {
        data[i] = 0;
    }
}

int main()
{
    TestList listA;
    listA.ClearNonOptimized();
    TestList listB;
    listB.ClearOptimized();
    return listA.data[listA.count-1] + listB.data[listB.count-1];
}

I read a few websites which indicate that modern compilers still have most of those problems, although nowadays we seem to have better tools to avoid aliasing (such as type-punning).

I tried verifying this by compiling the above code on Compiler Explorer, but I find it hard to reason about the assembly. Both GCC and Clang, at the highest optimization level, seem to do an extra memory access on every iteration of ClearNonOptimized.

Clang:

TestList::ClearOptimized():         # @TestList::ClearOptimized()
        push    rax
        mov     rdx, qword ptr [rdi]
        test    rdx, rdx
        jle     .LBB0_2
        mov     rdi, qword ptr [rdi + 8]
        shl     rdx, 3
        xor     esi, esi
        call    memset
.LBB0_2:
        pop     rax
        ret
TestList::ClearNonOptimized():      # @TestList::ClearNonOptimized()
        cmp     qword ptr [rdi], 0
        jle     .LBB1_3
        mov     rax, qword ptr [rdi + 8]
        xor     ecx, ecx
.LBB1_2:                                # =>This Inner Loop Header: Depth=1
        mov     qword ptr [rax + 8*rcx], 0
        add     rcx, 1
        cmp     rcx, qword ptr [rdi]     <=== Is this due to memory aliasing?
        jl      .LBB1_2
.LBB1_3:
        ret

GCC:

TestList::ClearOptimized():
        mov     rdx, QWORD PTR [rdi]
        test    rdx, rdx
        jle     .L6
        mov     rdi, QWORD PTR [rdi+8]
        sal     rdx, 3
        xor     esi, esi
        jmp     memset
.L6:
        ret
TestList::ClearNonOptimized():
        cmp     QWORD PTR [rdi], 0
        jle     .L8
        mov     rdx, QWORD PTR [rdi+8]
        xor     eax, eax
.L10:
        mov     QWORD PTR [rdx+rax*8], 0
        add     rax, 1
        cmp     QWORD PTR [rdi], rax     <=== Is this due to memory aliasing?
        jg      .L10
.L8:
        ret

Am I reading this right? Does that mean it will fetch the value from the cache on every iteration instead of keeping it in a register?
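
For reference, here is a minimal sketch of the case the compiler has to allow for in ClearNonOptimized: `data` is an `int64_t*` and `count` is an `int64_t`, so as far as the language is concerned a store through `data[i]` may modify `count`. The free functions `clear_reload` and `clear_hoisted` are hypothetical stand-ins for the two member functions (they are not part of the code above), and the buffer layout in `demo_aliased_count` is contrived so that the count lives inside the array being cleared:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ClearNonOptimized: re-reads *count every iteration.
// Returns the number of stores performed.
int64_t clear_reload(int64_t* data, const int64_t* count)
{
    int64_t stores = 0;
    for (int64_t i = 0; i < *count; ++i)
    {
        data[i] = 0;
        ++stores;
    }
    return stores;
}

// Hypothetical stand-in for ClearOptimized: reads *count once into a local.
int64_t clear_hoisted(int64_t* data, const int64_t* count)
{
    int64_t stores = 0;
    for (int64_t i = 0, size = *count; i < size; ++i)
    {
        data[i] = 0;
        ++stores;
    }
    return stores;
}

// Contrived layout: element 2 of the buffer doubles as the count (value 4).
void demo_aliased_count()
{
    int64_t a[4] = {7, 7, 4, 7};
    // The store to a[2] overwrites the count with 0, so the loop stops early.
    assert(clear_reload(a, &a[2]) == 3);
    assert(a[3] == 7);

    int64_t b[4] = {7, 7, 4, 7};
    // The count was copied to a local first, so all four elements are cleared.
    assert(clear_hoisted(b, &b[2]) == 4);
    assert(b[3] == 0);
}
```

When the count overlaps the array, the two loops perform a different number of stores, so a compiler cannot turn the first loop into the second unless it can prove the two never overlap.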

GKann
  • I don't think C++20 changes anything in this regard, other than formalizing what compilers have already been doing. Can you elaborate what's your question, for those who haven't watched the video? – HolyBlackCat Jul 04 '23 at 19:06
  • When GCC first introduced `-fstrict-aliasing`, it would break even the simplest cases like `*(uint32_t*)my_float`. (And see also [gcc, strict-aliasing, and horror stories](https://stackoverflow.com/q/2958633)). GCC has since gone out of its way to be less hostile to bad code with simple "obvious" cases of strict-aliasing UB, often detecting bad type-pun idioms and handling them as the programmer intended. (But obviously code should use `memcpy`, or in C++20 `std::bit_cast` to get size checking and constexpr compat.) – Peter Cordes Jul 04 '23 at 20:06
  • I don't remember any case where we can't get just as efficient asm while avoiding strict-aliasing UB. Although some cases may require GNU C extensions like `__attribute__((may_alias))` (without `aligned(1)`) since targets without efficient unaligned loads may not inline `memcpy` which doesn't require alignment for its operands. But on most modern targets, GCC will inline `memcpy(p, q, sizeof(int))` to a single load or store if p or q is the address of a local temporary. (GNU C++ allows use of inactive union members the way C99 does, which may be handy sometimes.) – Peter Cordes Jul 04 '23 at 20:08
  • Anyway, IDK how to answer "how bad is it". If you mean how bad is it to violate the strict-aliasing rule, it's Undefined Behaviour; your code might totally break. Don't do it. If you mean how much performance it costs to get the compiler to do what you wanted via various workarounds, they *should* compile to the same asm you were hoping you could get from whatever aliasing violation you were contemplating. – Peter Cordes Jul 04 '23 at 20:13
  • Thanks @PeterCordes, I will elaborate on the question better and edit it later today. But the idea is to ask if modern compilers have become better at identifying cases where aliasing is not possible much like in type-based alias analysis (TBAA). The specific case I am looking for is when writing to class members, Ericson states that member variables are accessed through the 'this' pointer which leads to implicit aliasing... I will change the question to include code blocks. – GKann Jul 04 '23 at 23:15
  • Oh, like how bad is the possibility of aliasing when you don't or can't use `__restrict`. Yeah, that can often lead to instructions ahead of loops to check for overlap between pointers, before entering the vectorized version of a loop. (With a fallback to a scalar version). This is useless code if your arrays can't ever actually overlap. – Peter Cordes Jul 04 '23 at 23:18
  • I have added some example code to focus on, I tried to reason about the assembly but I don't have a full understanding of it tbh... Please let me know if there is anything else I can do to make the question more straightforward. Thanks again! – GKann Jul 05 '23 at 21:45
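
To illustrate the `__restrict` point from the comments, here is a minimal sketch. It assumes a compiler that accepts the non-standard `__restrict` qualifier (GCC, Clang and MSVC do), and `clear_restrict` is a hypothetical free function, not part of the question's code. The qualifier is a promise from the caller that the two pointers do not overlap, which leaves the compiler free (but not obliged) to load `*count` once instead of on every iteration:

```cpp
#include <cassert>
#include <cstdint>

// `__restrict` is a compiler extension, not standard C++.
// The caller promises that `data` and `count` never refer to overlapping memory;
// calling this with overlapping pointers is undefined behaviour.
void clear_restrict(int64_t* __restrict data, const int64_t* __restrict count)
{
    for (int64_t i = 0; i < *count; ++i)
    {
        data[i] = 0;
    }
}

// Usage with a count that is stored separately from the buffer.
void demo_restrict()
{
    int64_t buffer[4] = {1, 2, 3, 4};
    int64_t count = 4;
    clear_restrict(buffer, &count);
    assert(buffer[0] == 0 && buffer[3] == 0);
    assert(count == 4);
}
```

Whether this actually changes the generated code depends on the compiler and flags, so it is worth checking the output on Compiler Explorer rather than assuming it does.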