C++: How to make the compiler optimize memory access in case when a pointer of a member variable is passed elsewhere

Question

[edit: Here is the motivation: passing a pointer to a variable to an external function may inadvertently defeat optimizations on "adjacent" variables, because the external function could compute pointers to those adjacent variables from the original pointer. The following is the original post, in which volatile simulates an external function that is not visible in the current translation unit, e.g. a virtual function call, a closed-source library function, etc.]

I wondered whether return t.a; in the following code would be optimized to return 0;.

//revision 1
struct T
{
    int a;
    int b;
};

void f_(int * p)
{
    *p = 1;
}
auto volatile f = f_;

int main()
{
    T t;
    t.a = 0;
    t.b = 0;
    for (int i = 0; i < 20; ++i)
    {
        f(&t.b);
    }
    return t.a;
}

Well, it's not. Fair enough: the code in function f may use offsetof to obtain a pointer to t and then change t.a, so it is not safe to optimize the load of t.a away.

[edit: On second thought, offsetof is not enough here. We need container_of, which there seems to be no way to implement in standard C++.]
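For illustration, here is a minimal sketch of the kind of container_of trick meant above. The macro spelling CONTAINER_OF and the functions g and demo are made up for this example. The pointer arithmetic is not sanctioned by the C++ standard, so treat this as a picture of what an external function might do in practice, not as conforming code.

```cpp
#include <cassert>
#include <cstddef>   // offsetof

struct T
{
    int a;
    int b;
};

// Kernel-style container_of under the hypothetical name CONTAINER_OF.
// It steps back from a member pointer to the enclosing object by the
// member's offset. The standard does not bless this arithmetic.
#define CONTAINER_OF(ptr, type, member) \
    (reinterpret_cast<type *>(reinterpret_cast<char *>(ptr) - offsetof(type, member)))

// Stand-in for an external function that only receives &t.b
// but still manages to write to t.a.
void g(int * p)
{
    *p = 1;
    CONTAINER_OF(p, T, b)->a = 42;
}

// Returns t.a after the call, which is the value the optimizer
// would like to fold to 0.
int demo()
{
    T t;
    t.a = 0;
    t.b = 0;
    g(&t.b);
    assert(t.b == 1);
    return t.a;
}
```

As long as compilers tolerate this pattern, demo() observes the write to t.a, which is why folding the load to 0 would break such code.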

But offsetof cannot be used on non-standard-layout types. So I tried the following code:

//revision 2
#include <type_traits>

struct T
{
private:
    char dummy = 0;
public:
    int a;
    int b;
};
static_assert(!std::is_standard_layout_v<T>);

void f_(int * p)
{
    *p = 1;
}
auto volatile f = f_;

int main()
{
    T t;
    t.a = 0;
    t.b = 0;
    for (int i = 0; i < 20; ++i)
    {
        f(&t.b);
    }
    return t.a;
}

Unfortunately, the load of t.a is still not optimized away.

My questions are:

1. Is it safe to optimize the load of t.a away in the above case (revision 2)?
2. If it is not, is there any existing or proposed mechanism to make it possible (e.g. making T a more special type, or some attribute specifier for member b in T)?

P.S. In the following code the load in return t.a; is optimized away, but the code generated for the loop is a bit inefficient, and the temporary-variable juggling is cumbersome.

//revision 3
struct T
{
    int a;
    int b;
};

void f_(int * p)
{
    *p = 1;
}
auto volatile f = f_;

int main()
{
    T t;
    t.a = 0;
    t.b = 0;
    for (int i = 0; i < 20; ++i)
    {
        int b = t.b;
        f(&b);
        t.b = b;
    }
    return t.a;
}
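The juggling in revision 3 could be hidden behind a small helper. The following is only a sketch: call_with_copy and demo are hypothetical names, and the helper changes behavior if the callee keeps the pointer beyond the call, because the pointer refers to a temporary copy and not to the member itself. Whether a given compiler then folds the load of t.a is not guaranteed.

```cpp
#include <cassert>

struct T
{
    int a;
    int b;
};

void f_(int * p)
{
    *p = 1;
}
auto volatile f = f_;

// Hypothetical helper: passes the callee a pointer to a local copy of the
// member and writes the copy back afterwards, so the address of the
// enclosing object never escapes to the callee.
template <class V, class F>
void call_with_copy(V & member, F & callee)
{
    V tmp = member;
    callee(&tmp);
    member = tmp;
}

// Same loop as revision 3, with the copy-in/copy-out hidden in the helper.
T demo()
{
    T t;
    t.a = 0;
    t.b = 0;
    for (int i = 0; i < 20; ++i)
    {
        call_with_copy(t.b, f);
    }
    return t;
}
```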

I'm a bit confused. It seems like you try your best to _not_ get the code optimized yet wonder how to make the compiler optimize it. Perhaps it'd be better if you instead asked how to perform a specific task to not risk having this question placed in the [XY problem](https://en.wikipedia.org/wiki/XY_problem) category. — Ted Lyngmo, Oct 31 '20 at 12:07
Any particular reason you are using `volatile`, as its correct use in C++ is very restricted? This feels like an XY-Problem. — Richard Critten, Oct 31 '20 at 12:07
By marking `f` as volatile, you've told the compiler that `f` may be changed by some code that is not visible to the compiler. In the first case, the compiler needs to allow for the possibility `f()` may change arbitrarily in the loop, and do different things - such as change `t.a` on any loop iteration. It would be quite feasible for `f` to change so `f()` does `*(p-1) = 42` as that is a perfectly well-defined way of modifying `t.a` if passed `&t.b` since `t.a` and `t.b` are in the same object (data struct). Similar discussion for your revisions. — Peter, Oct 31 '20 at 12:19

ecatmur · Answer 1 · 2020-11-04T14:53:33.420

The use of offsetof to reach T::a from T::b is illegitimate, since there is no object pointer-interconvertible with T::b from which T::a can be reached. In the other direction it is possible to reach T::b from T::a, since the latter is pointer-interconvertible with T. Contrary to Peter's comment (and despite the existence of the container_of macro in e.g. the Linux kernel), &t.b - 1 does not yield a pointer to t.a, since T::b and T::a are not pointer-interconvertible.

Note that given a pointer to T::a you would still need to use std::launder to access T::b:

auto p = &t.a;
std::launder(reinterpret_cast<T*>(p))->b = 1;

So a sufficiently aggressive compiler would indeed be able to conclude that no replacement f could access t.a given a pointer to t.b. However, it appears that no mainstream compiler performs this optimization at this time.
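To make the asymmetry concrete, here is a minimal self-contained sketch of the legitimate direction, from a pointer to the first member to the second member. The names set_b_via_a and demo are made up for illustration, and the sketch assumes C++17 for std::launder and std::is_standard_layout_v.

```cpp
#include <cassert>
#include <cstddef>      // offsetof
#include <new>          // std::launder
#include <type_traits>  // std::is_standard_layout_v

struct T
{
    int a;
    int b;
};

// The first member of a standard-layout class shares its address with
// the enclosing object, which is what makes the cast below meaningful.
static_assert(std::is_standard_layout_v<T>);
static_assert(offsetof(T, a) == 0);

// Hypothetical callee: receives a pointer to t.a and modifies t.b.
void set_b_via_a(int * pa)
{
    std::launder(reinterpret_cast<T *>(pa))->b = 1;
}

// Passing &t.a therefore forces the caller to assume t.b may change.
int demo()
{
    T t;
    t.a = 0;
    t.b = 0;
    set_b_via_a(&t.a);
    return t.b;
}
```

No comparable conversion exists starting from &t.b, which is the basis for saying that a callee given only &t.b has no conforming way to reach t.a.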

I think no compiler would actively perform this kind of optimization because of the widely used `container_of` macro, whether it is conforming or not. Any (even potential) `__container_of`-like compiler extension would prohibit it. So we need something like a `[[no_outer_cast]]` attribute to explicitly allow it. — zwhconst, Nov 04 '20 at 08:44
