optimization for ODR-used empty classes

Question

Much of today's C++ code is heavily template-based. Libraries such as the STL, Boost.Spirit and Boost.MPL, among many others, encourage users to declare function objects of the form struct S { /* non-virtual member functions and operators, but no non-static data members and no non-empty base classes */ }; S const s{};. Most of them are stateless (i.e. static_assert(std::is_empty< S >{}); holds). For each of those that is ODR-used, the data section of the file grows by 1 byte regardless of its emptiness (sizeof(S) == 1 for an empty type S, because consecutively allocated objects must all have different addresses). Even simple Boost.Spirit grammars contain plenty of such ODR-used empty classes. But it makes no sense at all to reserve space for them.
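
To make the size rule above concrete, here is a minimal sketch. The names plus_one, holder and distinct_addresses are hypothetical and only stand in for the S of the question. The last static_assert relies on the empty base optimization, which I assume is applied here, as it is on mainstream implementations for a standard-layout class like this one.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stateless function object, standing in for S.
struct plus_one
{
    int operator()(int x) const { return x + 1; }
};

static_assert(std::is_empty< plus_one >{}, "no non-static data members");
static_assert(sizeof(plus_one) >= 1, "a complete object still occupies storage");

// Two complete objects of the same type must have distinct addresses,
// so each of them needs at least one byte once its address is observed.
plus_one const a{};
plus_one const b{};

inline bool distinct_addresses()
{
    return &a != &b;
}

// As a base class subobject the empty class need not add any storage
// (assumption: empty base optimization is applied).
struct holder : plus_one
{
    int payload;
};

static_assert(sizeof(holder) == sizeof(int), "empty base adds no storage");
```

So the byte is paid only for complete objects such as a and b, not for the type as a base class.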

I tested clang on Coliru with the following code (-Ofast):

#include <utility>
#include <type_traits>

#include <cstdlib>
#include <cassert>

template< std::size_t index >
struct S {};

namespace
{

template< std::size_t index >
S< index > value = {};

}

template< typename lhs, typename rhs >
std::ptrdiff_t
diff(lhs & l, rhs & r)
{
    return (static_cast< char * >(static_cast< void * >(&r)) - static_cast< char * >(static_cast< void * >(&l)));
}

template< std::size_t base, std::size_t ...indices >
std::ptrdiff_t
bss_check(std::index_sequence< indices... >)
{
    return (diff(value< (base + indices) >, value< (base + indices + 1) >) + ...); 
}

template< std::size_t size, std::size_t base >
bool
enumerate()
{
    return (bss_check< base >(std::make_index_sequence< size >{}) + 1 == size);
}

template< std::size_t size, std::size_t ...bases >
bool
expand(std::index_sequence< bases... >)
{
    return (enumerate< size, (bases * size) >() && ...);
}

template< std::size_t size = 100, std::size_t count = size >
bool
check()
{
    return expand< size >(std::make_index_sequence< count >{});
}

int
main()
{
    static_assert(std::is_empty< S< 0 > >{});
    assert((check< DIM >()));
    return EXIT_SUCCESS;
}

and got the following result (output of the size utility for DIM == 100, i.e. 100 * 100 classes):

   text    data     bss     dec     hex filename
 112724   10612       4  123340   1e1cc ./a.out

If I change the signature of diff(lhs & l, rhs & r) to diff(lhs l, rhs r) in order to suppress ODR-use, the result is:

   text    data     bss     dec     hex filename
  69140     608       8   69756   1107c ./a.out

This is almost identical (only the data section is of interest) to the case where the assert((check< DIM >())); line is simply commented out (most of the text section is, predictably, removed by DCE):

   text    data     bss     dec     hex filename
   1451     600       8    2059     80b ./a.out

Hence I conclude that there is no optimization for ODR-used empty classes.

For explicitly specified template parameters, a simple type filter can be used:

template< typename type >
using ref_or_value = std::conditional_t< std::is_empty< std::decay_t< type > >{}, std::decay_t< type >, type && >;
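
A minimal sketch of how I would use this filter. The functors and call_with are hypothetical names, and the alias is restated with ::value instead of {}. Since ref_or_value< F > is a non-deduced context, F has to be spelled out at the call site:

```cpp
#include <cassert>
#include <type_traits>

// The filter from above, restated with ::value.
template< typename type >
using ref_or_value = std::conditional_t< std::is_empty< std::decay_t< type > >::value,
                                         std::decay_t< type >, type && >;

// Hypothetical function objects for illustration.
struct empty_functor
{
    int operator()(int x) const { return x * 2; }
};

struct stateful_functor
{
    int factor;
    int operator()(int x) const { return x * factor; }
};

// Empty types are taken by value, everything else keeps its reference category.
static_assert(std::is_same< ref_or_value< empty_functor const & >, empty_functor >::value, "");
static_assert(std::is_same< ref_or_value< stateful_functor const & >, stateful_functor const & >::value, "");
static_assert(std::is_same< ref_or_value< stateful_functor >, stateful_functor && >::value, "");

// F cannot be deduced through the alias, so the caller must specify it.
template< typename F >
int call_with(ref_or_value< F > f, int x)
{
    return f(x);
}
```

A call then looks like call_with< empty_functor const & >(e, 3), which copies the empty object instead of binding a reference to it, so no address of e is needed.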

But I can't think of a simple workaround for deduced template types.

Do modern compilers implement the optimization implied above? If yes, how can it be enabled? If no, is there a technique to achieve the desired behaviour at the moment?

I know that the addresses of objects sometimes matter a lot, but that is not the case in the situation described above.

I think something like an attribute for a variable or type (e.g. [[immaterial]]) would be handy. Perhaps such an attribute (applied to classes) should forbid taking the address of instances of the attributed classes (a hard compile-time error), or the address-of operator & should return a meaningless (implementation-defined) value.

Try to create such objects as local, perhaps temporary variables. As they are stateless, it shouldn't introduce any runtime cost. — Neil Kirk, Sep 25 '15 at 17:52
@NeilKirk Anyways I tried to initiate a discussion of general problem, not my local one. — Tomilov Anatoliy, Sep 25 '15 at 17:58
This is a duplicate of several discussions, `//stackoverflow.com/questions/3849334/sizeof-empty-structure-is-0-in-c-and-1-in-c-why`, and `//stackoverflow.com/questions/2362097/why-is-the-size-of-an-empty-class-in-c-not-zero` and others, most of which are closed because it comes up a lot. Basically, C doesn't allow empty structures, and in C++, which does, a size of zero would imply for `T a[10];` that `sizeof a / sizeof * a` translates to division by zero (among many other related implications). Deriving from empty classes imposes no such penalty. It's in the standard, not an optimization option — JVene, Sep 25 '15 at 18:44
@JVene It is an absolutely different question. Try at least to read the title. Don't you understand what I am talking about here at all? — Tomilov Anatoliy, Sep 25 '15 at 19:06
I don't think you can avoid this without changing the way your objects are declared or using a non-standard optimization. — Neil Kirk, Sep 25 '15 at 19:16
It's not just ODR-use; it's you actually inspecting their address. That means that to adhere to the standard's required behavior they must all get distinct addresses. — T.C., Sep 25 '15 at 19:34
@Orient, no need to be rude. "But it is absolutely of no sense to keep space for them." According to the standard, there is. — JVene, Sep 25 '15 at 20:28
This optimization would violate the as-if rule, in some way similar to this question [Do distinct functions have distinct addresses?](http://stackoverflow.com/q/26533740/1708801) in which MSVC will constant fold identical functions even though their address is taken. This is non-conforming and non-conformance can be problematic for portable coding, for example applying constexpr to functions not so specified by the standard could result in [observable SFINAE differences](http://stackoverflow.com/q/27744079/1708801). — Shafik Yaghmour, Sep 25 '15 at 20:41
@ShafikYaghmour I don't quite see how the file size is considered observable behaviour. As far as I understand the OP, the address is only used for the computation inside `diff`, and I don't see any reason why a compiler couldn't compute that at compile or link time, and remove all actual memory for the objects + the type information. `diff` is invoking UB anyway, since it's computing the difference between two pointers which don't point into the same array. — dyp, Sep 25 '15 at 21:36

score 1 · Accepted Answer · answered Sep 29 '15 at 22:05

C++17 will be adding inline variables to help address some of these issues as explained in N4424. It also explains some workarounds. For global function objects you can define them like this:

// Sum function object
struct sum_f
{
    template<class T, class U>
    auto operator()(T x, U y) const
    {
        return x+y;
    }  
};

template<class T>
struct static_const_storage
{
    static constexpr T value = T();
};

template<class T>
constexpr T static_const_storage<T>::value;


template<class T>
constexpr const T& static_const()
{
    return static_const_storage<T>::value;
}

static constexpr auto& sum = static_const<sum_f>();

This makes the sum function object unique across translation units, thereby avoiding bloat and ODR violations. However, this workaround doesn't really work for template variables, and it's best to avoid them (if you are concerned about executable bloat) until we get inline variables in C++17.
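
For comparison, a sketch of how I would expect the same function object to look once inline variables are available. This assumes a C++17 compiler; constexpr on operator() and the static_assert are my additions for illustration and are not taken from N4424.

```cpp
// Requires C++17 (inline variables).
#include <cassert>

// Sum function object
struct sum_f
{
    template<class T, class U>
    constexpr auto operator()(T x, U y) const
    {
        return x + y;
    }
};

// An inline variable is a single entity in the whole program, so every
// translation unit that includes this definition refers to the same object.
inline constexpr sum_f sum{};

static_assert(sum(1, 2) == 3, "usable in constant expressions");
```

With this form the static_const_storage helper and the reference indirection are no longer needed.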
