How smart are C++ compilers about code size with template classes?

Question

Consider the following code:

template<typename T, size_t S> struct MyArray {
  int length;
  T contents_[S];

  MyArray() { length = S; }

  T& operator[](const int index) {
    assert(index >= 0 && index < length);
    return contents_[index];
  }

  int Length() { return length; }

};

Fundamentally, there is no reason to create separate copies of the Length functions and subscript operator for each value of S. But I'm concerned that in fact there will be duplication of these functions for each different value of S, limiting the usefulness of this approach.

(In case you are curious why I'm not using std::vector, it's because this is an embedded application without any heap based memory allocation)

There is [`std::array`](https://en.cppreference.com/w/cpp/container/array). — Jarod42, Jan 08 '21 at 00:49
Is `Length()` still equal to `S`? Is `length` member useful? — Jarod42, Jan 08 '21 at 00:50
Also, for the record, there *is* a reason to create separate copies of the subscript operator. `contents_[index]` is implemented differently depending on the type of `contents_` (the indexing arithmetic is different; it depends on the size of the underlying type. You cannot do all array indexing with one function implementation). — Alexander Guyer, Jan 08 '21 at 00:52
LTO (Link Time Optimization) is supposed to get rid of identical sections of code at the linking phase. That may help with common template functions like `Length()`. I guess you'd have to check the size of the binary. — Galik, Jan 08 '21 at 00:52
Also, `Length()` probably ends up with no code regardless of templates. The resulting bytecode almost certainly just accesses the `length` member directly. But for a more complex method your question would stand. — Mooing Duck, Jan 08 '21 at 00:53
@Galik: The `Length()` method is probably different enough for each instantiation that LTO can't merge them. The offsets in the class directly depend on `sizeof(T)` and `S` template parameters. — Mooing Duck, Jan 08 '21 at 00:54
Let the cmpiler optimize optimize the `Lenght` function away by making it `constexpr`, and change the name of `S` to `lenght`. — Pablochaches, Jan 08 '21 at 00:54
It's really up to the compiler and how it optimises. Compilers and linkers are quite capable of detecting duplicated code, and rationalising it. However, not all compilers are configured to do so by default. — Peter, Jan 08 '21 at 00:55
Playing around in [godbolt](https://godbolt.org/z/x5633Y) a bit, it seems like (1) at low optimization levels, no merging is done, and (2) at high optimization levels, everything is trivially inlined anyway. Assuming you're worried about space and have optimizations turned on, it seems like you have nothing to worry about. — Silvio Mayolo, Jan 08 '21 at 00:56
I arrived at the same conclusions that @SilvioMayolo, Op should think of a less trivial example. — xvan, Jan 08 '21 at 01:43
@Nerdizzle The question states that there is no reason to create separate copies of the subscript operator "for each value of S"; it does *not* say "for each value of S *and T*". The way I read that, the OP is expecting separate copies of the subscript operator for each value of `T`, but sees no reason to create separate copies for, for example, `MyArray` and `MyArray`. — JaMiT, Jan 08 '21 at 02:26
I expect separate copies based on T, but not based on S. I edited the code to put length before the data, because I could see that the offset of the length variable would depend on S. Other than that, I can't see why the implementation would vary based on S. That is part of the point of the length variable, BTW. Because without it, the implementation of the Length function and the operator would HAVE to vary based on S. Right? — Andrew Voelkel, Jan 08 '21 at 02:30
@Jarod42 It looks to me like the length member is there to enable the possibility of using the same `operator[]` to be used regardless of `S`. Instead of a comparison to the template parameter, that operator has a comparison to the `length` member stored in the object. In theory, if `T` is fixed, the same machine code could handle that operator regardless of which `S` is being dealt with (at least with the latest update, which fixes an easily-overcome oversight). — JaMiT, Jan 08 '21 at 02:30
@JaMiT Right! - given that I edited the code to put the length variable before the data. — Andrew Voelkel, Jan 08 '21 at 02:32
Also a comment on optimization. I try to write code that doesn't rely on optimization. It makes debug easier, and frankly, code execution speed does matter much in this application. But fancy code size optimizations often make debug a mess. But in this case, I see no reason why all versions of the class for a given T could not share common implementations for the subscript operator and the Length functions — Andrew Voelkel, Jan 08 '21 at 02:34
@AndrewVoelkel most linkers implement a variant of folding where identical functions are collapsed into just one copy when building the final executable. Sometimes this is only in release (optimized) mode. The VC++ linker calls this "COMDAT folding", gcc names it differently. See more here: https://stackoverflow.com/questions/15168924/gcc-clang-merging-functions-with-identical-instructions-comdat-folding Bottom line, yeah template programming would be infeasible due to code bloat without this feature. — Tumbleweed53, Jan 08 '21 at 02:44
@AndrewVoelkel In light of "*write code that doesn't rely on optimization*", the most defensive way would be to not leave the choice up to the compiler, and *force* it to use the same implementation. For example `struct MyArrayBase { int length; int Length() const { return length; } };` then `template struct MyArray : MyArrayBase { ... }`. — dxiv, Jan 08 '21 at 02:48
@AndrewVoelkel Traditionally, one turns off optimizations when debugging, since fancy optimizations often make debugging a mess. I would recommend expecting inlining of particularly simple functions, such as `Length`. If you accept that, you could force code sharing with a (non-member) helper function template (something like `template T& get_at(T * contents, int index, int length) { assert(index >= 0 && index < length); return contents[index]; }`) and have the `operator[]` member `return get_at(contents_, index, S);` (simple enough to expect inlining). — JaMiT, Jan 08 '21 at 02:48
Does this answer your question? [GCC(/Clang): Merging functions with identical instructions (COMDAT folding)](https://stackoverflow.com/questions/15168924/gcc-clang-merging-functions-with-identical-instructions-comdat-folding) — JaMiT, Jan 08 '21 at 02:57

score 0 · Answer 1 · answered Jan 08 '21 at 05:33

0

MSVC uses "identical comdat folding" (ICF). This takes any two methods (or functions) with the same implementstion ... and uses the same function for them.

The gold linker for gcc (and clang apparently) also can do this (with varying degrees of aggressiveness).

Other than that, you'll have to do it manually.

struct baseArray{
  int length=0;
  int Length() const { return length; }
};
template<class T>
struct buffBaseArray:baseArray{
  T& operator[](const int index) {
    assert(index >= 0 && index < this->length);
    return reinterpret_cast<T*>(this+1)[index];// todo// alignment and ub
  }
};
  
template<typename T, size_t S>
struct MyArray:buffBaseArray<T> {
  T contents_[S];
  MyArray() { length = S; }
 };

or something less UB.

answered Jan 08 '21 at 05:33

Yakk - Adam Nevraumont

262,606
27
330
524

What do you mean by _"and clang apparently"_? – Passer By Jan 08 '21 at 05:34
@passer I knew it as a gcc linker; I googled that it works with clang, but just now. – Yakk - Adam Nevraumont Jan 08 '21 at 06:24
Oh. I thought you perhaps meant lld. – Passer By Jan 08 '21 at 06:49

How smart are C++ compilers about code size with template classes?

1 Answers1