8

I am testing combinations of various optimizations, and for these I need a static-if, as described in http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Static-If-I-Had-a-Hammer, to enable and disable specific optimizations. if(const-expr) does not always work, because some optimizations involve changing the data layout, and that cannot be done at function scope.

Basically what I want is this:

template<bool enable_optimization>
class Algo{
  struct Foo{
    int a;
    if(enable_optimization){
      int b;
    }

    void bar(){
      if(enable_optimization){
        b = 0;
      }
    }
  };
};

(Yes, the smaller memory footprint of removing b from the data layout is relevant in my case.)

Currently I am faking it using a very bad hack. I am looking for a better way of doing this.

File a.h

#ifndef A_H
#define A_H
template<bool enable_optimization>
class Algo;
#include "b.h"
#endif

File b.h (this file is autogenerated from a Python script)

#define ENABLE_OPTIMIZATION 0
#include "c.h"
#undef ENABLE_OPTIMIZATION
#define ENABLE_OPTIMIZATION 1
#include "c.h"
#undef ENABLE_OPTIMIZATION

File c.h

template<>
class Algo<ENABLE_OPTIMIZATION>{
  struct Foo{
    int a;
    #if ENABLE_OPTIMIZATION
    int b;
    #endif

    void bar(){
      #if ENABLE_OPTIMIZATION
      b = 0;
      #endif
    }
  };
};

Does anyone know of a better way of doing this? Theoretically it can be done using template metaprogramming, and at first I used that. At least the way I used it, it was a pain in the ass and led to completely unreadable and bloated code. Using the hack above resulted in a significant productivity boost.

EDIT: I have several optimization flags and these interact.

Mechanical snail
B.S.
  • The preprocessor switch doesn't sound like such a bad hack IMO. – zneak Jul 21 '12 at 14:33
  • @zneak, it depends how much code is affected by the conditional compilation, if there are lots of chunks enabled/disabled by the preprocessor the implementation of `bar` could get much harder to read and understand. Separating it into optimised and non-optimised parts could be simpler, but it's hard to know as a minimal example doesn't show what the real code is like. – Jonathan Wakely Jul 21 '12 at 14:48
  • Will the next C++ standard have static_if? – Mr.Anubis Jul 21 '12 at 16:19
  • @Mr.Anubis, no one knows, there are [some](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3322.pdf) [proposals](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3329.pdf) but nothing has been accepted or decided yet. – Jonathan Wakely Jul 21 '12 at 16:41
  • 1
    Note that `if constexpr` (formerly known as `static if`) may now very well be part of C++17. – P-Gn May 20 '16 at 12:18

3 Answers

6

There's no reason the code should get much more complicated using templates:

template<bool enable_optimization>
  class FooOptimized
  {
  protected:
    int b;
    void bar_optim()
    {
      b = 0;
    }
  };

template<>
  class FooOptimized<false>
  {
  protected:
    void bar_optim() { }
  };

template<bool enable_optimization>
  struct Algo
  {
    struct Foo : FooOptimized<enable_optimization>
    {
      int a;

      void bar()
      {
        this->bar_optim();
      }
    };
  };

No metaprogramming needed, just separate the parts that vary based on whether optimisation is enabled into a new type and specialize it.

Because the new type is used as a base class, it takes up no space when it is empty (i.e. when there is no FooOptimized::b member), so sizeof(Algo<false>::Foo) == sizeof(int).
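The size claim is easy to check at compile time. The following is a minimal sketch that repeats the types from this answer and adds two static_asserts. The sizes are what mainstream compilers produce for these types, so treat the second assertion in particular as an expectation to verify on your toolchain rather than a guarantee.

```cpp
// Primary template: the "optimization on" case, which owns the extra member.
template<bool enable_optimization>
class FooOptimized
{
protected:
  int b;
  void bar_optim() { b = 0; }
};

// "Optimization off": no data members, so this class is empty.
template<>
class FooOptimized<false>
{
protected:
  void bar_optim() { }
};

template<bool enable_optimization>
struct Algo
{
  struct Foo : FooOptimized<enable_optimization>
  {
    int a;
    void bar() { this->bar_optim(); }
  };
};

// The empty base adds nothing to the derived object.
static_assert(sizeof(Algo<false>::Foo) == sizeof(int),
              "empty base should take no space");
// With the flag on, Foo carries both a and b.
static_assert(sizeof(Algo<true>::Foo) == 2 * sizeof(int),
              "expected exactly two ints");
```

If either assertion fires, the build stops with the message shown, which makes a layout regression visible immediately.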


(Feel free to ignore the rest of this answer. It doesn't address the question directly; instead it suggests a different approach that has different trade-offs. Whether it is "better" or not depends entirely on the details of the real code, which are not shown in the trivial example given in the question.)

As a related, but separate issue, the parts of Algo and Algo::Foo that don't depend on whether optimization is enabled are still dependent on the template parameter, so although you only write those bits of code once, the compiler will generate two sets of object code. Depending how much work is in that code and how it's used you might find there is an advantage to changing that into non-template code, i.e. replacing static polymorphism with dynamic polymorphism. For example, you could make the enable_optimization flag a runtime constructor argument instead of template argument:

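#include <memory>

struct FooImpl
{
  virtual ~FooImpl() { }  // objects are deleted through a FooImpl pointer
  virtual void bar() { }
};

class FooOptimized : public FooImpl  // public base, so FooOptimized* converts to FooImpl*
{
  int b;
  void bar()
  {
    b = 0;
  }
};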

struct Algo
{
  class Foo
  {
    std::unique_ptr<FooImpl> impl;
  public:
    explicit
    Foo(bool optimize)
    : impl(optimize ? new FooOptimized : new FooImpl)
    { }

    int a;

    void bar()
    {
      impl->bar();
    }
  };
};

You'd have to profile and test it to determine whether the virtual function has more or less overhead than the duplication of code in Algo and Algo::Foo that doesn't depend on the template parameter.

Jonathan Wakely
  • I think you missed that he didn't want b to exist at all in the optimised version, not just the initialisation - he did say "smaller memory footprint" – gbjbaanb Jul 21 '12 at 15:05
  • @gbjbaanb, I think you missed that `b` exists in the OP's optimised version, not in the unoptimised version, and is the same in my version, isn't it? – Jonathan Wakely Jul 21 '12 at 15:56
  • But... how does adding a virtual pointer to each object (doubling, at least, the size of each class instance) achieves the smaller memory footprint goal ? If you would remove the runtime trick, then it would become a much better answer. – Matthieu M. Jul 21 '12 at 16:04
  • @MatthieuM. The first half of the answer addresses the OP's exact question, the second half shows an alternative that _might_ help - you can't know it doubles the size of the class when you don't know the size of the real code (I'm assuming the OP's real code doesn't set an int to zero, it's probably a bit more complicated than that!) My point in the second half of the answer is that if "optimisation" is the goal, duplicating code _might_ cause worse performance than using a virtual function, and _might_ increase the executable size more than adding a vtable. – Jonathan Wakely Jul 21 '12 at 16:36
  • @MatthieuM. ... and in case it's not obvious, there's a horizontal rule and the words "As a related, but separate issue" to make it clear the second half is separate from the first. – Jonathan Wakely Jul 21 '12 at 16:39
  • @JonathanWakely: Well, with the disclaimer it is certainly clearer. – Matthieu M. Jul 21 '12 at 17:04
  • As soon as you have several flags that interact, this approach quickly becomes problematic in my experience. How do you realize something like #if USE_A && (USE_B || USE_C) && !USE_D in a way that someone else can read and that can easily be extended to #if USE_A && (USE_B || USE_C) && !USE_D && USE_E if a new flag USE_E is introduced? Actually there are flags that just remove an int from a struct (and do some other stuff). Virtual functions are a no-go in my situation. – B.S. Jul 21 '12 at 21:17
  • 1
    Personally I don't find that sort of preprocessor condition easy to reason about, but splitting the algo up into orthogonal pieces that capture independent variations (as in Alexandrescu's _Modern C++ Design_) certainly isn't trivial either – Jonathan Wakely Jul 21 '12 at 21:32
1

Note: as it stands, this approach doesn't work, as there appears to be no way to have a member in a class without allocating space for that member. If you have an idea how to make it work, feel free to edit.

You could use an idiom like this:

template<bool optimized, typename T> struct dbg_obj {
  struct type {
    // dummy operations so your code will compile using this type instead of T
    template<typename U> type& operator=(const U&) { return *this; }
    operator T&() { return *static_cast<T*>(0); /* this should never be executed! */ }
  };
};
template<typename T> struct dbg_obj<false, T> {
  typedef T type;
};

template<bool enable_optimization>
class Algo{
  struct Foo{
    int a;
    typename dbg_obj<enable_optimization, int>::type b;

    void bar(){
      if(enable_optimization){
        b = 0;
      }
    }
  };
};

If optimization is disabled, this gives you a normal int member. If optimization is enabled, the type of b is a struct without members, which was meant to take no space (but, as noted at the top, an empty member still occupies storage).

As your method bar uses what looks like a runtime if to decide whether or not to access b, as opposed to a clear compile-time mechanism like template specialization, all the operations you use on b must be available from the dummy struct as well. Even if the relevant sections will never get executed, and the compiler most likely optimizes them away, the correctness checks come first. So the line b = 0 must compile for the replacement type as well. That's the reason for the dummy assignment and the dummy cast operations. Although either would suffice for your code, I included both in case they prove useful at some other point, and to give you an idea how to add more should you ever require them.
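One possible way to make a member-based version work with newer standards, which were not available when this was written: C++20 adds the [[no_unique_address]] attribute, which permits a compiler to give an empty data member no storage, and C++17 adds if constexpr, which discards the untaken branch inside a template. The sketch below combines the two. Empty and exercise_both are hypothetical names used only for illustration, and b is present only when the flag is true, following the question's convention. The attribute is a permission rather than a guarantee, so check sizeof on your own toolchain.

```cpp
#include <type_traits>

struct Empty { };  // hypothetical stand-in used when b is compiled out

template<bool enable_optimization>
struct Algo {
  struct Foo {
    int a;
    // int when the flag is on, an empty placeholder otherwise.
    [[no_unique_address]]
    std::conditional_t<enable_optimization, int, Empty> b;

    void bar() {
      // The branch is discarded at compile time when the flag is off,
      // so "b = 0" is never instantiated for the Empty placeholder.
      if constexpr (enable_optimization) {
        b = 0;
      }
    }
  };
};

// Calls bar() on both variants so that both bodies are instantiated
// and checked by the compiler.
inline void exercise_both() {
  Algo<true>::Foo on{};
  Algo<false>::Foo off{};
  on.bar();
  off.bar();
}
```

Compared with the dummy-operator idiom, no fake assignment or conversion operators are needed, because the code touching b is never compiled for the placeholder type.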

MvG
  • 2
    An empty struct still takes up space _unless_ used as a base class. – Jonathan Wakely Jul 21 '12 at 16:03
  • @JonathanWakely, thanks for pointing that out, I hadn't known. I'm still looking for a loophole to salvage my approach, but as it stands, it is elegant but useless, as it doesn't fulfill the requirements. :-( – MvG Jul 21 '12 at 16:14
  • @MvG: **E** mpty **B** ase **O** ptimization. Unlike an attribute a base class is allowed not to inflate the size of an object if it is empty (and a few other conditions are met). It does not make the code as elegant though. – Matthieu M. Jul 21 '12 at 16:32
  • You can still use the EBO without making `Foo` derive from `dbg_obj`, see Listing 4 in [The "Empty Member" C++ Optimization](http://cantrip.org/emptyopt.html). You couldn't do that with `dbg_opt::type` if that's an `int` though, as you can't derive from `int`, so you'd need to change it a bit further. – Jonathan Wakely Jul 21 '12 at 16:56
0

The static solution presented by Jonathan Wakely (not the dynamic one) is the way to go.

The idea is simple:

  1. Isolate the "specific" data into its own class (templated and specialized accordingly)
  2. All accesses to that data are completely handed over to the specialized class (the interface must be uniform)

Then, in order not to incur space overhead, you use the EBO (Empty Base Optimization) to your advantage by inheriting from the appropriate specialization of this class.

Note: an attribute (data member) is required to take at least 1 byte of space; a base class is not, under some conditions (such as being empty).
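The difference between a member and a base can be seen with three small types. This is only a sketch: Empty, HasMember, and HasBase are hypothetical names, not part of the question's code.

```cpp
struct Empty { };           // no data members at all

struct HasMember {          // Empty used as a data member
  int a;
  Empty e;
};

struct HasBase : Empty {    // Empty used as a base class
  int a;
};

// Every complete object has a nonzero size, even an empty one.
static_assert(sizeof(Empty) >= 1, "objects have nonzero size");
// As a member, the empty object needs its own address inside HasMember.
static_assert(sizeof(HasMember) > sizeof(int), "empty member costs space");
// As a base, it can share the address of the derived object.
static_assert(sizeof(HasBase) == sizeof(int), "empty base costs nothing");
```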

In your case:

  • b is the specific data
  • there is a single operation involving it (which consists of setting its value)

So we can easily construct a class:

template <bool> struct FooDependent;

template <>
struct FooDependent<true> {
    int b;
    void set(int x) { b = x; }
};

template <>
struct FooDependent<false> {
    void set(int) {}
};

And then we can inject it into Foo, using the EBO to our advantage:

struct Foo: FooDependent<enable_optimization> {
    int a;

    void bar() { this->set(0); }
};

Note: use this-> in templated code to access members of dependent base classes, or good compilers will reject your code.
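Put together, and extended to several interacting flags (the USE_A && (USE_B || USE_C) && !USE_D case raised in the comments), the idea might look like the sketch below. The flag names use_a to use_d, the constant has_b, and the helper exercise are hypothetical; the point is only that the interaction is written once, as a single named boolean that selects the base.

```cpp
// Primary template is declared only; each bool value has its own specialization.
template <bool has_b> struct FooDependent;

template <>
struct FooDependent<true> {
    int b;
    void set(int x) { b = x; }
};

template <>
struct FooDependent<false> {
    void set(int) {}
};

// use_a .. use_d are hypothetical flags mirroring USE_A .. USE_D.
template <bool use_a, bool use_b, bool use_c, bool use_d>
struct Algo {
    // One named constant holds the interaction between the flags.
    static constexpr bool has_b = use_a && (use_b || use_c) && !use_d;

    struct Foo : FooDependent<has_b> {
        int a;
        // The base depends on template parameters, so an unqualified set(0)
        // would not be looked up in it; this-> defers lookup to instantiation.
        void bar() { this->set(0); }
    };
};

// Calls bar() on one variant with b and one without, so both get compiled.
inline void exercise() {
    Algo<true, true, false, false>::Foo with_b{};
    Algo<true, false, false, false>::Foo without_b{};
    with_b.bar();
    without_b.bar();
}
```

Adding a new flag then means adding one template parameter and editing the has_b expression, while Foo::bar stays untouched.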

Matthieu M.
  • "_The static solution [...] (not the dynamic one) is the way to go._" -- how can you be sure without seeing the actual code and knowing the actual problem being solved? It might be, and that's why I gave that solution first, but in some situations it performs worse so I suggested an alternative that's worth considering. – Jonathan Wakely Jul 21 '12 at 16:50
  • @JonathanWakely: At some point I would say that you need to trust the OP to have explained her situation sufficiently. In **this** case it is doubling the size of the structure. In the real case, who knows ? It *will* add overhead, but there is no way to measure it so I would think it is best concentrating on the data we have. – Matthieu M. Jul 21 '12 at 17:03
  • Making all the code in `Algo` and `Algo::Foo` depend on the template parameter also adds overhead, it's a real cause of [template bloat](http://accu.org/index.php/conferences/accu_conference_2011/accu2011_sessions#Diet%20Templates) (as opposed to the mythical form that some people refer to.) – Jonathan Wakely Jul 21 '12 at 17:13
  • @JonathanWakely: yes and no. First, it's a separate issue, and doing away with it seems premature: is it really causing pain? Second, in optimized builds it might well be a concern only at compilation time: with all the inlining that goes on in templates and with the compiler/linker being able to merge together similar segments of code, the bloat will probably not make it into the final binary (which is why talking about it seems premature). As such, I would avoid confusing the OP here. A precise question was asked; techniques to reduce bloat are a separate topic. – Matthieu M. Jul 21 '12 at 17:36
  • Only the Microsoft linker does such merging ("identical COMDAT folding"), for other toolchains templates create separate code for each instantiation, even if the generated code is identical – Jonathan Wakely Jul 21 '12 at 17:52
  • The code presented is a simplified version. Think of several optimization flags that interact with each other. Often it is indeed possible to think in isolation, but there are situations where this is not possible, and there I've had lots of trouble with this approach. – B.S. Jul 21 '12 at 21:21
  • @B.S.: Note that the condition itself may be as tricky as required. It is perfectly legal to instantiate `FooDependent` for example. However the solution does require splitting the parcels into logical and orthogonal entities, which is definitely not trivial, but arguably "good design". – Matthieu M. Jul 23 '12 at 17:01