
I have a project that makes extensive use (high frequency) of a limited set of key linear algebra operations such as matrix multiplication, matrix inverse, addition, etc. These operations are implemented by a handful of linear algebra libraries that I would like to benchmark without having to recompile the business logic code to accommodate the different mannerisms of these various libraries.

I'm interested in figuring out the smartest way to design a wrapper class as an abstraction over all of these libraries, in order to standardize these operations against the rest of my code. My current approach relies on the Curiously Recurring Template Pattern and the fact that GCC (with C++11) is smart enough to inline virtual functions under the right circumstances.

This is the wrapper interface that will be available to the business logic:

#include <cstdint>

template <class T>
class ITensor {
  public:
    virtual ~ITensor() = default;

    virtual void initZeros(uint32_t dim1, uint32_t dim2) = 0;
    virtual void initOnes(uint32_t dim1, uint32_t dim2) = 0;
    virtual void initRand(uint32_t dim1, uint32_t dim2) = 0;

    virtual T mult(T& t) = 0;
    virtual T add(T& t) = 0;
};

And here is an implementation of that interface using, e.g., Armadillo:

#include <armadillo>

template <typename precision>
class Tensor : public ITensor<Tensor<precision> >
{
  public:
    Tensor() {}
    Tensor(arma::Mat<precision> mat) : M(mat) {}
    ~Tensor() {}

    inline void initOnes(uint32_t dim1, uint32_t dim2) override final
        { M = arma::ones<arma::Mat<precision> >(dim1, dim2); }
    inline void initZeros(uint32_t dim1, uint32_t dim2) override final
        { M = arma::zeros<arma::Mat<precision> >(dim1, dim2); }
    inline void initRand(uint32_t dim1, uint32_t dim2) override final
        { M = arma::randu<arma::Mat<precision> >(dim1, dim2); }

    inline Tensor<precision> mult(Tensor<precision>& t1) override final
    {
        return Tensor<precision>(M * t1.M);
    }

    inline Tensor<precision> add(Tensor<precision>& t1) override final
    {
        return Tensor<precision>(M + t1.M);
    }

    arma::Mat<precision> M;
};

Questions:

  1. Does it make sense to use CRTP and inlining in this scenario?
  2. Can this be improved with respect to optimizing performance?

As pointed out in an answer, the use of polymorphism here is a bit odd due to the templating of the base class. Here is why I think this still makes sense:

You will notice the implementing class is named "Tensor" rather than something more specific like "ArmadilloTensor" (after all, the class implements the ITensor methods using Armadillo). I kept the name as is because, according to my current design, the use of polymorphism is due more to a sense of formalism than anything else.

The plan is for the project code to be aware of a class called Tensor that offers the functionality specified in ITensor. For each new library I want to benchmark, I would just write a new "Tensor" class in a new compilation unit, package the compilation results into an .a archive, and, when doing a benchmarking test, link the business logic code against that library. Switching between implementations then becomes a matter of choosing which Tensor implementation to link against. To the base code it is all the same whether the Tensor methods are implemented by Armadillo or something else. Advantages: no code needs to know about every library (they are all independent), and no compile-time changes are required in the base code in order to use a new implementation.

So, why the polymorphism? In my mind I just wanted to somehow formalize the set of functions that must be implemented by any new library added to the benchmark. In reality, the base code would work with ITensors in function parameters, but then potentially static_cast them down to Tensors in the method bodies themselves.


1 Answer


It's possible I'm missing something here, or you haven't shown enough details.

You use polymorphism. As the name suggests, it is about the same type taking different shapes (different behaviour): you have an interface that user code accepts, and you can provide different implementations of that interface.

But in your case you don't have different implementations of a single interface. Your ITensor template generates different classes and each final implementation of your Tensor derives from a distinct base.

Consider your user code is something like this:

template<typename T>
void useTensor(ITensor<T>& tensor);

and you can provide your Tensor implementation. It's almost the same as

template<typename T>
void useTensor(T& tensor);

just without CRTP and virtual calls. Each wrapper must then implement a certain set of functionality, but there's a problem: that set is nowhere explicitly defined. The compiler's error messages provide great help here, but it's not ideal. That's why we all look forward to getting Concepts in a future standard.

  • Yes you are right, there is a bit of missing info here - see the note on the edited post. Thanks for pointing me towards Concepts, interesting stuff. But putting the use of polymorphism aside (i.e. even if we got rid of ITensors altogether), can you think of any performance implications of the present design for what I want to use the code for? – nescience Nov 05 '17 at 16:25
  • I take your word that GCC inlined your virtual function calls in your specific case, but this doesn't mean it will do the same in all cases. Please check this: http://www.cs.technion.ac.il/users/yechiel/c++-faq/inline-virtuals.html. So, assuming you cannot be sure they will be inlined, you can have performance implications, meaning not just the inability to inline but also the cost of the virtual call itself. But that's the lesser problem in my opinion; the bigger one is the needless complication of CRTP (which doesn't have a very good reputation in general), plus the complications for maintenance. – Andriy Tylychko Nov 05 '17 at 21:56
  • Thanks, I am basing the assumption on this: https://stackoverflow.com/questions/733737/are-inline-virtual-functions-really-a-non-sense. In any case the easiest thing to do is to run a benchmark and compare runtime performance against the "native" Armadillo interface. Will post results! – nescience Nov 06 '17 at 08:13