
I have read online (here and here) that inheritance doesn't affect the performance of a class. I became curious about this because I am writing a matrix module for a render engine, and the speed of this module is very important to me.

After I wrote:

  • Base: general matrix class
  • Derived from the base: square implementation
  • Derived from derived: 3-dim and 4-dim implementations of the square matrix

I decided to test them and ran into performance issues with instantiation.

And so the main questions are:

  1. What is the reason for these performance issues in my case, and why might they happen in general?
  2. Should I forget about inheritance in such cases?

This is what these classes look like in general:

template <class t>
class Matrix
{
protected:
    union {
        struct
        {
            unsigned int w, h;
        };
        struct
        {
            unsigned int n, m;
        };
    };

    /** Changes flow of accessing `v` array members */
    bool transposed;

    /** Matrix values array */
    t* v;

public:
    ~Matrix() {
        delete[] v;
    };
    // Initialize every member (in declaration order) so g_length() is well defined
    Matrix() : w(0), h(0), transposed(false), v(nullptr) {};

    // Copy
    Matrix(const Matrix<t>& m) : w(m.w), h(m.h), transposed(m.transposed) {
        v = new t[m.w * m.h];
        for (unsigned i = 0; i < m.g_length(); i++)
           v[i] = m.g_v()[i];
    };

    // Constructor from array
    Matrix(unsigned _w, unsigned _h, t _v[], bool _transposed = false) : w(_w), h(_h), transposed(_transposed) {
       v = new t[_w * _h];
       for (unsigned i = 0; i < _w * _h; i++)
           v[i] = _v[i];
    };

    /** Gets matrix array */
    inline t* g_v() const { return v; }
    /** Gets matrix values array size */
    inline unsigned g_length() const { return w * h; }

    // Other constructors, operators, and methods.
};



template<class t>
class SquareMatrix : public Matrix<t> {
public:
    SquareMatrix() : Matrix<t>() {};
    SquareMatrix(const Matrix<t>& m) : Matrix<t>(m) {};

    // `_transpose` defaults to false so the two-argument calls in the tests compile
    SquareMatrix(unsigned _s, t _v[], bool _transpose = false) : Matrix<t>(_s, _s, _v, _transpose) {};
    // Others...
};

template<class t>
class Matrix4 : public SquareMatrix<t> {
public:
    Matrix4() : SquareMatrix<t>() {};
    Matrix4(const Matrix<t>& m) : SquareMatrix<t>(m) {}

    // `_transpose` defaults to false so the one-argument calls in the tests compile
    Matrix4(t _v[16], bool _transpose = false) : SquareMatrix<t>(4, _v, _transpose) {};
    // Others...
};


To run the tests I used this:

#include <chrono>
#include <fstream>
#include <functional>

void test(std::ofstream& f, char delim, std::function<void(void)> callback) {
    auto t1 = std::chrono::high_resolution_clock::now();
    callback();
    auto t2 = std::chrono::high_resolution_clock::now();
    f << std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count() << delim;
    //std::cout << "test took " << std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count() << " microseconds\n";
}

Performance problems

With a single class initialization there are no problems: it takes under 5 microseconds almost every time for every class. But then I decided to scale up the number of initializations, and that is where several problems occurred.

I ran every test 100 times, with arrays of length 500.

1. Class initialization with the default constructor

Raw results

I tested only the initialization of the arrays.

The results were (avg time in microseconds):

  • Matrix 25.19
  • SquareMatrix 40.37 (37.60% loss)
  • Matrix4 58.06 (30.47% loss from SquareMatrix)

Here we can already see a huge difference.

Here's the code:

int main(int argc, char** argv)
{
    std::ofstream f("test.csv");
    f << "Matrix\t" << "SquareMatrix\t" << "Matrix4\n";

    for (int k = 0; k < 100; k++) {
        test(f, '\t', []() {
            Matrix<long double>* a = new Matrix<long double>[500];
            });

        test(f, '\t', []() {
            SquareMatrix<long double>* a = new SquareMatrix<long double>[500];
            });

        test(f, '\n', []() {
            Matrix4<long double>* a = new Matrix4<long double>[500];
            });
    }

    f.close();

    return 0;
}

2. Class initialization with default constructor and filling

Raw results

I tested initializing arrays of class instances and then filling them with custom matrices.

The results (avg time in microseconds):

  • Matrix 402.8
  • SquareMatrix 475 (15.20% loss)
  • Matrix4 593.86 (20.01% loss from SquareMatrix)

Code

int main(int argc, char** argv)
{
    long double arr[16] = {
       1, 2, 3, 4,
       5, 6, 7, 8,
       9, 10, 11, 12,
       13, 14,15,16
    };

    std::ofstream f("test.csv");
    f << "Matrix\t" << "SquareMatrix\t" << "Matrix4\n";

    for (int k = 0; k < 100; k++) {
        test(f, '\t', [&arr]() {
            Matrix<long double>* a = new Matrix<long double>[500];
            for (int i = 0; i < 500; i++) 
                a[i] = Matrix<long double>(4, 4, arr);
            });

        test(f, '\t', [&arr]() {
            SquareMatrix<long double>* a = new SquareMatrix<long double>[500];
            for (int i = 0; i < 500; i++) 
                a[i] = SquareMatrix<long double>(4, arr);
            });

        test(f, '\n', [&arr]() {
            Matrix4<long double>* a = new Matrix4<long double>[500];
            for (int i = 0; i < 500; i++) 
                a[i] = Matrix4<long double>(arr);
            });
    }

    f.close();

    return 0;
}

3. Filling vector with class instances

Raw results

I pushed custom matrices back into a vector.

The results (avg time in microseconds):

  • Matrix 4498.1
  • SquareMatrix 4693.93 (4.17% loss)
  • Matrix4 4960.12 (5.37% loss from SquareMatrix)

Code

int main(int argc, char** argv)
{
    long double arr[16] = {
       1, 2, 3, 4,
       5, 6, 7, 8,
       9, 10, 11, 12,
       13, 14,15,16
    };

    std::ofstream f("test.csv");
    f << "Matrix\t" << "SquareMatrix\t" << "Matrix4\n";

    for (int k = 0; k < 100; k++) {
        test(f, '\t', [&arr]() {
            std::vector<Matrix<long double>> a = std::vector<Matrix<long double>>();
            for (int i = 0; i < 500; i++)
                a.push_back(Matrix<long double>(4, 4, arr));
            });

        test(f, '\t', [&arr]() {
            std::vector<SquareMatrix<long double>> a = std::vector<SquareMatrix<long double>>();
            for (int i = 0; i < 500; i++)
                a.push_back(SquareMatrix<long double>(4, arr));
            });

        test(f, '\n', [&arr]() {
            std::vector<Matrix4<long double>> a = std::vector<Matrix4<long double>>();
            for (int i = 0; i < 500; i++)
                a.push_back(Matrix4<long double>(arr));
            });
    }

    f.close();

    return 0;
}

If you need the full source code, you can find it here in matrix.h and matrix.cpp.

nt4f04und
  • 4
    I assume you tested a release build *with* optimizations enabled, not an unoptimized debug build? – Jesper Juhl Nov 16 '19 at 16:38
  • 4
    The `inline` keyword is completely redundant for functions implemented in the class definition. I suggest you don't use it, because it *does not* mean "inline this function". – walnut Nov 16 '19 at 16:42
  • I assume this is using GNU extensions to C++. – Eljay Nov 16 '19 at 16:43
  • I would probably make the derived classes `final`. – Jesper Juhl Nov 16 '19 at 16:44
  • 3
    Oh my... I forgot about this! Should I delete the question, since in release all the problems are gone? – nt4f04und Nov 16 '19 at 16:45
  • 1
    @nt4f04uNd You might want to do a proper benchmark (see e.g. [google benchmark](https://github.com/google/benchmark)). It is likely that your `callback` calls are all optimized away completely in release mode anyway, because you never use the results. – walnut Nov 16 '19 at 16:51
  • 3
    @nt4f04uNd I'd recommend leaving the question so that someone else looking for "why?" and forgetting about the release/debug build differences could do a facepalm as well :) – YePhIcK Nov 16 '19 at 17:02
  • @uneven_mark I will definitely check out a benchmark library, but for now I just moved all the code out of the `test` function and added some test usages by calling matrix methods on the generated arrays. It still doesn't affect the performance, and all the timing differences seem to be just measurement error – nt4f04und Nov 16 '19 at 17:25
  • Compared to what? You want to be clear about what exactly you're measuring. You need to measure your solution *against another solution* rather than just test inheritance *versus* non-inheritance. If you don't use inheritance you will have to use some other technique, so you need to measure *its* cost. As another example, measuring virtual *versus* non-virtual methods isn't valid unless you also account for whatever other tests you need to execute to reach the correct method at runtime. – user207421 Nov 17 '19 at 00:17
  • @user207421 Compared to non-inheritance, since inheritance is a feature that is said not to change the execution speed of the program. If the issue had still been present after all, I would just have used my base matrix class. I don't follow you when you say that "I have to use some other technique". No, actually I do not. – nt4f04und Nov 17 '19 at 09:23

1 Answer


Does inheritance really not affect performance?

Yes. Inheritance won't affect runtime performance as long as no virtual methods are involved (only then does the type have to be resolved at runtime in order to call the corresponding override). In fact, if you look at the lower-level details, you will see that C++ inheritance is mostly static, that is, resolved at compile time.
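To illustrate the "mostly static" point, here is a minimal sketch. The names `MatrixBase`, `SquareBase`, and `Matrix4Base` are hypothetical stand-ins, not the classes from the question. It assumes a mainstream compiler, where a derived class that adds no data members and no virtual functions has the same size as its base. The standard does not strictly promise the size equality, so treat that assertion as an observation about common ABIs.

```cpp
#include <type_traits>

// Hypothetical stand-in for the question's base class, reduced to its data layout.
struct MatrixBase {
    unsigned w = 0, h = 0;
    bool transposed = false;
    double* v = nullptr;

    unsigned length() const { return w * h; }
};

// Non-virtual inheritance: no new data members, no virtual functions.
struct SquareBase : MatrixBase {
    unsigned size() const { return w; }
};

// A second level of non-virtual inheritance.
struct Matrix4Base : SquareBase {};

// No virtual functions anywhere in the hierarchy, so no vtable pointer is stored.
static_assert(!std::is_polymorphic<Matrix4Base>::value,
              "hierarchy should not be polymorphic");

// Holds on mainstream ABIs (assumption): the derived object is laid out like the base.
static_assert(sizeof(Matrix4Base) == sizeof(MatrixBase),
              "derived object should be no larger than the base");
```

Calls such as `m.length()` on a `Matrix4Base` are resolved at compile time to `MatrixBase::length`, exactly as if they were made on the base class directly.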

What's the reason of these performance issues in my case and why may they happen in general?

Judging by the comments, the differences seem to disappear once optimization is enabled, so they look like an artifact of the unoptimized debug build.
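When measuring an optimized build, keep in mind that the compiler may remove work whose result is never used. The following is a rough sketch of a timing helper that keeps the result observable. The names `g_sink`, `time_us`, and `build_and_sum` are made up for illustration, and the workload is not the question's matrix code. A dedicated benchmarking library will be more reliable than this.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical sink: a write to a volatile object is an observable side effect,
// so the value stored here cannot simply be discarded by the optimizer.
volatile double g_sink = 0.0;

// Runs `work` once, stores its result in g_sink, and returns the elapsed
// time in microseconds. Taking the callable as a template parameter avoids
// the extra indirection of std::function.
template <class Work>
long long time_us(Work&& work) {
    const auto t1 = std::chrono::steady_clock::now();
    g_sink = work();
    const auto t2 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
}

// Illustrative workload (an assumption, not the question's classes):
// builds `count` blocks of 16 values and sums the first value of each block.
double build_and_sum(std::size_t count) {
    std::vector<std::vector<double>> blocks;
    blocks.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        blocks.emplace_back(16, static_cast<double>(i));

    double sum = 0.0;
    for (const auto& block : blocks)
        sum += block[0];
    return sum;
}
```

A call such as `time_us([] { return build_and_sum(500); })` would then be compiled with optimizations on (for example `-O2` with GCC or Clang, `/O2` with MSVC) and repeated many times before drawing conclusions.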

Should I forget about inheritance in such cases?

The only thing you need to do in such performance-sensitive cases is to avoid virtual methods.
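For contrast, here is a small sketch of the two kinds of dispatch. All type and function names are hypothetical. The non-virtual hierarchy is not polymorphic, while the virtual one is, and a call through a base reference then has to pick the override at runtime unless the compiler can prove the dynamic type. Marking a class `final` may help it do so, but that is not guaranteed.

```cpp
#include <type_traits>

// Non-virtual hierarchy: every call is bound at compile time.
struct PlainBase {
    int value = 1;
    int get() const { return value; }
};
struct PlainDerived : PlainBase {
    int twice() const { return 2 * get(); }
};

// Virtual hierarchy: get() is selected through the object's dynamic type.
struct VirtualBase {
    int value = 1;
    virtual ~VirtualBase() = default;
    virtual int get() const { return value; }
};
struct VirtualDerived final : VirtualBase {
    int get() const override { return 2 * value; }
};

static_assert(!std::is_polymorphic<PlainDerived>::value,
              "plain hierarchy has no virtual functions");
static_assert(std::is_polymorphic<VirtualDerived>::value,
              "virtual hierarchy is polymorphic");

// The static type here is VirtualBase, so the override is chosen at runtime.
int call_through_base(const VirtualBase& b) { return b.get(); }
```

On common implementations the polymorphic classes also carry a hidden vtable pointer, which makes each object slightly larger.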

One thing unrelated to this question: I have read your code. Perhaps it would be better to implement your templates in the header file?
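As a rough sketch of what that could look like, here is a header-only class template. The file name `matrix_sketch.h` and the class `SquareSketch` are invented for illustration, and the fixed-size storage is a simplification rather than the design used in the question. The point is only that the member definitions sit in the header, where every translation unit that instantiates the template can see them.

```cpp
// matrix_sketch.h (hypothetical file name)
#ifndef MATRIX_SKETCH_H
#define MATRIX_SKETCH_H

#include <cstddef>

// Hypothetical fixed-size square matrix, defined entirely in the header.
template <class T, std::size_t N>
class SquareSketch {
public:
    // Element access by row and column (row-major storage).
    T& at(std::size_t row, std::size_t col) { return v_[row * N + col]; }
    const T& at(std::size_t row, std::size_t col) const { return v_[row * N + col]; }

    // Sum of the diagonal elements.
    T trace() const {
        T sum = T();
        for (std::size_t i = 0; i < N; ++i)
            sum += at(i, i);
        return sum;
    }

private:
    T v_[N * N] = {};  // value-initialized, so all elements start at zero
};

#endif  // MATRIX_SKETCH_H
```

Any source file can then include the header and use, say, `SquareSketch<double, 4>` without explicit instantiations in a separate .cpp file.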

kaleid_liner
  • I like to separate implementation and declaration. To me this looks a little clearer, because when both are in the header file it looks like a mess, but again, that's just me – nt4f04und Nov 16 '19 at 17:42
  • As far as a non-template class/function is concerned, you have to separate the implementation and declaration. But the common practice for **template** definitions is to place them in the header file. Maybe you already know this: [why-can-templates-only-be-implemented-in-the-header-file](https://stackoverflow.com/questions/495021/why-can-templates-only-be-implemented-in-the-header-file). The *alternative solution* isn't good practice, at least in this case. – kaleid_liner Nov 16 '19 at 18:02