1

I have a program that basically calculate the rolling-average and other rolling-statistics of a very large vector in C++. For a vector with size N, I would like to get another vector of size N with element k containing rolling statistics from 0 to k. I will just use rolling average for illustration below.

The program has an object with a function that update the rolling statistics based on the new value and previous statistics value. For example:

inline float update(const float x) {
    _n++;
    _mean += (x - _mean)/_n;
    return _mean;
}

Now I would like to support other statistics calculation, such as rolling standard deviation, and the plan is to add a pure virtual class as the base class with a pure virtual step function, and create different child classes corresponding to different statistics types.

My question here is: because the step function will be the main part for the function and will be executed billions of times, shall I expect the slow down by switch from an inline function to a virtual function significant? Is there other more efficient alternative for what I want to do? Or maybe I am not using the correct design pattern and there are other better ways to do it?

DiveIntoML
  • 2,347
  • 2
  • 20
  • 36
  • You are not always guaranteed to get inlined code, it is up to the compiler. There will be an overhead because of the vtable if you plan to prefer polymorphism. – macroland Aug 20 '18 at 02:33
  • A virtual single-step updater like this is not going to win any benchmarks. Making the update loop a part of your virtual function can be reasonable though. – n. m. could be an AI Aug 20 '18 at 05:07
  • Virtual functions in general are a lot less popular and important than they were back when though. Try std::accumulate with a lambda binary op. – n. m. could be an AI Aug 20 '18 at 05:12

2 Answers2

3

From what I understand, you might use template, something similar to

template <typename Range, typename F>
auto f(const Range& r, F&& f)
{
    std::vector<std::decay_t<decltype(f(*r.begin()))>> result;

    result.reserve(r.size());
    for (const auto& e : v) {
        result.push_back(f(e));
    }
    return result;
}


class MeanCalculator
{
public:
    float operator()(float x) const {
        n++;
        mean += (x - mean) / n;
        return mean;
    }
private:
    std::size_t n = 0;
    float mean = 0.0f;
};

class VarianceCalculator
{
public:
    float operator()(float x) const {
        n++;
        // ...
    }
private:
    std::size_t n = 0;
    //...
};

And then

std::vector<float> numbers = /*...*/;
auto means = f(numbers, MeanCalculator{});
auto variances = f(numbers, VarianceCalculator{});

Note: std::transform cannot be used as it doesn't guaranty in-order application of f.

Jarod42
  • 203,559
  • 14
  • 181
  • 302
  • thanks, this is exactly what I need. Can I say that as long as I can know the types ahead of time, I can always replace virtual function with template? – DiveIntoML Aug 21 '18 at 01:01
  • As long as you don't need runtime dispatching, but compile-time (so with known types) one, yes. – Jarod42 Aug 21 '18 at 11:53
  • Because each of my class in this polymorphism has a lot of methods to implement, it is kind of difficult to ensure I do not forget any of the method for each class. Do you think it is better to make a virtual parent class as the interface, and let each implementation be a child of this virtual class, but never use runtime polymorphism? – DiveIntoML Aug 29 '18 at 19:36
  • You might be interested by [does-static-polymorphism-make-sense-for-implementing-an-interface](https://stackoverflow.com/questions/20771210/does-static-polymorphism-make-sense-for-implementing-an-interface/20772541#20772541). – Jarod42 Aug 29 '18 at 19:40
1

Using polymorphism is usually a very good strategy. It is a proven and time tested method to build correct and maintainable software. The vtable issue is rarely a factor in performance. Some interesting articles here and here regarding the heavy optimization of the vtable.

There are some other good ideas as well. Check out the Strategy Pattern for example. You can use it to select your function of choice when the handler object is created, possibly via template or as an argument to the ctor. Since the strategy is selected once at construction time, no vtable penalty is incurred.

Another related idea is to use lambda functions combined with some of the std:: algorithms. This pattern has become quite common since C++11. The lambda is used in place of polymorphism. Here is a trivial example that may give you some ideas for your project.

#include <iostream>
#include <vector>
#include <algorithm>

int main(int argc, char*argv[])
{
   std::vector<size_t> input = {2,4,6};
   std::vector<size_t> output;

   auto const operation = std::stoi(argv[1]);

   // transform the input vector to the output vector with algorithm selection
   switch(operation)
   {
     // just copy it
     case 1:  
     {
       std::for_each(input.begin() ,input.end(), [&](size_t element){
          output.push_back(element);
       });
       break;
     }

     // divide it by 2
     case 2:  
     {
       std::for_each(input.begin() ,input.end(), [&](size_t element){
          output.push_back(element/2);
       });
       break;
     }
   }

   for(auto const & element:output)
     std::cout << element << std::endl;
}
Matthew Fisher
  • 2,258
  • 2
  • 14
  • 23