Efficient vector operators in C++ / References to temporary objects

Question

I am trying to write a C++ vector class that stores an array of data and allows performing mathematical operations on an element-by-element basis. I want to implement this in such a way that an expression a = b + c + d should loop over all elements only once and directly write the sum b[i] + c[i] + d[i] to a[i] without creating intermediate vectors.

I was writing something like this:

template<class T, int N>
class VectorExpression {
  public:
    virtual T operator[] (int i) const = 0;

    virtual ~VectorExpression() {}
}

template<class T, int N>
class MyVector : public VectorExpression<T, N> {
    T data[N];

  public:
    T& operator[] (int i) { return data[i]; }
    T& const operator[] (int i) const { return data[i]; }

    MyVector<T,N>& operator=(const VectorExpression<T,N> &rhs) {
      for (int i = 0; i < N; ++i)
        data[i] = rhs[i];

      return *this;
    }
}

template<class T, int N>
class VectorSum : public VectorExpression<T, N> {
    VectorExpression<T,N> &a, &b;

  public:
    VectorSum(VectorExpression<T,N> &aa, VectorExpression<T,N> &bb)
    : a(aa), b(bb) {}

    T operator[] (int i) const { return a[i] + b[i]; }
}

template<class T, int N>
VectorSum<T,N> operator+(const VectorExpression<T,N> &a, 
        const VectorExpression<T,N> &b) 
{
  return VectorSum<T,N>(a, b);
}

int main() {
  MyVector<double,10> a, b, c, d;

  // Initialize b, c, d here

  a = b + c + d;

  return 0;
}

Probably this functionality is provided by the valarray class but that's because I tried to strip it down to a minimal example.

I made operator[] virtual because this allows nesting all kinds of expressions (e.g. a = !(-b*c + d)) provided I would define all the operators and the corresponding classes similar to VectorSum.

I use references because ordinary variables aren't polymorphic and pointers don't work with operator overloading.

Now my questions about this are:

In the statement a = b + c + d;, two temporary VectorSum<double,10> objects will be created to store b + c and (b+c) + d respectively. Will they live long enough to make the polymorphic behavior work? More specifically, (b+c) + d will store a reference to b + c, but will that object still exist when operator= is called? According to this post all temporaries should exist until operator= returns, but does this also hold for older versions of C++?
If not, then how is this done? The only alternative I see would be to allocate the VectorSum objects using new, return them by reference and then delete them in the operator= functions, but that seems a little cumbersome, and probably a lot less efficient. I'm also not sure if it is always safe.
(Minor question) Is it okay to override the return type T of VectorExpression::operator[] by T& const in MyVector?

EDIT

I had wrong argument types in operator+: changed them from VectorSum to VectorExpression.

This does not answer your questions and I did not take a deep look into your code but I am not sure this is the way to achieve efficient late evaluation of your sum. Have you considered using [expression templates](https://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Expression-template) ? — coincoin, Jul 06 '15 at 08:07
As far as I understand the site you refer to is doing more or less the same as I am except for using templates instead of inheritance. I hadn't thought of that, so thanks. I'm not very sure which option is better though; their method doesn't need virtual classes but it does require a lot more classes (esp. if you count them after template instantiation) and temporary objects of all of these classes have to be instantiated. — PieterNuyts, Jul 06 '15 at 09:55
This might be of interest for you: http://stackoverflow.com/q/11809052/1116364 — Daniel Jour, Jul 06 '15 at 11:42
Thanks! The code gets quite complicated this way but I see now that with expression templates you can completely eliminate all temporaries and function calls at runtime, which is not possible when using inheritance and virtual functions. This probably makes it a better solution than mine in most cases. — PieterNuyts, Jul 09 '15 at 07:49

Richard · Answer 1 · 2015-07-09T22:22:55.440

Well here's what I came up with:

#include <iostream>
#include <initializer_list>
#include <algorithm>


template<class T, int N>
class VectorExpression {
public:
    virtual T operator[] (int i) = 0;
    virtual const T operator[] (int i) const = 0;

    virtual ~VectorExpression() {}
};


template<class T, int N>
class MyVector : public VectorExpression<T, N> {
  T data[N];

public:
  MyVector() {
    // initialize zero
    std::fill(std::begin(data), std::end(data), T());
  }

  MyVector(const std::initializer_list<T>& values) {
    // initialize from array initializer_list
    std::copy(std::begin(values), std::end(values), data);
  }

  MyVector(const VectorExpression<T,N>& rhs) { 
    for (int i = 0; i < N; ++i)
      data[i] = rhs[i];
  }

  MyVector<T,N>& operator=(const VectorExpression<T,N>& rhs) {
    for (int i = 0; i < N; ++i)
      data[i] = rhs[i];

    return *this;
  }

  T operator[] (int i) { return data[i]; }
  const T operator[] (int i) const { return data[i]; }

  friend std::ostream& operator<<(std::ostream& stream, MyVector& obj) { 
    stream << "[";
    for (int i = 0; i < N; ++i) { 
      stream << obj.data[i] << ", ";
    }
    stream << "]";

    return stream;
  }
};

template<class T, int N>
class VectorSum : public VectorExpression<T, N> {
  const MyVector<T,N> &a, &b;

public:
  VectorSum(const MyVector<T,N>& aa, const MyVector<T,N>& bb):
    a(aa), b(bb) {
  }

  T operator[] (int i) { return return a[i] + b[i]; }

  const T operator[] (int i) const { return a[i] + b[i]; }
};


template<class T, int N>
MyVector<T,N> operator+(const MyVector<T,N>& a, const MyVector<T,N>& b) {
  return VectorSum<T,N>(a, b);
}


int main() {
  MyVector<double,3> a, b({1,2,3}), c({3,4,5}), d({4,5,6});

  a = b + c + d; 

  std::cout << b << std::endl;
  std::cout << c << std::endl;
  std::cout << d << std::endl;
  std::cout << "Result:\n" << a << std::endl;

  return 0;
}

Output:

[1, 2, 3, ]
[3, 4, 5, ]
[4, 5, 6, ]
Result:
[8, 11, 14, ]

I've added an initializer_list (C++11) constructor and ostream operators purely for convenience/illustration purposes.

Since you've defined the operator[] as return by value, I was unable to set items in the data array for testing (since error: lvalue required as left operand of assignment); typically this operator should be by reference - but then, in your case, VectorSum::operator[] wouldn't work because that would fail compilation because of returning a reference to a temporary.

I also added a copy constructor because ...

// this calls MyVector's copy constructor when assigned to 'main::a'
template<class T, int N>
MyVector<T,N> operator+(const MyVector<T,N>& a, const MyVector<T,N>& b) {
  return VectorSum<T,N>(a, b);   // implicit MyVector::copy constructor 
}

// this also calls MyVector's copy constructor (unless the copy constructor is defined explicit)
template<class T, int N>
MyVector<T,N> operator+(const MyVector<T,N>& a, const MyVector<T,N>& b) {
  MyVector<T,N> res = VectorSum<T,N>(a, b);
  return res;
}

// but this would call MyVector's assignment operator
template<class T, int N>
MyVector<T,N> operator+(const MyVector<T,N>& a, const MyVector<T,N>& b) {
  MyVector<T,N> res;
  res = VectorSum<T,N>(a, b);
  return res;
}

In answer to your questions:

Yes - how would it behave if you explicitly defined the variable and returned that? It's the same behaviour for temporaries, except there is no variable declaration;
n/a
I touched on this above - you can't use reference because of 'returning reference to temporary' error; However there's no reason why you can add T& operator[] to MyVector( ie. not overriding).

EDIT: answers to comments:

The function specification must be identical when overriding including return type. Since you have defined it return by value in VectorExpression it must be return by value in MyVector. If you try to change it to reference in the child class you will get a compile error: conflicting return type specified. So no you can't override operator const with a version that returns const T& instead of T. Furthermore it must return by value since MyVectorSum returns { a[i] + b[i] } which would be a temporary and you can't return a reference to a temporary.
Sorry my mistake, fixed above.
because:
- MyVector isn't a subtype of VectorSum - compile error ‘MyVector’ is not derived from ‘const VectorSum’
- I've also tried with VectorExpression but compile error: 'cannot allocate an object of abstract type' - because it's trying to return by value
- I chose MyVector since that's the type of your expected result. Yes it does those all those for loops but I can't see a way round that : there's three different array 'data' variables each of which need to be iterated into order to be accumulated. At some point in the code you will have to do the for loops.
Understood, yes I got confused. removed from post.

1) I did not define `MyVector::operator[]` as return-by-value, so in my example you can put elements in the vector. I only did this in `VectorExpression` and `VectorSum` because you can't assign to an expression in general. They only have `operator[](int) const`; in `MyVector` I added a non-overriding `operator[](int)`. My question was whether it's okay to override `operator[](int) const` with a version that returns `const T&` instead of `T`. — PieterNuyts, Jul 09 '15 at 07:12
2) I don't see the point of your addition `T operator[] (int i) { return 0; }`: Now if you VectorExpression object happens to be non-const it will wrongly return 0. Since you're not returning by reference anyway I don't see why you wouldn't omit this method? — PieterNuyts, Jul 09 '15 at 07:13
3) I wonder why you changed the return type of `operator+` to `MyVector`. I could be mistaking but it seems this destroys the whole point of having an efficient operator that evaluates in one for-loop? Now you would evaluate `b + c` and store it in a `MyVector`, then create another `MyVector` to store the sum of this object and `d`, and then copy that one into `a`, so we have three for-loops now... — PieterNuyts, Jul 09 '15 at 07:16
4) A member function template is a member function that is itself a template, i.e. that has template parameters other than those of the class. These functions are still templates when the template class is instantiated and therefore they cannot be virtual. This is not the same as member functions of a template class, which become regular member functions when the template class is instantiated. I don't see any reason why these couldn't be virtual. — PieterNuyts, Jul 09 '15 at 07:18
5) @Richard: Reply to your answer 3: My argument types were wrong, indeed having `VectorSum` there makes no sense; it should have been `VectorExpression` (changed it now). But I still don't see why you changed the return type. `VectorSum` isn't abstract so there should not be any problem here? You don't have to do the for **loops** (plural) anywhere, just one for loop should be sufficient, and that's what I'm trying to achieve. — PieterNuyts, Aug 21 '15 at 11:44

PieterNuyts · Answer 2 · 2022-07-06T07:03:45.263

I didn't think of this at first but having a virtual operator[] method probably kills the efficiency I was trying to achieve by avoiding the 3 for-loops and intermediate storage of vector-sized temporaries. Making a method virtual prevents it from being inlined, which means it needs to be actually called as a function everytime an element is accessed.

Based on links I got from people who commented to my question and from Google, I ended up with the following solution which avoids needing any virtual methods.

template<class T, int N, class V>
class VectorExpressionBase {
    V ref;

protected:
    explicit VectorExpressionBase(const V *ref) 
    : ref(const_cast<V*>(ref)) {}

public:
    T  operator[] (int i) const { return ref[i]; }
    T& operator[] (int i)       { return ref[i]; }
};

template<class T, int N>
class VectorExpressionBase<T,N,void> {
    T data[N];

protected:
    explicit VectorExpressionBase(const void*) {
        // Argument is unused but accepted to have uniform
        // calling syntax
    }

public:
    T  operator[] (int i) const { return data[i]; }
    T& operator[] (int i)       { return data[i]; }
};

template<class T, int N, class V>
class VectorExpression : public VectorExpressionBase<T,N,V> {
public:
    template<class V1>
    VectorExpression<T,N,V>& operator= (
        const VectorExpression<T,N,V1> &rhs) 
    {
        for (int i = 0; i < N; ++i)
            data[i] = rhs[i];

        return *this;
    }

    explicit VectorExpression(const V *ref = 0) 
    : VectorExpressionBase<T,N,V>(ref) {}

    // Can define all kinds of operators and functions here such as
    // +=, *=, unary + and -, max(), min(), sin(), cos(), ...
    // They would automatically apply to MyVector objects and to the
    // results of other operators and functions
};

template<class T, int N>
class MyVector : public VectorExpression<T,N,void> {
};

template<class T, int N, class VA, class VB>
class VectorSum {
    VectorExpression<T,N,VA> &a;
    VectorExpression<T,N,VB> &b;

  public:
    VectorSum(VectorExpression<T,N,VA> &aa, VectorExpression<T,N,VB> &bb)
    : a(aa), b(bb) {}

    T operator[] (int i) const { return a[i] + b[i]; }
};

template<class T, int N, class VA, class VB>
VectorExpression<T,N,VectorSum<T,N,VA,VB> >
operator+(const VectorExpression<T,N,VA> &a, 
          const VectorExpression<T,N,VB> &b) 
{
    VectorSum<T,N,VA,VB> sum(a, b);
    return VectorExpression<T,N,VectorSum<T,N,VA,VB> >(sum);
}

Class VectorExpression now just wraps the class that does the work (in this case VectorSum). This allows defining all kinds of functions and operators for VectorExpression only, rather than having to overload them for VectorSum, VectorProduct, etc.

MyVector derives from a special case of VectorExpression which has a specialized base class; this isn't really necessary but it's nice because it makes all functions and operators defined for VectorExpression also available for MyVector. By using a simple base class VectorExpressionBase that only deals with storage and the [] operator, all other operators and methods don't need to be duplicated in the specialization for V = void.

Users would only need to know about classes MyVector<T,N> (for storing data) and possibly about VectorExpression<T,N,V> if they want to define additional functions and operators. VectorExpressionBase and the classes like VectorSum don't need to be visible to the outside world.

I find my original solution somewhat cleaner conceptually because the meaning of each class is more clear and because it doesn't require the template parameter V, but this one is more efficient because it doesn't require any virtual functions, which can probably make a large difference in certain cases.

Thanks for pointing me to the right links!

P.S. Surely most / all of this isn't new but I thought it would be nice to summarize and explain it a bit. I hope it can help others.

EDIT

I changed the type of data member VectorExpressionBase<T,N,V>::ref from V& to V. This is needed since the temporary V object the reference was pointing at may no longer exist at the time the VectorExpression is evaluated. For example, the temporary VectorSum object stops existing when the operator+ function returns, making the returned VectorExpression object useless.

I also completed the code with some constructors and corrected the operator+ function.

Chances are your virtual functions are [devirtualized](http://stackoverflow.com/questions/7046739/lto-devirtualization-and-virtual-tables) by the compiler and are not actually a performance issue. — nwp, Sep 07 '15 at 10:12
But I assume every `MyVector` object would still contain a vtable pointer, since there's no way of guaranteeing that *all* functions can *always* be devirtualized? For programs that use many small vectors this could be a serious memory overhead. Or am I missing something here? — PieterNuyts, Sep 08 '15 at 13:37
I would ignore that and come back to it once you run out of memory (which will never happen). If you need a solution now I would remove the inheritance, add a function `VectorExpression MyVector::get_expression()` and implement the algebraic operators using `get_expression`. You will need a decently sized `N` to see performance improvements though and chances are that an intermediate `MyVector` is faster for low `N`s. — nwp, Sep 08 '15 at 13:50
These are good points indeed: if `N` is low the overhead of an intermediate `MyVector` is low to zero, and if `N` is large the overhead of the vtable pointer is negligible, so it might be a somewhat academic problem. I considered your solution but finally came up with the above one, which doesn't require wrapping every function in `VectorExpression` with one in `MyVector`. The cost is that it becomes more complicated and may compile more slowly due to all the templates and intermediate classes. — PieterNuyts, Sep 09 '15 at 09:48
@nwp: About your comment on running out of memory: I'm in the hardware design industry (actually using SystemC which is built on C++). In our business usually the rule is that as long as you're not running out of either memory or CPUs, there's no excuse for not running additional simulations before actually producing your chip :-) — PieterNuyts, Sep 11 '15 at 09:50

Efficient vector operators in C++ / References to temporary objects

2 Answers2