Best practice on arithmetic operator overloading of vector on concurrent programs

Question

Problem context

I am currently analyzing and improving the performance of a computation-intensive, concurrent application written in C++ with the OpenMP parallel programming model. I have seen with profiling tools that a specific parallel region of the code drops the frequency to 200MHz, meaning that a very high amount of cycles are spent on system time or with CPUs idle. I have identified that the cause of this issue is a high number of memory allocations performed concurrently, causing the allocator to synchronize the threads and losing a lot of time waiting.

These memory allocations, though, are the result of a vector<double> operator overloading that is heavily used during the parallel loop in question (from now on region of interest). The function of the operator overloading is as follows:

std::vector<double> operator+( const std::vector<double>& v1 , const std::vector<double>& v2 )
{
 std::vector<double> v = v1;
 for( unsigned int i=0; i < v1.size() ; i++ )
 { v[i] += v2[i]; }
 return v; 
}

This is just an example, as the same is done with the other arithmetic operators and also with operations with a vector and a scalar (double type). As you can see, a new vector is initialized and then returned as a result of the operation. This causes at least one malloc and one free. But as I said, this is called several times during one iteration of the region of interest, and this loop runs for a high number of iterations on a high number of parallel threads (48, at most). One example of this operation call is the following:

std::vector<double> corner_point= 0.5*(my_voxel_center+other_voxel_center);

In this case, two operations are done one after the other (+ of vectors and then * of vector and scalar) and then the result is assigned to a newly created vector.

Question

So, my question is the following: as we have seen how bad this performs, which should be the best practice on operators overloading, specifically on types like vector<>, to avoid allocating and freeing a new vector every time this is called? Is there a better way to write that?

I have read the "https://stackoverflow.com/questions/4421706/what-are-the-basic-rules-and-idioms-for-operator-overloading" post searching for help, but there are no comments on the specific use of memory inside the overloading functions, and how these perform on concurrent applications.

I am aware that maybe there is no other way to that, in this case: where should I focus my attention to solve this issue? My thoughts are on:

Using another allocator that handles concurrency better.
Not using operator overloading at all to avoid allocating these temporal vectors, and perform the operation with a loop every time this appears at the code. In this case, code size growth should not be a problem as I said, this application is computation critical and this is the thing that matters the most.

Yksisarvinen · Answer 1 · 2020-11-22T23:43:34.370

Below suggestions have little to do with operator overloading, rather with the use of vectors in general.

There are few things to improve:

First, you should definitely use reserve(), before you copy anything.

std::vector<double> operator+( const std::vector<double>& v1 , const std::vector<double>& v2 )
{
 std::vector<double> v;
 v.reserve(v1.size() + v2.size());
 v = v1;
 for( unsigned int i=0; i < v1.size() ; i++ )
 { v[i] += v2[i]; }
 return v; 
}

This way you will have 1 allocation per operator call instead of several (at least 1, at most a lot if v1 is tiny and v2 is huge).

Then I'd guess range version of insert() might perform better than a loop, but that might not be true and should be tested (at least it shouldn't be worse than a loop).

std::vector<double> operator+( const std::vector<double>& v1 , const std::vector<double>& v2 )
{
 std::vector<double> v;
 v.reserve(v1.size() + v2.size());
 v.insert(v.end(), v1.begin(), v1.end());
 v.insert(v.end(), v2.begin(), v2.end());
 return v; 
}

Perhaps it is also worth considering if v1 must stay unmodified. If not, then it may be worth to just overload operator += and use it.

And of course make sure to experiment with different optimization levels, sometimes -Os or -O2 might be better than -O3

Thanks for your answer! I will take note of these improvements on vectors copying and allocation. But I think you misunderstood the meaning of the addition overloading. I want to add the two vectors in a mathematical way: adding each element of the vector with its corresponding one in the other vector, given that the two operands are the same size. This is the reason of the `for` loop. — Marc, Nov 23 '20 at 11:33
@Marc Ah, my bad. I got it wrong. My answer is invalid then, I'll remove it in a moment, but you can still consider if you need to keep original vector unchanged - if you could add to it instead of creating a new vector, it would need no allocation. Also, if you could specify size at compile time, `std::array` is much cheaper than vector (no allocations). — Yksisarvinen, Nov 23 '20 at 11:56

Best practice on arithmetic operator overloading of vector on concurrent programs

Problem context

Question

1 Answers1