I am learning parallel programming in C++. I have heard that when several threads access a shared vector, it can help performance if each element is aligned to 64 bytes in memory, because this reduces false sharing.
How can I create a std::vector<int> where each element in the vector is aligned to 64 bytes? Using alignas(64) vector<int> does not have the desired effect, because the vector object itself is aligned to 64 bytes, not each element inside it. The only workaround I have found so far is to create a new struct with the desired alignment:
    struct alignas(64) int64
    {
        int value;
        int64(int a) : value(a) {}
    };
The problem with this solution is that I cannot use the struct as if it were a normal int, so I have to adjust my whole code to read and write value explicitly.
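To make the limitation concrete, here is a minimal sketch of how such a wrapper could be made to behave more like a plain int through implicit conversions. The names PaddedInt and sum_padded are hypothetical, the 64-byte figure is an assumed cache-line size, and a std::vector of an over-aligned type is only guaranteed to honor the alignment from C++17 onward:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical wrapper: one int, padded and aligned to an assumed
// 64-byte cache line so that neighbouring elements never share a line.
struct alignas(64) PaddedInt {
    int value = 0;

    PaddedInt() = default;
    PaddedInt(int a) : value(a) {}            // implicit construction from int

    operator int() const { return value; }    // implicit read as int

    PaddedInt& operator=(int rhs) { value = rhs; return *this; }
    PaddedInt& operator+=(int rhs) { value += rhs; return *this; }
};

static_assert(sizeof(PaddedInt) == 64, "one element per assumed cache line");
static_assert(alignof(PaddedInt) == 64, "elements start on 64-byte boundaries");

// Reads every element through the implicit conversion to int.
int sum_padded(const std::vector<PaddedInt>& v) {
    int total = 0;
    for (int x : v) {
        total += x;
    }
    return total;
}
```

With this, expressions such as v[i] += 3 or int x = v[i] compile without mentioning value, although the wrapper still does not cover every operator an int supports.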
Edit: Imagine you want to sum up a vector concurrently. One possibility is to create a shared vector of counters in which each thread sums into its own slot (e.g. indexed by thread_id). If each element of that vector took up one full cache line, false sharing should (I believe) be reduced. Afterward, the resulting vector can be summed sequentially. Here is an example:
    #include <omp.h>
    #include <vector>
    using std::vector;

    int false_share_sum(vector<int> &bigboy, int threads) {
        // One counter per thread; here each int should be 64 byte
        vector<int> shared_counter(threads, 0);
        // Each thread accumulates its share of bigboy into its own counter
        #pragma omp parallel for num_threads(threads)
        for (auto iter = bigboy.begin(); iter < bigboy.end(); ++iter) {
            shared_counter[omp_get_thread_num()] += *iter;
        }
        // Combine the per-thread partial sums sequentially
        int sum = 0;
        for (auto iter = shared_counter.begin(); iter != shared_counter.end(); ++iter)
        {
            sum += *iter;
        }
        return sum;
    }
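For comparison, the same idea with padded counters might look roughly like the sketch below. PaddedCounter and padded_sum are hypothetical names, 64 bytes is an assumed cache-line size, and I switched the accumulator to long long to avoid overflow. The code assumes threads >= 1 and C++17 (so that the vector's storage respects the over-alignment), and it falls back to a single counter when compiled without OpenMP:

```cpp
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical padded counter: each one fills an assumed 64-byte cache line.
struct alignas(64) PaddedCounter {
    long long value = 0;
};

// Assumes threads >= 1.
long long padded_sum(const std::vector<int>& data, int threads) {
    std::vector<PaddedCounter> counters(static_cast<std::size_t>(threads));
    const long long n = static_cast<long long>(data.size());

    // Each thread adds into its own cache-line-sized slot.
    #pragma omp parallel for num_threads(threads)
    for (long long i = 0; i < n; ++i) {
#ifdef _OPENMP
        const int tid = omp_get_thread_num();
#else
        const int tid = 0;  // no OpenMP: everything goes into slot 0
#endif
        counters[static_cast<std::size_t>(tid)].value += data[static_cast<std::size_t>(i)];
    }

    // Sequential combine of the per-thread partial sums.
    long long sum = 0;
    for (const PaddedCounter& c : counters) {
        sum += c.value;
    }
    return sum;
}
```

Whether this actually runs faster than the unpadded version depends on the hardware and on how the compiler keeps the counters in registers, so it would need to be measured.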