
I am learning parallel programming in C++. I have heard that when multiple threads access a shared vector, it can be good for performance if each element is aligned to 64 bytes in memory, because that reduces false sharing.

How can I create a std::vector<int> where each element in the vector is aligned to 64 bytes? Using alignas(64) vector<int> does not have the desired effect, since it aligns the vector object itself to 64 bytes, not each element inside it. The only workaround I have found so far is to create a new struct with the desired alignment.

struct alignas(64) int64 
{
    int value; 
    int64(int a): value(a) {}
};

The problem with this solution is that I cannot use the struct as if it were a normal int. Thus I have to adjust my whole code to access value.
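One idea to reduce that friction (just an untested sketch; the name PaddedInt and the chosen operators are purely illustrative) would be to give the wrapper implicit conversions, so it mostly behaves like a plain int at the call sites:

struct alignas(64) PaddedInt
{
    int value = 0;

    PaddedInt() = default;
    PaddedInt(int a) : value(a) {}

    // read it like an int
    operator int() const { return value; }

    // assign and accumulate like an int
    PaddedInt& operator=(int a) { value = a; return *this; }
    PaddedInt& operator+=(int a) { value += a; return *this; }
};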

Edit: Imagine you want to sum up a vector concurrently. One possibility would be to create a shared vector of counters where each thread sums into its own slot (e.g. indexed by thread id). If each element of the vector took one full cache line, false sharing (I believe) should be reduced. Afterward one can sum up the resulting vector sequentially. Here is an example:

#include <omp.h>
#include <vector>

using std::vector;

int false_share_sum(vector<int> &bigboy, int threads) {
    vector<int> shared_counter(threads, 0); // here each int should take 64 bytes

    // every thread adds into its own slot, but neighbouring slots share cache lines
    #pragma omp parallel for num_threads(threads)
    for (auto iter = bigboy.begin(); iter < bigboy.end(); ++iter) {
        shared_counter[omp_get_thread_num()] += *iter;
    }

    // combine the per-thread partial sums sequentially
    int sum = 0;
    for (auto iter = shared_counter.begin(); iter != shared_counter.end(); ++iter)
    {
        sum += *iter;
    }

    return sum;
}
deepNdope
  • What are you actually using this for? There is probably a better way to do it than using an entire cache line for a single `int`. – NathanOliver Feb 07 '19 at 19:41
  • This is weird. Normally, aligning the vector to 64 bytes is what's wanted, not aligning individual ints. –  Feb 07 '19 at 19:42
  • the single `int` will have better alignment, but you increase the total memory footprint, which can have a negative impact on cache friendliness, maybe just enough to cancel the gain from the aligned elements, maybe more – 463035818_is_not_an_ai Feb 07 '19 at 19:44
  • Regarding "I have heard that it can be good for performance if ...", the answer is: profile, profile, profile. If you make a change for performance reasons, and you do not measure the performance gain (or loss...!), then you're just pulling wires at random, and possibly doing harm rather than good. Especially if the harm is compounded by making the code harder to read, and harder to maintain. – Eljay Feb 07 '19 at 19:48
  • I think you are misguided. Using a whole cache line for a single int will pretty much make the cache useless. The better way to avoid false sharing is to split the load between threads more wisely (see the sketch after these comments). – SergeyA Feb 07 '19 at 19:52
  • related: https://stackoverflow.com/questions/46919032/why-does-using-the-same-cache-line-from-multiple-threads-not-cause-serious-slowd – geza Feb 07 '19 at 20:09
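For comparison, here is a sketch of the approach hinted at in SergeyA's comment: each thread accumulates into a private variable and OpenMP combines the partial sums at the end, so no counters share a cache line at all (untested; reduction_sum is just an illustrative name):

#include <vector>

int reduction_sum(const std::vector<int> &bigboy, int threads) {
    int sum = 0;

    // each thread accumulates into its own private copy of sum;
    // OpenMP adds the partial sums together when the loop finishes
    #pragma omp parallel for num_threads(threads) reduction(+ : sum)
    for (long i = 0; i < static_cast<long>(bigboy.size()); ++i) {
        sum += bigboy[i];
    }

    return sum;
}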

1 Answer


This is exactly the kind of thing std::hardware_destructive_interference_size and std::hardware_constructive_interference_size were made for. See more at: https://en.cppreference.com/w/cpp/thread/hardware_destructive_interference_size

#include <new>
#include <vector>

// each AlignedInt is aligned (and thus padded) to occupy its own cache line
struct AlignedInt {
    alignas(std::hardware_destructive_interference_size) int value;
};

...

std::vector<AlignedInt> vec;

Now, whether you actually want to do this is a different question. Make sure to measure your performance, and use whatever works best for your actual problem. If you're not seeing destructive false sharing, you should probably leave it up to your compiler/CPU.
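Applied to the example from the question, it could look roughly like this (an untested sketch; padded_sum is just an illustrative name, and std::hardware_destructive_interference_size requires C++17 plus a standard library that actually provides it):

#include <new>
#include <omp.h>
#include <vector>

struct AlignedInt {
    alignas(std::hardware_destructive_interference_size) int value = 0;
};

int padded_sum(const std::vector<int> &bigboy, int threads) {
    // one counter per thread, each padded out to its own cache line
    std::vector<AlignedInt> shared_counter(threads);

    #pragma omp parallel for num_threads(threads)
    for (long i = 0; i < static_cast<long>(bigboy.size()); ++i) {
        shared_counter[omp_get_thread_num()].value += bigboy[i];
    }

    // combine the per-thread partial sums sequentially
    int sum = 0;
    for (const AlignedInt &c : shared_counter) {
        sum += c.value;
    }

    return sum;
}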

N00byEdge