Accessing shared vector in multi-threaded application

Question

I've got a bunch of threads working on getting a result and storing it in a shared vector.

Right now I am doing this in the following way, but am wondering if there is a faster way of doing it (speed is the only priority here so please focus on that), e.g. to avoid false sharing and the like.

Globally, there exists a vector

std::vector<double> allResults;

and each thread stores its results in this vector by writing

std::for_each(allResults.begin() + i, allResults.begin() + i + n, 
    [](double& res) 
    { 
    res = .... //Calculate result and store it.
    });

Here, for each thread, i, n are indeces that correspond to a range of results that the current thread is supposed to calculate: allResults[i], ...., allResults[i + n - 1]. These ranges are unique for every thread. So each thread is working on its own range of results, without overlaps.

Yes, the main thread waits for all threads to complete first. — DejayeBo, Oct 22 '18 at 14:06
To avoid false sharing you need `sizeof(double) * n` to be evenly divisible by the cache line size. That can be tricky to do in portable code though. — NathanOliver, Oct 22 '18 at 14:10
I recently did something similar in [SO: Eigen library with C++11 multithreading](https://stackoverflow.com/a/52724681/7478597) (My answer didn't rely on Eigen.) and [SO: Multi-threading benchmarking issues](https://stackoverflow.com/a/52835213/7478597). When I wrote the first, I was in doubt for quite the same reasons and googled a bit. So, the first answer contains some notes and links about this. — Scheff's Cat, Oct 22 '18 at 14:11
Why not let each thread work with individual vectors and then merge them at the end? I would start with that implementation and improve only after profiling. — balki, Oct 22 '18 at 14:16
You need to `reserve()` the space first and make sure to never resize the `vector` while working on it's elements in different threads. Also make sure that writes from different threads are separated by >64 bytes or else performance will suffer from false sharing. — rustyx, Oct 22 '18 at 14:16
In addition to @NathanOliver: Concerning the issue of false cache sharing, I recently got a link from a colaborator (which I've not yet fully read and understood): [Eliminate False Sharing](http://www.drdobbs.com/parallel/eliminate-false-sharing/217500206). — Scheff's Cat, Oct 22 '18 at 14:18
Try this: Split the vector by the amount of threads. Each split is a separate vector. Have the threads work only on their vector. When all done, merge the vectors in to 1. — Ivan Rubinson, Oct 22 '18 at 15:12
The advice of splitting the vectors and having each thread work with its own vector definitely seems to work, I got the code running faster... but the trouble is I actually don't know how much work a vector will do beforehand. So now I have the choice between allocating a very large vector for each thread (I know the MAX amount of work so I could create vectors of that size) or giving it a smaller size and then resizing as needed. Probably going to have to run some tests to see what works best. — DejayeBo, Oct 22 '18 at 18:44

Accessing shared vector in multi-threaded application

0 Answers0