I've got a bunch of threads working on getting a result and storing it in a shared vector.
Right now I do this in the following way, but I am wondering whether there is a faster way, e.g. one that avoids false sharing and the like. Speed is the only priority here, so please focus on that.
Globally, there exists a vector
std::vector<double> allResults;
and each thread stores its results in this vector by writing
std::for_each(allResults.begin() + i, allResults.begin() + i + n,
    [](double& res)
    {
        res = .... // Calculate result and store it.
    });
Here, for each thread, i and n are the start index and length of the range of results that the current thread is supposed to calculate: allResults[i], ...., allResults[i + n - 1]. These ranges are unique for every thread, so each thread works on its own range of results, without overlaps.
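To make the setup concrete, here is a minimal self-contained sketch of what I mean. The names computeResult and fillResults are hypothetical and only for illustration, the computation is a placeholder for the real one, the vector is local rather than global to keep the example self-contained, and a plain loop stands in for std::for_each so that the element index is available.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real per-element calculation.
double computeResult(std::size_t index)
{
    return static_cast<double>(index) * 0.5;
}

// Fills a vector of `total` results using `numThreads` threads.
// Each thread writes only to its own contiguous range [i, i + n).
std::vector<double> fillResults(std::size_t total, unsigned numThreads)
{
    assert(numThreads > 0);

    // Sized up front; the vector is never resized while threads run.
    std::vector<double> allResults(total);

    std::vector<std::thread> workers;
    workers.reserve(numThreads);

    // Ceiling division, so the last range may be shorter (or empty).
    const std::size_t chunk = (total + numThreads - 1) / numThreads;

    for (unsigned t = 0; t < numThreads; ++t)
    {
        const std::size_t i = std::min(total, t * chunk); // start of this thread's range
        const std::size_t n = std::min(chunk, total - i); // length of this thread's range

        workers.emplace_back([&allResults, i, n]
        {
            double* first = allResults.data() + i;
            for (std::size_t k = 0; k < n; ++k)
                first[k] = computeResult(i + k);
        });
    }

    // Wait for every thread before handing the results back.
    for (auto& w : workers)
        w.join();

    return allResults;
}
```

For example, fillResults(10, 3) gives the three threads the ranges [0, 4), [4, 8) and [8, 10). With contiguous ranges like these, as far as I understand only the cache lines that straddle a boundary between two neighbouring ranges can be written by two different threads.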