I'd like to write a multi-threaded version of the following code
template<typename Out, typename In, typename Cond, typename Func>
std::vector<Out> collect(std::vector<In> const&values,
Cond const&cond, Func const&func)
{
std::vector<Out> result;
for(auto const&val : values)
if(cond(val))
result.emplace_back(func(val));
return result;
}
where the order of selected elements does not matter.
One simple approach is
template<typename Out, typename In, typename Cond, typename Func>
std::vector<Out> collect(std::vector<In> const&values,
Cond const&cond, Func const&func)
{
std::vector<Out> result(values.size());
std::atomic<size_t> index = 0;
// some multithreaded for loop implementation
parallel_for(size_t(0),values.size(),[&](size_t i) {
if(cond(values[i]))
result[index++] = func(values[i]);
});
result.resize(index);
return result;
}
(of course, initializing result
is serial, but let's ignore that here). This appears to work, but may not be lock-free. Is there a better way? In particular, can I avoid allocating too many data (if only a few input data get selected)?
The problem is very similar to std::copy_if
(it would be std::transform_if
it that existed) -- how is the parallel version std::copy_if(std::par,...)
implemented (which is C++17, but I'm restricted to C++11)?