So I was looking up how to do some parallelism using only standard C++ facilities and found the following bit of code in another question here on Stack Overflow:
template <typename RAIter> //FOUND ON STACK OVERFLOW
int parallel_sum(RAIter beg, RAIter end)
{
    auto len = end - beg;
    if (len < 1000)
        return std::accumulate(beg, end, 0);

    RAIter mid = beg + len / 2;
    auto handle = std::async(std::launch::async,
                             parallel_sum<RAIter>, mid, end);
    int sum = parallel_sum(beg, mid);
    return sum + handle.get();
}
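For reference, this is roughly how I'm building and smoke-testing that snippet on its own; the headers are just my guess at what it needs (std::async lives in <future>, std::accumulate in <numeric>):

#include <future>    // std::async, std::future
#include <numeric>   // std::accumulate
#include <vector>
#include <iostream>

// ... parallel_sum exactly as quoted above ...

int main()
{
    std::vector<int> v(100000, 1);
    std::cout << parallel_sum(v.begin(), v.end()) << std::endl; // expect 100000
}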
I wanted to make a general parallel_for_each function that loops over a (hopefully) arbitrary container type and applies an algorithm to each entry, so I modified the above into the following:
template <typename ContainerIterator, typename containerSizeType, typename AlgorithmPerEntry> //modified version of the parallel sum code above: https://stackoverflow.com/questions/36246300/parallel-loops-in-c
void parallel_for_each(ContainerIterator beg, ContainerIterator end, AlgorithmPerEntry& algorithm, containerSizeType maxProbSize)
{
    containerSizeType len = end - beg;
    if (len < maxProbSize){ //if the range is sufficiently small, go ahead and execute it serially
        std::for_each(beg, end, algorithm);
        std::cout << "working on processor with id = " << GetCurrentProcessorNumber() << std::endl; //the processor IDs change, so I'm assuming this is executing in parallel
        return;
    }
    //otherwise, keep splitting and spawning more threads
    ContainerIterator mid = beg + len / 2;
    auto handle = std::async(std::launch::async,
                             parallel_for_each<ContainerIterator, containerSizeType, AlgorithmPerEntry>, mid, end, algorithm, maxProbSize);
    parallel_for_each(beg, mid, algorithm, maxProbSize);
    handle.get(); //corrected as advised
}
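In case it matters, these are the headers I'm compiling this with; GetCurrentProcessorNumber is the Win32 call, everything else is standard:

#include <algorithm>   // std::for_each
#include <future>      // std::async, std::future
#include <iostream>
#include <windows.h>   // GetCurrentProcessorNumber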
I wanted to test it with a super simple functor, so I made the following:
template<typename T>
struct dataSetter
{
    const T& set_to;

    dataSetter(const T& set_to_in) : set_to(set_to_in) {}

    void operator()(T& set_this)
    {
        set_this = set_to;
    }
};
Pretty straightforward: its operator() just assigns the stored value to whatever element it's handed.
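As a quick sanity check on the functor itself, it does what I expect when I drive it serially through std::for_each (this little test is mine, separate from the program below):

int seven = 7;
std::vector<int> v(10);
dataSetter<int> set7(seven);
std::for_each(v.begin(), v.end(), set7); // every element of v is now 7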
Here's my main function's body
std::vector<int> ints(100000);
unsigned minProbSize = 1000;
int setval = 7;
dataSetter<int> setter(setval);
parallel_for_each(ints.begin(), ints.end(), setter, minProbSize);//parallel assign everything to 7
//some sort of wait function to go here?
std::cout << std::endl << "PS sum of all ints = " << parallel_sum(ints.begin(), ints.end()) << std::endl; //parallel sum the entries
int total = 0;//serial sum the entries
for (unsigned i = 0; i < ints.size(); i++)
    total += ints[i];
std::cout << std::endl << "S sum of all ints = " << total << std::endl;
std::cout << std::endl << "PS sum of all ints = " << parallel_sum(ints.begin(), ints.end()) << std::endl; //parallel sum the entries again
Since the vector has 100000 entries and each one should be set to 7, every sum should come out to 100000 * 7 = 700000. Here are some outputs:
PS sum of all ints = 689052
S sum of all ints = 700000
PS sum of all ints = 700000
output from another run:
PS sum of all ints = 514024
S sum of all ints = 700000
PS sum of all ints = 700000
The first parallel sum over the vector consistently comes out low. My guess as to what is happening is that all the assignment threads get created, then the summing threads get created, but some of the summing threads execute prematurely (before the last assignment thread has finished). Is there any way I can force a wait? And as always, I'm open to all advice.
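The only workaround I've sketched so far (untested, the names are my own) is to drop the recursion entirely: have the top-level call carve the range into chunks, launch one std::async task per chunk, keep every future, and block on all of them before returning, so nothing after the call can start early:

//needs <future>, <vector>, <algorithm>, <cstddef>
template <typename ContainerIterator, typename AlgorithmPerEntry>
void chunked_parallel_for_each(ContainerIterator beg, ContainerIterator end,
                               AlgorithmPerEntry algorithm, std::size_t chunkSize)
{
    std::vector<std::future<void>> pending;
    while (beg != end)
    {
        std::size_t len = static_cast<std::size_t>(end - beg);
        ContainerIterator chunkEnd = (len > chunkSize) ? beg + chunkSize : end;
        //each task gets its own copy of the functor and works on [beg, chunkEnd)
        pending.push_back(std::async(std::launch::async,
            [beg, chunkEnd, algorithm]() { std::for_each(beg, chunkEnd, algorithm); }));
        beg = chunkEnd;
    }
    for (auto& task : pending) //explicit wait: nothing past this loop runs until every chunk is done
        task.get();
}

Would calling something like chunked_parallel_for_each(ints.begin(), ints.end(), setter, 1000) before the first parallel_sum be enough of a barrier, or am I misunderstanding where the race actually is?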