Consider this code:
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <execution>
#include <functional>
#include <random>
#include <vector>
using namespace std;
using namespace std::chrono;
constexpr size_t NUM_OF_ELEMENTS = 30000000;
// execute lambda and print the execution time
void measure(function<void()> lambda)
{
auto start = high_resolution_clock::now();
lambda();
auto end = high_resolution_clock::now();
printf("%ld\n", duration_cast<microseconds>(end - start).count());
}
int main()
{
random_device rd;
mt19937_64 gen(rd());
// range from INT_MIN to INT_MAX
uniform_int_distribution<> distr(-2147483648, 2147483647);
vector<int> original;
original.reserve(NUM_OF_ELEMENTS);
for(size_t i = 0; i < NUM_OF_ELEMENTS; i++)
original.push_back(distr(gen));
vector<int> the_copy(original.begin(), original.end());
// sort with single thread
measure([&]{ sort(original.begin(), original.end()); });
// sort with execution::par
measure([&]{ sort(execution::par, the_copy.begin(), the_copy.end()); });
return 0;
}
The code can be summed up in a few points:
- create random number generator
- create vector of random integers
- create a copy of that vector
- sort original vector with one thread and measure the time of execution
- sort the copy with
std::execution::par
and measure the time of execution - print the execution times
The execution::par
version always takes longer. Doesn't matter what value NUM_OF_ELEMENTS
has. I tried values from 100 000 to 30 000 000 incrementing by 100 000. The code above produces similar results like this (values in microseconds):
9729406 // single thread
10834613 // execution::par
I compiled the code on Windows 10 with gcc using VS Code:
g++ -std=c++17 -g ${workspaceFolder}/main.cpp -o ${workspaceFolder}/main.exe
For C++ standard library I use mingw distribution which can be found here.
Program versions: GCC 10.1.0 + LLVM/Clang/LLD/LLDB 10.0.0 + MinGW-w64 7.0.0
My processor has 6 cores and at the time of execution I didn't run any major programs or background processes.
First, I thought it has something to do with the size of the vector but 30 000 000 elements is surely enough. It already runs for 10 seconds before finishing a single test.
- What is going on?
- Is
execution::par
meant to be used like this? - Do I have to enable some compilation flags or some other trick to make it work as expected?