First of all, you must understand that a lambda is an object that implements an operator()(args...) member function. In your specific case it is operator()(int i).
To execute this lambda, two arguments must be passed to operator()(int):
- a pointer to the lambda object (this)
- an integer
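To make the two arguments concrete, here is a minimal sketch of roughly what a compiler generates for a capturing lambda. The names DoubleInto and call_both_ways are hypothetical and exist only for illustration; a real closure type is unnamed.

```cpp
#include <vector>

// Hypothetical hand-written equivalent of a lambda such as
//   [&out](int i) { out.push_back(i * 2); }
// The capture becomes a data member and the body becomes operator().
struct DoubleInto {
    std::vector<int>& out;            // captured by reference

    void operator()(int i) const {    // `this` is the implicit first argument
        out.push_back(i * 2);
    }
};

// Calls the closure object in both spellings to show they are the same call.
std::vector<int> call_both_ways() {
    std::vector<int> out;
    DoubleInto closure{out};

    closure(1);                 // usual call syntax
    closure.operator()(2);      // explicit member call: needs the object and the int
    return out;                 // holds 2, then 4
}
```

Both calls need the object itself in addition to the integer, which is why a bare function pointer is not enough to invoke a lambda that captures something.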
The address of a lambda is the address of an object (i.e. data), not the address of code. A thread start function, by contrast, is a function that accepts void* and returns void*. The address of a function is the address of machine code.
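The difference between the two kinds of address can be sketched as follows. StartRoutine, make_start_routine, run_once and call_through_void_pointer are hypothetical names chosen for this illustration; only the void* (void*) shape comes from the thread API.

```cpp
// Same shape as a thread start function: takes void*, returns void*.
using StartRoutine = void* (*)(void*);

// A captureless lambda has no state, so it converts to a plain function
// pointer (an address of code). Unary + forces that conversion.
StartRoutine make_start_routine() {
    return +[](void* p) -> void* {
        ++*static_cast<int*>(p);   // treat the argument as int* and increment
        return nullptr;
    };
}

// Calls the routine directly, the way a thread library would call it.
int run_once(int value) {
    StartRoutine fn = make_start_routine();
    fn(&value);
    return value;
}

// A capturing lambda is an object: its address is a data pointer, which can
// travel through void* and be cast back to the closure type.
int call_through_void_pointer(int base) {
    auto add_base = [base](int i) { return base + i; };
    void* data = &add_base;                                    // address of data
    auto& restored = *static_cast<decltype(add_base)*>(data);  // back to the closure
    return restored(5);
}
```

The function pointer is what a start routine parameter can accept, while the data pointer is what can be smuggled through the void* argument.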
Therefore, to execute your lambda you should define a void* (void*) function and pass its address as the start_routine parameter. The address of the lambda is passed as the arg parameter:
#include <pthread.h>
#include <system_error>

template<typename Lambda>
void paral(int start, int end, Lambda&& lambda, int nT){
    // Everything the thread needs, passed through the single void* argument.
    struct Args
    {
        int Start;
        int End;
        Lambda& Func;
    };
    // create captureless lambda; unary + converts it to a function pointer
    auto threadStart = +[](void* voidArgs) -> void*
    {
        auto& args = *static_cast<Args*>(voidArgs);
        for(int i = args.Start; i < args.End; ++i)
            args.Func(i);
        return nullptr;
    };
    // I create one thread here. You will create more.
    auto args = Args{start, end, lambda};
    pthread_t handle;
    int rc = pthread_create(&handle, NULL, threadStart, &args);
    if(rc)
        throw std::system_error(
            std::error_code(rc, std::generic_category()),
            "pthread_create");
    // args lives on this stack frame, so join before returning
    pthread_join(handle, nullptr);
}
However, in this specific case you are better off using std::thread instead of the pthread library. In that case your code may look like the following:
#include <iostream>
#include <atomic>
#include <thread>
#include <vector>

template<typename Func>
void paral(int start,
           int end,
           Func &&func,
           int threads_count = std::thread::hardware_concurrency())
{
    // hardware_concurrency() may return 0 when the value cannot be determined
    if(threads_count < 1)
        threads_count = 1;
    // shared index: each worker atomically claims the next value to process
    std::atomic_int counter {start};
    std::vector<std::thread> workers;
    workers.reserve(threads_count);
    for(int i = 0; i < threads_count; ++i) {
        workers.emplace_back([end, &counter, &func] {
            for(int val = counter++; val < end; val = counter++)
                func(val);
        });
    }
    for(int i = 0; i < threads_count; ++i)
        workers[i].join();
}

int main() {
    int C1[10];
    int A[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int B[10] = {11, 12, 13, 14, 15, 16, 17, 18, 19, 110};
    paral(0, 10, [&](int i){ C1[i] = A[i] + B[i]; });
    for(auto&& v: C1)
        std::cout << v << "\n";
    std::cout << "Done. Bye!" << std::endl;
}
There is an important note, though. Your code may not run as fast as you expect. It will suffer from false sharing: several threads modify memory in the same cache line, which forces the CPU cores to update their L1 caches every time another core writes to that memory.
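One common way to reduce false sharing is to give each thread a contiguous block of indices instead of handing out single indices from a shared counter. The sketch below is only an illustration of that idea: paral_chunked and its default of 4 threads are hypothetical and not part of the code above. It helps only when each block is much larger than a cache line (often 64 bytes, which is an assumption about the hardware). For a 10-element int array everything fits in one or two cache lines anyway.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Hypothetical variant of paral: each worker processes one contiguous
// block [lo, hi), so neighbouring writes mostly come from the same thread.
template <typename Func>
void paral_chunked(int start, int end, Func&& func, int threads_count = 4) {
    if (end <= start) return;                      // empty range, nothing to do
    const int total = end - start;
    // Use at least one thread and never more threads than elements.
    threads_count = std::max(1, std::min(threads_count, total));
    // Block size, rounded up so that all elements are covered.
    const int chunk = (total + threads_count - 1) / threads_count;

    std::vector<std::thread> workers;
    workers.reserve(threads_count);
    for (int t = 0; t < threads_count; ++t) {
        const int lo = start + t * chunk;
        const int hi = std::min(end, lo + chunk);
        if (lo >= hi) break;                       // rounding left nothing for this worker
        workers.emplace_back([lo, hi, &func] {
            for (int i = lo; i < hi; ++i)
                func(i);
        });
    }
    for (auto& w : workers)
        w.join();
}
```

It can be called the same way as paral, for example paral_chunked(0, 10, [&](int i){ C1[i] = A[i] + B[i]; }). Threads can still share a cache line at block boundaries, and the static split gives up the load balancing that the atomic counter provides, so whether it is faster depends on the workload.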