I was wondering whether the following is possible. Could someone please point out where I'm going wrong? I'm a complete newbie to CUDA.
__global__ void run_multiple_cpp(int n, char **commands, int *result) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        // Intended: each thread runs "//path to a.out" with the i-th set of parameters,
        // i.e. commands[i] holds the full command line for input i
        result[i] = system(commands[i]);
    }
}
int main(void) {
    // Get input here,
    // then a kernel call that splits the input across threads as shown above
    return 0;
}
My question is whether this is possible without having to rewrite the C++ application to be CUDA-friendly. I've tried the __device__ and __host__ qualifiers, but my application is too big to be modified to support CUDA.
The operation above always runs on a different set of inputs. I've tried CPU multithreading, but I need to run this application for a very large set of inputs, hence the question.