CUDA/C++ noob here.
The error I receive on attempting to debug my CUDA project is:
First-chance exception at 0x000000013F889467 in simple6.exe: 0xC00000FD: Stack overflow (parameters: 0x0000000000000001, 0x0000000000223000).
The program '[2668] simple6.exe' has exited with code 0 (0x0).
From research on the web, it seems that some of my variables are too large for the "stack" and need to be moved to the "heap".
Can someone please provide me the appropriate code modifications?
My code is below. The point of this kernel is to use h_S and h_TM to create a bunch of values and write these values into h_F at the very end. This is why h_F is never copied to the GPU.
int main()
{
    int blockSize = 1024;
    int gridSize = 1;
    const int reps = 1024;
    const int iterations = 18000;
    int h_F [reps * iterations] = {0};
    int h_S [reps] = {0}; // not actually zeros in my code this just simplifies things
    int h_TM [2592] = {0}; // not actually zeros in my code this just simplifies things

    // Device input vectors
    float *d_F;
    double *d_S;
    float *d_TM;

    // Select GPU
    cudaSetDevice(0);

    // Allocate memory for each vector on GPU
    cudaMalloc((void**)&d_F, iterations * reps * sizeof(float));
    cudaMalloc((void**)&d_S, reps * sizeof(double));
    cudaMalloc((void**)&d_TM, 2592 * sizeof(float));

    // Copy host vectors to device
    cudaMemcpy(d_S, h_S, reps * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_TM, h_TM, 2592 * sizeof(float), cudaMemcpyHostToDevice);

    // Execute the kernel
    myKern<<<gridSize, blockSize>>>(d_TM, d_F, d_S, reps);
    cudaDeviceSynchronize();

    // Copy array back to host
    cudaMemcpy(h_F, d_F, iterations * reps * sizeof(float), cudaMemcpyDeviceToHost);

    // Release device memory
    cudaFree(d_F);
    cudaFree(d_TM);
    cudaFree(d_S);
    cudaDeviceReset();
    return 0;
}
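For reference, here is a minimal host-only sketch of the "move it to the heap" idea, assuming plain C++ with no GPU calls. std::vector keeps its elements on the heap, so only a small handle sits on the stack. The helper names host_array_bytes and make_host_array are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: bytes needed by a host int array of reps * iterations elements.
constexpr std::size_t host_array_bytes(std::size_t reps, std::size_t iterations)
{
    return reps * iterations * sizeof(int);
}

// Hypothetical helper: allocate the zero-initialised host array on the heap.
// The returned vector owns the memory and frees it when it goes out of scope.
std::vector<int> make_host_array(std::size_t reps, std::size_t iterations)
{
    assert(reps > 0 && iterations > 0);
    return std::vector<int>(reps * iterations, 0);
}
```

With a 4-byte int, host_array_bytes(1024, 18000) comes to 73,728,000 bytes (about 70 MiB), far more than the 1 MB default stack that the MSVC linker reserves. Something like std::vector<int> h_F = make_host_array(reps, iterations); could then stand in for the array, with h_F.data() passed wherever a raw pointer is needed, for example as the destination of the final cudaMemcpy.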
On a related note, would declaring these huge arrays as "shared" variables solve my problem?
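To put rough numbers on the "shared" idea, here is a small host-only sketch that checks which of the arrays would even fit in per-block shared memory. It assumes a 48 KB budget per block, which is a common default but not guaranteed. The real figure for a given card is reported in cudaDeviceProp::sharedMemPerBlock. The names kAssumedSharedBytesPerBlock and fits_in_shared are hypothetical.

```cpp
#include <cstddef>

// Assumed per-block shared memory budget. 48 KB is a common default, not a
// guarantee; query cudaDeviceProp::sharedMemPerBlock for the actual value.
constexpr std::size_t kAssumedSharedBytesPerBlock = 48 * 1024;

// Hypothetical helper: would `count` elements of `elem_size` bytes each
// fit inside the assumed shared memory budget?
constexpr bool fits_in_shared(std::size_t count, std::size_t elem_size)
{
    return count * elem_size <= kAssumedSharedBytesPerBlock;
}
```

Under that assumption the 2592-element table (10,368 bytes as float) and the 1024-element input would fit, but the 1024 * 18000 element output would not. Either way, __shared__ memory lives on the GPU, so it would not change how much host stack main() uses.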
Many thanks.