
CUDA/C++ noob here.

The error I receive on attempting to debug my CUDA project is:

First-chance exception at 0x000000013F889467 in simple6.exe: 0xC00000FD: Stack overflow (parameters: 0x0000000000000001, 0x0000000000223000).

The program '[2668] simple6.exe' has exited with code 0 (0x0).

From research on the web, it seems that I have some variables that are too large for the "stack" and need to be moved to the "heap".

Can someone please provide me the appropriate code modifications?

My code is below. The point of this kernel is to use h_S and h_TM to create a bunch of values and write these values into h_F at the very end. This is why h_F is never copied to the GPU.

int main()
{

int blockSize= 1024; 
int gridSize = 1; 

const int reps = 1024; 
const int iterations = 18000; 


float h_F [reps * iterations] = {0};
double h_S [reps] = {0}; // not actually zeros in my code, this just simplifies things
float h_TM [2592] = {0}; // not actually zeros in my code, this just simplifies things

// Device input vectors
float *d_F;
double *d_S;
float *d_TM;

//Select GPU
cudaSetDevice(0);


// Allocate memory for each vector on GPU

cudaMalloc((void**)&d_F, iterations * reps * sizeof(float));
cudaMalloc((void**)&d_S, reps * sizeof(double));
cudaMalloc((void**)&d_TM, 2592 * sizeof(float));

// Copy host vectors to device
cudaMemcpy( d_S, h_S, reps * sizeof(double), cudaMemcpyHostToDevice);
cudaMemcpy( d_TM, h_TM, 2592 * sizeof(float), cudaMemcpyHostToDevice);

// Execute the kernel
myKern<<<gridSize, blockSize>>>(d_TM, d_F, d_S, reps);
cudaDeviceSynchronize(); 


// Copy array back to host
cudaMemcpy( h_F, d_F, iterations * reps * sizeof(float), cudaMemcpyDeviceToHost );

// Release device memory
cudaFree(d_F);
cudaFree(d_TM);
cudaFree(d_S);

cudaDeviceReset();
return 0;
}
Also, related, but would making these huge input arrays "shared" variables solve my problem?

Many thanks.

  • Sorry, we don't "provide appropriate code modifications" here. Google `malloc` or read up on the C++ `new` operator if you want to learn about dynamic memory allocation in C++. – talonmies Jun 24 '14 at 05:18
  • I just need some guidance, not spoon-feeding. If I understand correctly, dynamic memory is limited in this context because it only allows declaration of one variable. – Jordan Jun 24 '14 at 05:28
  • Possible duplicate of [Basic CUDA C Program Crashing Under Certain Conditions](http://stackoverflow.com/questions/20127835/basic-cuda-c-program-crashing-under-certain-conditions). – Vitality Jun 24 '14 at 05:29
  • Here is a general tip: your first C++ program shouldn't be your first CUDA program. You could remove every line of CUDA-specific code from what you posted and the problem would remain. Your question is really "how do I dynamically allocate memory in C++", and the answer to that is readily available if you choose to look for it. – talonmies Jun 24 '14 at 06:16
  • Fair enough. Could you please at least confirm that h_F shouldn't be able to cause a stack overflow because it is never sent to the GPU? Thank you, talonmies. – Jordan Jun 24 '14 at 06:26
  • The error you are getting is happening in *host* code. You are either exceeding the stack frame size or performing some other sort of illegal memory operation in your main. It has nothing to do with the GPU. – talonmies Jun 24 '14 at 06:59

1 Answer

I read through your code, and only one of those three arrays will actually cause the stack overflow (assuming reps doesn't get too big): h_F. At reps * iterations = 1024 * 18000 elements, it is far larger than the default stack (about 1 MB on Windows with MSVC). All you have to do is declare h_F so that it is placed on the heap instead of the stack, as you said.

This is literally a one line change.

Simply declare h_F like this:

float *h_F = new float[reps * iterations];

Good luck!

  • It works!!! Thank you so much, I was losing my head over this! Documentation on this stuff is so hard to understand. – Jordan Jun 24 '14 at 19:01