
For instance, the following:

```cuda
int *p;
cudaMalloc(&p, sizeof(int));  // first cudaMalloc in the process
```

will take around 20 seconds, and my process will typically gain 650MB+ of memory usage in Task Manager (always a slightly different amount). GPU-Z also shows an increase of 200MB+ in dedicated memory usage on my GPU.

  • Only happens with the first call to cudaMalloc
  • Does not matter if I call other CUDA functions before it, like cudaGetDevice
  • Does not happen in some other CUDA projects
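A minimal sketch for checking the "first call only" observation: it times two consecutive `cudaMalloc`/`cudaFree` pairs with a host clock. The helper name `timeMallocMs` is hypothetical, and the sketch assumes a C++11 host compiler (MSVC 2010 does not ship `<chrono>`, so a platform timer would have to be substituted there).

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: time one cudaMalloc of a single int, in milliseconds.
static double timeMallocMs(cudaError_t* status) {
    int* p = nullptr;
    auto t0 = std::chrono::steady_clock::now();
    *status = cudaMalloc(&p, sizeof(int));
    auto t1 = std::chrono::steady_clock::now();
    if (*status == cudaSuccess) cudaFree(p);  // release outside the timed region
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    cudaError_t s1 = cudaSuccess, s2 = cudaSuccess;
    double first  = timeMallocMs(&s1);  // may include lazy context creation
    double second = timeMallocMs(&s2);  // context should already exist here
    std::printf("first:  %.1f ms (%s)\n", first,  cudaGetErrorString(s1));
    std::printf("second: %.1f ms (%s)\n", second, cudaGetErrorString(s2));
    return (s1 == cudaSuccess && s2 == cudaSuccess) ? 0 : 1;
}
```

If the first figure dominates and the second is small, the cost lies in one-time setup rather than in the allocation itself.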

I am using:

  • Thrust, CUBLAS, cuRAND libraries
  • MSVC 2010 with NVCC
  • Nsight 3.0
  • CUDA 5.0

Why does this happen? What can be done?

Update: As mentioned in the comments below, this appears to stem from CUDA context initialization (calling cudaFree(0) has the same effect). As for why it's so slow, perhaps it has something to do with runtime errors: the following exception is reported a good 30 times when the initialization line is hit:

```
First-chance exception at 0x74f0b727 in ...: Microsoft C++ exception: cudaError_enum at memory location 0x003ff9c4..
First-chance exception at 0x74f0b727 in ...: Microsoft C++ exception: cudaError_enum at memory location 0x003ff9c4..
First-chance exception at 0x74f0b727 in ...: Microsoft C++ exception: cudaError_enum at memory location 0x003ff9c4..
...
```

This still happens when I'm not allocating anything, e.g. with a solitary call to cudaFree(0). I have no idea why.
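To separate initialization from allocation, the sketch below forces context creation with `cudaFree(0)` first, as suggested in the comments, and then times `cudaMalloc` on its own. It assumes the same C++11 host toolchain as above; `msSince` is a hypothetical helper.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: milliseconds elapsed since 'start'.
static double msSince(std::chrono::steady_clock::time_point start) {
    return std::chrono::duration<double, std::milli>(
        std::chrono::steady_clock::now() - start).count();
}

int main() {
    // cudaFree on a null pointer performs no operation, but as an early
    // runtime call it triggers the lazy creation of the CUDA context.
    auto t0 = std::chrono::steady_clock::now();
    cudaError_t initStatus = cudaFree(0);
    double initMs = msSince(t0);

    // With the context in place, this should measure only the allocation.
    int* p = nullptr;
    auto t1 = std::chrono::steady_clock::now();
    cudaError_t allocStatus = cudaMalloc(&p, sizeof(int));
    double allocMs = msSince(t1);

    std::printf("cudaFree(0): %.1f ms (%s)\n", initMs, cudaGetErrorString(initStatus));
    std::printf("cudaMalloc:  %.1f ms (%s)\n", allocMs, cudaGetErrorString(allocStatus));
    if (allocStatus == cudaSuccess) cudaFree(p);
    return 0;
}
```

Running the same binary both inside and outside the debugger is one way to see whether the delay is tied to the debugging session.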

mchen
  • init the pointer maybe? – huseyin tugrul buyukisik May 14 '13 at 21:59
  • Like `int n; int *p = &n`? – mchen May 14 '13 at 22:28
  • You are observing the overhead of lazy initialization of the CUDA context for the current device, which is triggered by most CUDA API calls. However, this time shouldn't normally be anywhere near 20 seconds. You can trigger the initialization of the CUDA context by calling another CUDA API function, such as cudaFree(0), prior to the first call to cudaMalloc(). – njuffa May 14 '13 at 23:09
  • @njuffa - OK, but this doesn't explain why for some small projects I can still call `cudaMalloc` without any noticeable overhead at all – mchen May 14 '13 at 23:36
  • And these lags have become very noticeable right after I installed MSVC 2010 premium and Nsight 3.0 - coincidence or culprit? – mchen May 14 '13 at 23:38
  • Does it only happen in DEBUG mode? – axon May 15 '13 at 00:00
  • @axon - nope - release too :( – mchen May 15 '13 at 00:14
  • Did you try `cudaFree(0)`? If so, what happened to the speed of the `cudaMalloc` call? CUDA context initialization typically should take less than a second, so if the first `cudaMalloc` call takes 20 seconds, something else is going on. Are you running the app under the control of the debugger? – njuffa May 15 '13 at 00:53
  • Are you doing some [**proper error checking**](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api)? Is `cuda-memcheck` returning any useful error? The errors that you detected may very well be the key, but they are not very clear, to say the least. – BenC May 15 '13 at 01:45
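Along the lines of the error-checking suggestion above, a minimal sketch that wraps every runtime call so a failure is reported with the failing expression, file and line. `CUDA_CHECK` and `cudaCheck` are hypothetical names modeled on the common pattern from the linked question, not part of the CUDA API.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: print the failing call and abort on any runtime error.
static void cudaCheck(cudaError_t err, const char* expr, const char* file, int line) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s:%d: %s failed: %s\n",
                     file, line, expr, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical macro: stringizes the call so the report names it.
#define CUDA_CHECK(call) cudaCheck((call), #call, __FILE__, __LINE__)

int main() {
    CUDA_CHECK(cudaFree(0));  // errors from context creation surface here
    int* p = nullptr;
    CUDA_CHECK(cudaMalloc(&p, sizeof(int)));
    CUDA_CHECK(cudaFree(p));
    std::puts("all CUDA calls succeeded");
    return 0;
}
```

Checking `cudaFree(0)` separately makes any error returned by initialization itself visible, instead of it being attributed to the later `cudaMalloc`.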
