For instance,

    int *p;
    cudaMalloc(&p, sizeof(int));

takes around 20 seconds, and my process typically gains 650 MB+ (though always a slightly different amount) in memory usage in Task Manager. GPU-Z also indicates an increase of 200 MB+ in dedicated memory usage on my GPU.
- Only happens with the first call to `cudaMalloc`
- Does not matter if I call other CUDA functions before it, like `cudaGetDevice`
- Does not happen in some other CUDA projects
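To show what I mean by "only the first call", here is a minimal sketch that times the first and second `cudaMalloc` on the host (this is an illustrative repro I put together, not code from the affected project):

```cuda
#include <cstdio>
#include <ctime>
#include <cuda_runtime.h>

int main()
{
    int *p = 0, *q = 0;

    clock_t t0 = clock();
    cudaMalloc(&p, sizeof(int));   // first call: pays the one-time initialization cost
    clock_t t1 = clock();
    cudaMalloc(&q, sizeof(int));   // second call: near-instant by comparison
    clock_t t2 = clock();

    printf("first  cudaMalloc: %.2f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("second cudaMalloc: %.2f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);

    cudaFree(p);
    cudaFree(q);
    return 0;
}
```

In my case the first timing is on the order of 20 seconds while the second is negligible.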
I am using:
- Thrust, CUBLAS, cuRAND libraries
- MSVC 2010 with NVCC
- Nsight 3.0
- CUDA 5.0
Why does this happen? What can be done?
Update:

As mentioned in the comments below, this appears to stem from initialization (calling `cudaFree(0)` has the same effect). However, as to why it's so slow, perhaps it has something to do with the runtime errors: the following exception occurs a good 30 times as the initialization line is hit:
First-chance exception at 0x74f0b727 in ...: Microsoft C++ exception: cudaError_enum at memory location 0x003ff9c4..
etc...
This still happens even when I'm not allocating anything, such as with a solitary call to `cudaFree(0);` - no idea why...
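If the cost really is one-time lazy initialization, one workaround I'm considering is paying it explicitly at startup so the stall happens at a predictable point rather than on the first "real" allocation (a sketch, assuming `cudaFree(0)` triggers the same initialization as the first `cudaMalloc`; `warmUpCuda` is a name I made up):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Call once at the start of main() so the long initialization hit
// happens up front instead of on the first real allocation.
void warmUpCuda()
{
    cudaError_t err = cudaFree(0);  // forces lazy context creation
    if (err != cudaSuccess)
        fprintf(stderr, "CUDA init failed: %s\n", cudaGetErrorString(err));
}
```

This doesn't explain the 20-second delay or the repeated exceptions, of course; it only moves them.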