I was trying the first example from the official Thrust website (https://developer.nvidia.com/thrust) and changed the vector size to 32<<23, i.e. 2^5 * 2^23 = 2^28 = 268,435,456 ints, which is 1 GiB of data. The code is as follows:
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iostream>

int main(void) {
    // generate random numbers serially
    thrust::host_vector<int> h_vec(32 << 23);
    std::generate(h_vec.begin(), h_vec.end(), rand);
    std::cout << "1." << std::time(NULL) << std::endl;

    // transfer data to the device
    thrust::device_vector<int> d_vec = h_vec;
    std::cout << "2." << std::time(NULL) << std::endl;

    // sort data on the device (846M keys per second on GeForce GTX 480)
    thrust::sort(d_vec.begin(), d_vec.end());

    // transfer data back to host
    thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());
    std::cout << "3." << std::time(NULL) << std::endl;

    return 0;
}
But the program crashed when it reached the thrust::sort line. When I switched to std::vector and std::sort instead, it worked fine.
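For comparison, the CPU-only variant I tried is essentially this (a sketch of my test, not the exact file):

#include <algorithm>
#include <cstdlib>
#include <vector>

int main(void) {
    // same 1 GiB of ints, but allocated and sorted entirely on the host
    std::vector<int> vec(32 << 23);
    std::generate(vec.begin(), vec.end(), rand);
    std::sort(vec.begin(), vec.end());  // completes without crashing
    return 0;
}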
Is this a bug in Thrust? I am using Thrust 1.7 + CUDA 6.5 + Visual Studio 2013 Update 2.
I am using a GeForce GT 740M with 2048 MB of total GPU memory. I monitored the process with Process Explorer and saw it allocate 1.0 GB of memory.
The error message is "A problem caused the program to stop working correctly. Windows will close the program and notify you if a solution is available. [Debug] [Close Program]". After clicking [Debug], I can see the call stack. The crash comes from this line:
thrust::device_vector<int> d_vec = h_vec;
The last frame from the CUDA source is:
testcuda.exe!thrust::system::cuda::detail::malloc<thrust::system::cuda::detail::tag>(thrust::system::cuda::detail::execution_policy<thrust::system::cuda::detail::tag> & __formal, unsigned __int64 n) Line 48 C++
It seems to be a memory allocation failure. But I have 2 GB of GPU memory and 16 GB of host memory. Why does a 1 GB allocation fail?
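To check how much device memory is actually available right before the allocation, here is a minimal diagnostic sketch using cudaMemGetInfo (this is an assumption about the right diagnostic, not something from my original test):

#include <cuda_runtime.h>
#include <cstdio>

int main(void) {
    size_t free_bytes = 0, total_bytes = 0;
    // cudaMemGetInfo reports free and total device memory in bytes
    cudaError_t err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) {
        std::printf("cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("free: %llu MiB, total: %llu MiB\n",
                (unsigned long long)(free_bytes >> 20),
                (unsigned long long)(total_bytes >> 20));
    return 0;
}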
To Robert:
The original example works fine, even with 32<<21 (256 MiB) and 32<<22 (512 MiB). Is there a virtual memory management system for GPU memory? Does "contiguous" here mean physically contiguous or only virtually contiguous? Is there an exception raised in this scenario that I can catch?
My test code is here: https://github.com/henrywoo/wufuheng/blob/master/testcuda.cu
In my test, no exception is thrown; the program just crashes with a runtime error.
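For reference, this is the kind of guard I tried (a sketch, assuming the allocation failure would surface as std::bad_alloc or thrust::system_error, both of which derive from std::exception); the handler never fires:

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <exception>
#include <iostream>

int main(void) {
    thrust::host_vector<int> h_vec(32 << 23);
    try {
        // ~1 GiB device allocation plus host-to-device copy
        thrust::device_vector<int> d_vec = h_vec;
        thrust::sort(d_vec.begin(), d_vec.end());
    } catch (const std::exception& e) {
        // expected to catch thrust::system_error or std::bad_alloc here
        std::cerr << "caught: " << e.what() << std::endl;
        return 1;
    }
    return 0;
}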