I am a newbie to CUDA. I simply tried to sort an array using Thrust.
clock_t start_time = clock();
thrust::host_vector<int> h_vec(10);
thrust::generate(h_vec.begin(), h_vec.end(), rand);
thrust::device_vector<int> d_vec = h_vec;
thrust::sort(d_vec.begin(), d_vec.end());
//thrust::sort(h_vec.begin(), h_vec.end());
clock_t stop_time = clock();
printf("%f\n", (double)(stop_time - start_time) / CLOCKS_PER_SEC);
Time took to sort d_vec
is 7.4s, and time took to sort h_vec
is 0.4s
I am assuming its parallel computation on device memory, so shouldn't it be faster ?