I developed an algorithm using thrust. My office computer has one CUDA-enabled card with the following architecture:
--- General information about Device 0
Name: Quadro 2000
Compute Capability: 2.1
Clock Rate: 1251000 kHz
Device Overlap: Enabled
Kernel Execution Timeout: Disabled
On this machine, my algorithm runs with no errors. However, a clean build on a lab machine throws a thrust::system::system_error when attempting to construct a device_vector. Both machines run RedHat 6 and are configured identically, except that the lab machine has multiple graphics cards. It contains three CUDA-enabled cards with the following architectures:
--- General information about Device 0
Name: Tesla C2050
Compute Capability: 2.0
Clock Rate: 1147000 kHz
Device Overlap: Enabled
Kernel Execution Timeout: Disabled

--- General information about Device 1
Name: Quadro 2000
Compute Capability: 2.1
Clock Rate: 1251000 kHz
Device Overlap: Enabled
Kernel Execution Timeout: Disabled

--- General information about Device 2
Name: Quadro 2000
Compute Capability: 2.1
Clock Rate: 1251000 kHz
Device Overlap: Enabled
Kernel Execution Timeout: Enabled
I know that thrust needs to be compiled against the target architecture in order to work. Therefore, I set the CUDA device to 1. However, the error persists.
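To spell out the reasoning behind choosing device 1: as I understand it, machine code built for sm_21 runs natively only on devices of the same major revision whose minor revision is at least 1, which rules out the Tesla C2050 (2.0). The sketch below is plain C++ and is not my actual code. `DeviceInfo`, `runs_natively`, `first_native_device` and `lab_machine` are hypothetical names introduced for illustration; in a real program the capability values would come from the CUDA runtime rather than a hard-coded list. It also ignores any PTX the binary may carry, which could let the driver JIT-compile for other devices.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the few device properties used in this sketch.
struct DeviceInfo {
    std::string name;
    int major;  // compute capability, major revision
    int minor;  // compute capability, minor revision
};

// True if machine code built for sm_<build_major><build_minor> can run
// natively on `dev`: same major revision and device minor >= build minor.
// Assumption: embedded PTX (JIT fallback) is deliberately not modelled.
bool runs_natively(const DeviceInfo& dev, int build_major, int build_minor) {
    return dev.major == build_major && dev.minor >= build_minor;
}

// Index of the first device that can run the build natively, or -1 if none.
int first_native_device(const std::vector<DeviceInfo>& devices,
                        int build_major, int build_minor) {
    for (std::size_t i = 0; i < devices.size(); ++i) {
        if (runs_natively(devices[i], build_major, build_minor)) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// The three cards of the lab machine, in the order listed above.
std::vector<DeviceInfo> lab_machine() {
    return { {"Tesla C2050", 2, 0},
             {"Quadro 2000", 2, 1},
             {"Quadro 2000", 2, 1} };
}
```

Under this model, `first_native_device(lab_machine(), 2, 1)` selects device 1, which matches the device I set by hand, while device 0 is rejected for an sm_21 build.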
As a debugging measure, I placed a cudaGetDevice() call immediately before the device_vector allocation. The device is correctly reported as 1.
int device;
CUDA_CHECK_RETURN(cudaGetDevice(&device), __FILE__, __LINE__);
std::cout << "Operating on device " << device << std::endl; // <-- device 1
// copy the turns to the runtime
thrust::device_vector<MalfunctionTurn> d_turns = turns; // <-- error here
I'm at my wits' end trying to debug this. Has anyone seen an error like this before? More notably, is there a limitation in cudaSetDevice() of which I'm not aware? I'm concerned because two identical cards on different machines cannot run the same code.
Thanks in advance.
EDIT
Compile command line: nvcc -rdc=true -arch=sm_21 -O3 file
Here is a minimal example that reproduces the error:
#define DEVICE __device__
#define HOST __host__

#include <cstddef>   // std::size_t
#include <iostream>  // std::cerr
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

template <typename T, std::size_t N>
class Container {
public:
    DEVICE HOST
    Container() {
    }
private:
    T data[N];
};

typedef Container<double, 7> double7;

template <std::size_t N = 10 >
class History {
public:
    DEVICE HOST
    History() {
    }

    DEVICE HOST
    virtual ~History() {
    }
private:
    double7 history[N];
};

int main() {
    try {
        thrust::host_vector<History<> > histories(1);
        thrust::device_vector<History<> > d_histories = histories;  // <-- throws here
    } catch (const thrust::system_error &) {
        std::cerr << "boo boo" << std::endl;
    }
    return 0;
}