Suppose I have code that lets the user pass threads_per_block when launching a kernel.
I then want to check whether the input is valid (e.g. <= 512 for compute capability (CC) < 2.0 and <= 1024 for CC >= 2.0).
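To make the check concrete, here is a minimal sketch of what I have in mind. The helper names are hypothetical, and it assumes only the two limits quoted above. Which compute capability should be passed as cc_major (the one I compiled for, or the one reported for the device at runtime, e.g. the major field filled in by cudaGetDeviceProperties) is exactly what I am unsure about.

```cpp
// Hypothetical helpers illustrating the validity check described above.
// Host-side code only; it also compiles as part of a .cu file.

// Upper limit on threads per block for a compute capability major version:
// 512 for CC < 2.0, 1024 for CC >= 2.0 (the two limits quoted in the question).
inline int max_threads_per_block(int cc_major) {
    return cc_major < 2 ? 512 : 1024;
}

// Returns true if the user-supplied block size is positive and does not
// exceed the limit for the given compute capability.
inline bool is_valid_threads_per_block(int threads_per_block, int cc_major) {
    return threads_per_block > 0 &&
           threads_per_block <= max_threads_per_block(cc_major);
}
```

With this sketch, is_valid_threads_per_block(1024, 1) rejects the input while is_valid_threads_per_block(1024, 2) accepts it, so the result depends entirely on which CC applies.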
Now I wonder what happens if I compile the code with nvcc -arch=sm_13 while the graphics card in my computer has CC 2.0, and the user passes threads_per_block == 1024. Is this:
- a valid input, since the card I run on has CC 2.0, or
- an invalid input, since I compiled the code for CC 1.3?
Or does nvcc -arch=sm_13 just mean that CC 1.3 is the minimum required, and that the features of a higher CC can still be used when the code runs on a device that has one?