Will any memory-bound application benefit more from the high memory throughput of a Tesla (cc2.0) card than from the high number of CUDA cores of a GeForce (cc5.0) card?
A memory-bound CUDA application will likely run fastest on whichever GPU has the higher memory bandwidth. Other factors can certainly affect this, but it is a reasonable general principle. I'm not sure which two cards you are referring to, and it's entirely possible for a particular GeForce GPU to have higher memory bandwidth than a particular Tesla GPU. That said, the cc2.0 Tesla GPUs (e.g. M2050, C/M2070, C/M2075, M2090) probably do have higher memory bandwidth (over 100GB/s) than the cc5.0 GeForce GPUs I am aware of (e.g. GeForce GTX 750/750Ti, which are below 90GB/s).
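As a rough sanity check, peak theoretical memory bandwidth is just the effective memory clock times the bus width. The clock and bus-width figures below are approximate published specs for these two cards (treat them as illustrative, not authoritative):

```shell
# Peak bandwidth (GB/s) ~= effective memory clock (MHz) * bus width (bytes) / 1000
# Approximate specs: GTX 750 Ti ~5400 MHz effective, 128-bit bus;
#                    Tesla M2090 ~3700 MHz effective, 384-bit bus.
echo "GeForce GTX 750 Ti: $(( 5400 * 128 / 8 / 1000 )) GB/s"   # ~86 GB/s
echo "Tesla M2090:        $(( 3700 * 384 / 8 / 1000 )) GB/s"   # ~177 GB/s
```

which is consistent with the under-90GB/s vs. over-100GB/s comparison above.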
How can I run an exe file compiled on a machine with a GeForce card on another machine with a Tesla card, without installing VS2010 and CUDA on the Tesla machine (i.e. I want the exe file to be a stand-alone application)?
There are a few fairly easy things you can do that will make it easier to move a compiled CUDA application from one machine to another.
Make sure the CUDART library is statically linked. This should be the default setting for recent CUDA versions. You can read more about it here. If you are using other libraries (e.g. CUBLAS, etc.) you will want to make sure those libraries are statically linked as well (if possible), or else bundle the library (.so file on Linux, .dll on Windows) with your application.
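For reference, the relevant nvcc switch is --cudart (static is the default in recent toolkits). A minimal sketch, assuming a source file app.cu and, on Linux, the static CUBLAS variant (which also requires the culibos support library):

```shell
# Link the CUDA runtime statically (the default in recent CUDA versions)
nvcc --cudart=static app.cu -o app

# If the application uses CUBLAS, link its static variant where available (Linux):
nvcc --cudart=static app.cu -lcublas_static -lculibos -o app
```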
Compile for a range of compute architectures. If you know, for example, that you only need to target cc2.0 and cc5.0, then make sure your nvcc compile command line contains switches that target both cc2.0 and cc5.0. This is a fairly complicated topic, but if you review the CUDA sample codes (makefiles or VS projects) you will find examples of projects that build for a wide variety of architectures. For maximum compatibility, you probably want to make sure you are including both PTX and SASS in your executables. You can read more about it here and here.
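For the cc2.0-plus-cc5.0 case described above, a compile line might look like the following sketch (app.cu is a placeholder source file). Each -gencode switch with code=sm_XX embeds SASS (machine code) for that architecture; the final switch with code=compute_50 also embeds PTX, which the driver can JIT-compile for newer GPUs:

```shell
# SASS for cc2.0 and cc5.0, plus PTX for the newest target (forward compatibility)
nvcc -gencode arch=compute_20,code=sm_20 \
     -gencode arch=compute_50,code=sm_50 \
     -gencode arch=compute_50,code=compute_50 \
     app.cu -o app
```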
Make sure the machines have compatible drivers. For example, if you compile a CUDA code using the CUDA 7.0 toolkit, you will only be able to run it on a machine that has a compatible GPU driver installed (the driver is a separate item from the toolkit: a GPU driver is required to make use of a GPU, while the CUDA toolkit is not). For CUDA 7, this roughly means you want an r346 or newer driver installed on any machine where you want to run a CUDA 7-compiled code. Other CUDA toolkit versions have other associated minimum driver versions. For reference, this answer gives an idea of the approximate minimum GPU driver versions needed for some recent CUDA toolkit versions.
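A quick way to check the installed driver on the target machine (assuming nvidia-smi is available there, which it normally is once the driver is installed):

```shell
# Report the installed GPU driver version, one line per GPU
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```

If the reported version is older than the minimum required by the toolkit you compiled with (e.g. below r346 for CUDA 7), update the driver on that machine before running the executable.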