Use NVIDIA K20 cards on virtual machines on the same server with different CUDA SDK versions

Question

I work on a dual-processor Debian Wheezy server with 4 NVIDIA K20m cards. I currently use CUDA 5 with the 304.54 driver and GCC 4.6.3, but I would like to upgrade to Debian Jessie (GCC 4.9) and CUDA 7.5. I have already evaluated CUDA 7.5, which gives me different results than CUDA 5 because NVCC emits different instructions (e.g. FMA instructions are not used in the same places, see post).

The main goal is to have two different CUDA versions on this server, both to keep compatibility with older computations and to prepare for the future with new CUDA features.

I think there are two possibilities:

1. Use a VMware ESXi or Citrix XenServer hypervisor to create two virtual machines (Wheezy/SDK 5 and Jessie/SDK 7.5) connected to the K20 cards in pass-through mode. I cannot find these cards in the hypervisors' hardware compatibility lists, but one NVIDIA driver's release notes say they support pass-through (320.78 release notes, page 11). Which driver do I have to install at the hypervisor level?
2. Install the latest NVIDIA driver and use two NVIDIA Docker containers with different CUDA SDK and Debian versions. Is it possible to run SDK 5 with the latest driver?

What do you think about these possibilities? Do you have any other ideas?

Thank you very much.

score 2 · Accepted Answer · edited May 23 '17 at 11:45

I can't comment on the virtualisation suggestion. However, there is no problem in running the most recent release driver (the CUDA 7.5 one at the time of writing) and using older toolkits with it.

Each CUDA toolkit release and its components are fully versioned, so you cannot mix the CUDA runtime and other libraries (cuFFT, cuBLAS, etc.) from different toolkit releases, or your own code built with those. However, drivers and the driver API they expose are backwards compatible. So you can use the CUDA 7.5 driver and driver API with either the CUDA 5 or CUDA 7.5 runtime without difficulty. You cannot, however, run a newer runtime on an older driver; that will generate a runtime error. I have found the modules utility very useful for selecting between toolkit/runtime versions for development and testing. My current development box has every release between 4.2 and 7.5 installed, with the 7.5 driver.
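The compatibility rule above can be sketched as a small check. This is only an illustration: it assumes the integer encoding that `cudaDriverGetVersion` and `cudaRuntimeGetVersion` report (1000 × major + 10 × minor, so 7050 means 7.5), and the function names `decode_cuda_version` and `runtime_supported` are hypothetical helpers, not part of any CUDA API.

```python
def decode_cuda_version(encoded):
    """Split an encoded CUDA version (1000*major + 10*minor) into (major, minor)."""
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return major, minor


def runtime_supported(driver_version, runtime_version):
    """Return True when the driver is at least as new as the runtime.

    Both arguments use the encoded integer form. This mirrors the rule that
    an older runtime works on a newer driver, but not the other way round.
    """
    return runtime_version <= driver_version


if __name__ == "__main__":
    driver = 7050  # assumed example: a driver that supports up to CUDA 7.5
    for runtime in (5000, 7050, 8000):
        major, minor = decode_cuda_version(runtime)
        status = "ok" if runtime_supported(driver, runtime) else "driver too old"
        print("runtime %d.%d: %s" % (major, minor, status))
```

In a real program the two integers would come from the CUDA API calls themselves; here they are hard-coded so the sketch runs without a GPU.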

Note also that older toolchains require older host compilers and support libraries. So if you move to a more modern distribution, you will still need to devise a way to have a supported gcc installation for the older toolkit you want to use (see the release notes of your toolkits and this question for more details). Many distributions have built-in systems to manage multiple compiler versions, but it has been many years since I ran Debian, so I am not sure about the specifics of selecting alternative compiler versions there.
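As a rough sketch of that host-compiler constraint, the check below compares a gcc version against a per-toolkit limit. The limits in `MAX_GCC` are assumptions taken only from the combinations named in the question (GCC 4.6.3 with CUDA 5, GCC 4.9 with CUDA 7.5); the authoritative values are in each toolkit's release notes. The names `MAX_GCC`, `parse_version` and `gcc_supported` are hypothetical.

```python
# Assumed limits for illustration only; check the toolkit release notes
# for the real maximum supported gcc version of each release.
MAX_GCC = {
    "5.0": (4, 6),
    "7.5": (4, 9),
}


def parse_version(text):
    """Turn a version string such as '4.6.3' into a (major, minor) tuple."""
    parts = text.split(".")
    return int(parts[0]), int(parts[1])


def gcc_supported(toolkit, gcc_version):
    """Return True if gcc_version does not exceed the assumed limit for toolkit."""
    return parse_version(gcc_version) <= MAX_GCC[toolkit]


if __name__ == "__main__":
    for toolkit, gcc in (("5.0", "4.6.3"), ("5.0", "4.9.2"), ("7.5", "4.9.2")):
        verdict = "supported" if gcc_supported(toolkit, gcc) else "too new"
        print("CUDA %s with gcc %s: %s" % (toolkit, gcc, verdict))
```

Under these assumed limits, the Jessie default compiler would be rejected for the CUDA 5 toolkit, which is why an older gcc has to be made available alongside it.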

Thank you for clarifying driver/toolkit version usage. On the server I installed the latest driver and created a Docker image with SDK 7.5 on Jessie, and it works! Now I am doing the same thing with SDK 5 on Wheezy for backward compatibility. I am trying Docker because I would like an isolated environment without having to juggle GCC versions. If that does not work, I will use the modules utility. — Calex, Apr 01 '16 at 09:10

score 0 · Answer 2 · answered Apr 24 '16 at 17:11

Trying to complement talonmies' answer on the virtualization side: it is possible to have two operating system instances with two different driver versions and/or CUDA versions on the same card. However, to my knowledge, it is only possible with PCI pass-through, hence one instance at a time. With this configuration, the VM guest is in full control of the PCI device, and the hypervisor does not need specific drivers.

Using the same device from two different instances at the same time requires some driver component at the hypervisor level (see NVIDIA GRID); I don't know its current level of CUDA support, if any.
