I have an N-Series Azure VM (the Data Science VM) with a Tesla K80 GPU. According to the NVIDIA scanner, my GPU driver is up to date. When I run my CNTK BrainScript, it reports "No GPUs found" and runs in CPU mode. What can I do to troubleshoot?

requestnodes [MPIWrapper]: using 1 out of 1 MPI nodes on a single host (1 requested); we (0) are in (participating)
-------------------------------------------------------------------
Build info:

            Built time: Dec 22 2016 01:43:24
            Last modified date: Thu Dec 22 01:35:04 2016
            Build type: Release
            Build target: GPU
            With 1bit-SGD: yes
            With ASGD: yes
            Math lib: mkl
            CUDA_PATH: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0
            CUB_PATH: c:\src\cub-1.4.1
            CUDNN_PATH: C:\local\cudnn-8.0-windows10-x64-v5.1
            Build Branch: HEAD
            Build SHA1: 8e8b5ff92eff4647be5d41a5a515956907567126
            Built by svcphil on DPHAIM-24
            Build Path: C:\jenkins\workspace\CNTK-Build-Windows\Source\CNTK\

-------------------------------------------------------------------
No GPUs found

Edit: here is the output from nvidia-smi.exe:

C:\Program Files\NVIDIA Corporation\NVSMI>.\nvidia-smi.exe
Fri Jan 13 19:00:43 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 369.30                 Driver Version: 369.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           TCC  | 0BD1:00:00.0     Off |                  Off |
| N/A   43C    P8    27W / 149W |      0MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           TCC  | 5871:00:00.0     Off |                  Off |
| N/A   35C    P8    34W / 149W |      0MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Mike Wise
Robert Sim
  • One more note: the VM is Windows Server 2012 R2. It is a Standard NC12 Azure instance. – Robert Sim Jan 13 '17 at 19:17
  • I have tried to install the driver from NVIDIA but the driver installation fails. 377.35-tesla-desktop-winserver2008-2012r2-64bit-international-whql – Pablo Jomer Jul 14 '17 at 11:35

3 Answers


By default, the Windows Data Science VM does not come with the GPU drivers, CUDA, etc. We do have an extension called "Deep Learning toolkit for DSVM" that adds the drivers, CUDA, and the GPU editions of deep learning software such as CNTK, TensorFlow, and MXNet.

More Info: http://aka.ms/dsvm/deeplearning

We also recently released an Ubuntu version of the DSVM with built-in CUDA, GPU drivers, and several more deep learning tools. It can be deployed on either GPU VMs or CPU-only VMs on Azure.

Gopi - MSFT
  • Update: Windows 2016 Data Science VM (http://aka.ms/dsvm/win2016) comes with GPU drivers, CUDA and several frameworks like Tensorflow, Mxnet, Microsoft Cognitive Toolkit, Chainer. We keep adding new tools all the time. Please check the product page for latest. – Gopi - MSFT Nov 08 '17 at 21:10

Could you run the Python notebooks and check whether they work with the device set to gpu(id)? Alternatively, from an activated CNTK Python environment, try setting the device explicitly:

import cntk as C
from cntk.device import set_default_device, gpu
C.device.set_default_device(C.device.gpu(0))

This might give you some clues as to whether the issue is specific to BrainScript.
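
As a slightly more defensive variant of the probe above, here is a minimal sketch. It assumes that `cntk.device.gpu()` raises `ValueError` for a device id it cannot find and that `cntk.device.cpu()` is available as a fallback. The helper name `select_device` is hypothetical and not part of CNTK.

```python
def select_device(gpu_factory, cpu_factory, device_id=0):
    """Return (device, kind), where kind is "gpu" or "cpu".

    Hypothetical helper for illustration; it is not a CNTK API.
    """
    try:
        return gpu_factory(device_id), "gpu"
    except ValueError as err:
        # Assumption: an unusable GPU id surfaces as ValueError.
        print("GPU %d unavailable: %s" % (device_id, err))
        return cpu_factory(), "cpu"


if __name__ == "__main__":
    try:
        import cntk as C
    except ImportError:
        print("cntk is not installed in this environment")
    else:
        device, kind = select_device(C.device.gpu, C.device.cpu)
        print("Selected %s device: %s" % (kind, device))
```

If this reports a CPU fallback from Python as well, the problem is more likely in the driver or CUDA setup than in BrainScript.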

Sayan Pathak
  • Thanks for following up. Here's the output of that script: Traceback (most recent call last): File ".\testGPU.py", line 3, in <module> C.device.set_default_device(C.device.gpu(0)) File "E:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py34\lib\site-packages\cntk\device.py", line 76, in gpu return cntk_py.DeviceDescriptor.gpu_device(device_id) ValueError: Specified GPU device id (0) is invalid. – Robert Sim Jan 13 '17 at 18:55
  • Running NVIDIA diagnostic tools now and I'll circle back. – Robert Sim Jan 13 '17 at 18:56
  • This set me on the right path. It would be good if install.ps warned about missing CUDA dependencies when installing GPU-enabled CNTK. – Robert Sim Jan 14 '17 at 04:50

Well, the Python script and the BrainScript both work now, after installing CUDA (I installed it in order to run nvidia-smi). I should not have assumed that the Azure Data Science image (which only works with an N-Series VM) has the necessary NVIDIA libraries pre-installed. :-)
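
For anyone who wants to check for the same gap before running CNTK, here is a rough sketch. It assumes the CUDA installer sets a `CUDA_PATH` environment variable, and that `nvidia-smi` is either on `PATH` or in the NVSMI folder shown in the question. The function name `cuda_install_report` is hypothetical, and passing this check does not guarantee that CNTK will find the GPU.

```python
import os
import shutil

# Location taken from the nvidia-smi output in the question; other
# driver versions may install the tool elsewhere.
NVSMI_FALLBACK = r"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe"


def cuda_install_report(env=None, which=shutil.which):
    """Collect simple hints about whether CUDA and the driver tools are present.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    cuda_path = env.get("CUDA_PATH")

    # Look on PATH first, then in the folder used in the question.
    smi = which("nvidia-smi")
    if smi is None and os.path.isfile(NVSMI_FALLBACK):
        smi = NVSMI_FALLBACK

    return {
        "cuda_path": cuda_path,
        "cuda_path_exists": bool(cuda_path) and os.path.isdir(cuda_path),
        "nvidia_smi": smi,
    }


if __name__ == "__main__":
    for key, value in cuda_install_report().items():
        print("%s: %s" % (key, value))
```

A missing `cuda_path` on a GPU VM would be a hint that the CUDA toolkit still needs to be installed.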

Robert Sim
  • This is good to know. Thanks for reporting. Will pass it on to the relevant teams supporting the N Series VMs for future upgrades. – Sayan Pathak Jan 16 '17 at 02:27