
I'm trying to set up TensorFlow with GPU acceleration on WSL 2 running Ubuntu 20.04. I'm following this tutorial and am running into the error seen here. However, when I follow the solution there and try to start Docker with `sudo service docker start`, I'm told that docker is an unrecognized service. Since I can run the docker command and view its help menu, I know Docker is installed. Docker does work through the desktop tool, but as mentioned in the SO post from earlier, the desktop tool doesn't support CUDA, so that isn't very helpful. I'm not getting any error logs beyond this, so please ask if you need more details.

Edit: Given the lack of details, here is a list of solutions I've tried to no avail: 1 2 3

Update: I used `sudo dockerd` to start the Docker daemon and tried running the NVIDIA benchmark container, only to be met with:

INFO[2020-07-18T21:04:05.875283800-04:00] shim containerd-shim started                  address=/containerd-shim/021834ef5e5600bdf62a6a9e26dff7ffc1c76dd4ec9dadb9c1fcafb6c88b6e1b.sock debug=false pid=1960
INFO[2020-07-18T21:04:05.899420200-04:00] shim reaped                                   id=70316df254d6b2633c743acb51a26ac2d0520f6f8e2f69b69c4e0624eaac1736
ERRO[2020-07-18T21:04:05.909710600-04:00] stream copy error: reading from a closed fifo
ERRO[2020-07-18T21:04:05.909753500-04:00] stream copy error: reading from a closed fifo
ERRO[2020-07-18T21:04:06.001006700-04:00] 70316df254d6b2633c743acb51a26ac2d0520f6f8e2f69b69c4e0624eaac1736 cleanup: failed to delete container from containerd: no such container
ERRO[2020-07-18T21:04:06.001045100-04:00] Handler for POST /v1.40/containers/70316df254d6b2633c743acb51a26ac2d0520f6f8e2f69b69c4e0624eaac1736/start returned error: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.
ERRO[0000] error waiting for container: context canceled
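
The part of this log that matters is the stderr of the prestart hook, which is buried under several layers of escaped quotes. A minimal Python sketch for pulling it out of a daemon log line is below. The function name `extract_hook_stderr` and the regular expression are illustrative assumptions based only on the shape of the lines above; they are not part of Docker or the NVIDIA tooling.

```python
import re
from typing import Optional

# Hypothetical helper: the name and the pattern are assumptions made for this
# sketch, based only on the shape of the log lines shown above.
# The capture stops at an escaped newline (backslashes + "n"), at a quote
# (optionally preceded by backslashes), or at the end of the line.
_HOOK_STDERR = re.compile(r'stderr: (.*?)(?:\\+n|\\*"|$)')


def extract_hook_stderr(log_line: str) -> Optional[str]:
    """Return the text after 'stderr: ' up to the escaped newline or quote."""
    match = _HOOK_STDERR.search(log_line)
    if match is None:
        return None
    return match.group(1).strip()


if __name__ == "__main__":
    sample = (
        r'running prestart hook 0 caused \\\"error running hook: exit status 1, '
        r'stdout: , stderr: nvidia-container-cli: initialization error: '
        r'driver error: failed to process request\\\\n\\\"\"": unknown'
    )
    print(extract_hook_stderr(sample))
```

For the lines above this isolates the `nvidia-container-cli: initialization error: driver error: failed to process request` message, which points at the driver layer rather than at Docker itself.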

Update 2: After installing a Windows Insider build and updating everything as far as possible, I encountered a different error:

Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Error: only 0 Devices available, 1 requested.  Exiting.

I have a GTX 970, so I'm not sure why it isn't being detected. Running `sudo lshw -C display` seems to confirm that the graphics card isn't visible. I got:

 *-display UNCLAIMED
       description: 3D controller
       product: Microsoft Corporation
       vendor: Microsoft Corporation
       physical id: 4
       bus info: pci@941e:00:00.0
       version: 00
       width: 32 bits
       clock: 33MHz
       capabilities: bus_master cap_list
       configuration: latency=0
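
Inside WSL 2 the GPU is exposed through a paravirtualized device rather than as the physical card, so a check aimed at the WSL-specific paths may be more informative than lshw. A hedged Python sketch follows. It assumes the GPU device node is /dev/dxg and that the Windows driver maps its CUDA library to /usr/lib/wsl/lib/libcuda.so.1, as in a standard CUDA on WSL setup. The function names are hypothetical, and the TensorFlow check assumes TensorFlow 2.x.

```python
import os


def wsl_gpu_report(dxg_path="/dev/dxg", lib_dir="/usr/lib/wsl/lib"):
    """Report whether the WSL 2 GPU device node and driver library exist.

    Both default paths are assumptions about a standard CUDA-on-WSL setup.
    """
    return {
        "dxg_device": os.path.exists(dxg_path),
        "wsl_libcuda": os.path.exists(os.path.join(lib_dir, "libcuda.so.1")),
    }


def tensorflow_gpus():
    """Return the GPU names TensorFlow can see, or None if it is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    print(wsl_gpu_report())
    print(tensorflow_gpus())
```

If `dxg_device` comes back False, that would suggest the problem is on the Windows side (the Insider build or the WSL-enabled GPU driver) rather than in the packages installed inside Ubuntu.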
BlackCoffee
  • Try `sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit`. – Amit kumar Jul 19 '20 at 16:55
  • I already have the most recent version of nvidia-container-toolkit installed. Is there an Insider version I need? – BlackCoffee Jul 19 '20 at 16:56
  • If you already have the latest nvidia-container-toolkit in place, please try symlinking /sbin/ldconfig to /sbin/ldconfig.real. – Amit kumar Jul 19 '20 at 17:01
  • I'm told that /sbin/ldconfig.real already exists. – BlackCoffee Jul 19 '20 at 17:03
  • Looks like an issue with the drivers. Please install the latest drivers via the PPA: `sudo add-apt-repository ppa:graphics-drivers/ppa` and `sudo apt update`. – Amit kumar Jul 19 '20 at 17:12
  • That in combination with the tutorial found [here](https://docs.nvidia.com/cuda/wsl-user-guide/index.html) worked! Thank you so much! – BlackCoffee Jul 19 '20 at 17:20
  • Could you run `sudo apt list --installed | grep cuda` to check your CUDA version? All CUDA package versions should be 11-0. – Yi Zhang Jul 23 '20 at 01:00
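
The check suggested in the last comment can be sketched in Python. The helper below parses text in the `name/suite,now version arch [installed]` layout that `apt list --installed` prints and flags CUDA packages whose name suffix is not 11-0. The function names and the sample lines are hypothetical illustrations, not output from this machine.

```python
import re


def cuda_package_suffixes(apt_list_output):
    """Map each installed CUDA package name to its 'major-minor' suffix.

    Packages without such a suffix (for example a metapackage) map to None.
    """
    suffixes = {}
    for line in apt_list_output.splitlines():
        if "/" not in line:
            continue  # skips the "Listing..." header and blank lines
        name = line.split("/", 1)[0].strip()
        if "cuda" not in name:
            continue
        match = re.search(r"-(\d+-\d+)$", name)
        suffixes[name] = match.group(1) if match else None
    return suffixes


def mismatched_packages(suffixes, expected="11-0"):
    """Return the sorted package names whose suffix differs from `expected`."""
    return sorted(
        name
        for name, suffix in suffixes.items()
        if suffix is not None and suffix != expected
    )


if __name__ == "__main__":
    # Hypothetical sample input, written by hand for this sketch.
    sample = """Listing...
cuda-cudart-11-0/unknown,now 11.0.194-1 amd64 [installed]
cuda-nvcc-10-2/unknown,now 10.2.89-1 amd64 [installed]
libc6/focal,now 2.31-0ubuntu9 amd64 [installed]
"""
    print(mismatched_packages(cuda_package_suffixes(sample)))
```

On a real system the input could come from `subprocess.run(["apt", "list", "--installed"], capture_output=True, text=True).stdout`; an empty result would mean every versioned CUDA package matches 11-0.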
