
I'm trying to run and deploy the DALL-E Playground on my local machine using an AMD GPU. I'm on Windows 11 with a WSL instance running.

Link to Dalle Playground repo

System OS: Windows 11 Pro - Version 21H1 - OS Build 22000.675
WSL Version: WSL 2 
WSL Kernel: 5.10.16.3-microsoft-standard-WSL2
WSL OS: Ubuntu 20.04 LTS
GPU: AMD Radeon RX 6600 XT
CPU: AMD Ryzen 5 3600XT (32GB ram)

I have been able to deploy the backend and frontend successfully, but everything runs on the CPU.

It gives me this warning:

    --> Starting DALL-E Server. This might take up to two minutes.
    2022-06-12 01:16:33.012306: I external/org_tensorflow/tensorflow/core/tpu/tpu_initializer_helper.cc:259] Libtpu path is: libtpu.so
    2022-06-12 01:16:37.581440: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:174] XLA service 0x5a4e760 initialized for platform Interpreter (this does not guarantee that XLA will be used). Devices:
    2022-06-12 01:16:37.581474: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:182]   StreamExecutor device (0): Interpreter, <undefined>
    2022-06-12 01:16:37.587860: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/tfrt_cpu_pjrt_client.cc:176] TfrtCpuClient created.
    2022-06-12 01:16:37.588478: I external/org_tensorflow/tensorflow/stream_executor/tpu/tpu_platform_interface.cc:74] No TPU platform found.
    WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
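
The XLA paths and the absl warning in this log suggest the backend is running on JAX rather than PyTorch (this is an inference from the log text, not something the log states outright). If that is the case, a minimal sketch like the one below would show which backend JAX actually selects. It assumes JAX is importable from the same Python environment as the backend; `jax_backend_report` is a hypothetical helper name, not part of the DALL-E Playground repo.

```python
import importlib.util


def jax_backend_report():
    """Describe the backend JAX would use, or report that JAX is missing."""
    # Avoid an ImportError when JAX is not installed in this environment.
    if importlib.util.find_spec("jax") is None:
        return {"installed": False, "backend": None, "devices": []}

    import jax

    return {
        "installed": True,
        # Platform name of the default backend, e.g. "cpu", "gpu" or "tpu".
        "backend": jax.default_backend(),
        # The devices JAX can see on that backend.
        "devices": [str(device) for device in jax.devices()],
    }


if __name__ == "__main__":
    print(jax_backend_report())
```

If the reported backend is `cpu`, the fallback is happening inside JAX itself, so installing a GPU build of a different framework would not be expected to change this warning.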

I have been trying to make my GPU accessible to my WSL instance, but I can't work out what I'm doing wrong. To use the GPU, I ran the following install command from the PyTorch website:

pip3 install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm4.5.2

I confirmed that PyTorch is installed correctly by running the sample PyTorch code supplied on their website.
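
A basic install check may only prove that the package imports and runs on the CPU, so here is a hedged sketch of a more specific check. ROCm builds of PyTorch expose the GPU through the `torch.cuda` API and set `torch.version.hip`, so both can be inspected. `torch_gpu_report` is a hypothetical helper name, and the function returns early if torch is not installed.

```python
import importlib.util


def torch_gpu_report():
    """Report whether the installed PyTorch build can see a GPU."""
    # Avoid an ImportError when torch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        # Set to a version string on ROCm builds, None otherwise.
        "hip": getattr(torch.version, "hip", None),
        # Set to a version string on CUDA builds, None otherwise.
        "cuda": getattr(torch.version, "cuda", None),
        # ROCm builds also answer through the torch.cuda namespace.
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(torch_gpu_report())
```

A result with `hip` set but `gpu_available` false would point at the driver or device layer rather than at the PyTorch wheel.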

After doing some digging I realised that the rock-dkms package also needed to be installed, so I followed the advice on this website and installed it successfully, after a lot of issues.

When I try to check ROCm for my GPU, this is what comes up:

$ /opt/rocm/bin/rocminfo
ROCk module is NOT loaded, possibly no GPU devices

$ /opt/rocm/opencl/bin/clinfo
Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.1 AMD-APP (3423.0)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_amd_event_callback


  Platform Name:                                 AMD Accelerated Parallel Processing
Number of devices:                               0

Based on this output, there does seem to be some sort of AMD-compatible driver available, and the attached photo shows what comes up when I query for the GPU. At this stage I don't know whether WSL can see or access my GPU: glxinfo can identify it (although it gets my VRAM wrong), but nothing else can.
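
One way to narrow this down is to look at which GPU device nodes exist inside the WSL instance. The sketch below is only a rough probe under these assumptions: ROCm's runtime talks to the kernel driver through `/dev/kfd`, native Linux graphics drivers expose nodes under `/dev/dri`, and WSL2 exposes its paravirtualized GPU as `/dev/dxg`. `probe_gpu_nodes` is a hypothetical helper name.

```python
import os

# Device nodes and what their presence would suggest (see assumptions above).
DEVICE_NODES = {
    "/dev/kfd": "ROCm compute interface (amdgpu/ROCk kernel driver)",
    "/dev/dri": "DRM nodes from a native Linux GPU driver",
    "/dev/dxg": "WSL2 paravirtualized GPU (D3D12 path)",
}


def probe_gpu_nodes(nodes=DEVICE_NODES, exists=os.path.exists):
    """Map each device node path to whether it exists on this system."""
    return {path: exists(path) for path in nodes}


if __name__ == "__main__":
    for path, present in probe_gpu_nodes().items():
        status = "present" if present else "missing"
        print(f"{path:10} {status:8} {DEVICE_NODES[path]}")
```

If only `/dev/dxg` is present, that would be consistent with a D3D12-backed tool seeing the GPU while rocminfo reports that the ROCk module is not loaded.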

Any advice would be very helpful. I know this issue may not be project-specific, but I have tried to include as much information as possible about what I am doing for the best chance of figuring this out.

Installed AMD GPU libs:

$ sudo apt list|grep -i gpu|grep installed

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libdrm-amdgpu1/focal-updates,focal-security,now 2.4.107-8ubuntu1~20.04.2 amd64 [installed]
libosdgpu3.4.0/focal,now 3.4.0-6build1 amd64 [installed,automatic]

Installed ROC packages:

    $ apt list --installed | grep -i roc
    
    WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
    
    hsa-rocr-dev/Ubuntu,now 1.5.0.50100-36 amd64 [installed,automatic]
    hsa-rocr/Ubuntu,now 1.5.0.50100-36 amd64 [installed,automatic]
    hsakmt-roct-dev/Ubuntu,now 20220128.1.7.50100-36 amd64 [installed,automatic]
    hsakmt-roct/Ubuntu,now 20210520.3.071986.40301-59 amd64 [installed,automatic]
    libopencv-imgproc4.2/focal,now 4.2.0+dfsg-5 amd64 [installed,automatic]
    libpostproc55/focal-updates,focal-security,now 7:4.2.7-0ubuntu0.1 amd64 [installed,automatic]
    libprocps8/focal-updates,now 2:3.3.16-1ubuntu2.3 amd64 [installed,automatic]
    procps/focal-updates,now 2:3.3.16-1ubuntu2.3 amd64 [installed,automatic]
    python3-ptyprocess/focal,now 0.6.0-1ubuntu1 all [installed,automatic]
    rock-dkms-firmware/Ubuntu,now 1:4.3-59 all [installed,automatic]
    rock-dkms/Ubuntu,now 1:4.3-59 all [installed,automatic]
    rocm-clang-ocl/Ubuntu,now 0.5.0.50100-36 amd64 [installed,automatic]
    rocm-cmake/Ubuntu,now 0.7.2.50100-36 amd64 [installed,automatic]
    rocm-core/Ubuntu,now 5.1.0.50100-36 amd64 [installed,automatic]
    rocm-dbgapi/Ubuntu,now 0.64.0.50100-36 amd64 [installed,automatic]
    rocm-debug-agent/Ubuntu,now 2.0.3.50100-36 amd64 [installed,automatic]
    rocm-dev/Ubuntu,now 5.1.0.50100-36 amd64 [installed,automatic]
    rocm-device-libs/Ubuntu,now 1.0.0.50100-36 amd64 [installed,automatic]
    rocm-dkms/Ubuntu,now 5.1.0.50100-36 amd64 [installed]
    rocm-gdb/Ubuntu,now 11.2.50100-36 amd64 [installed,automatic]
    rocm-llvm/Ubuntu,now 14.0.0.22114.50100-36 amd64 [installed,automatic]
    rocm-ocl-icd/Ubuntu,now 2.0.0.50100-36 amd64 [installed,automatic]
    rocm-opencl-dev/Ubuntu,now 2.0.0.50100-36 amd64 [installed,automatic]
    rocm-opencl/Ubuntu,now 2.0.0.50100-36 amd64 [installed,automatic]
    rocm-smi-lib/Ubuntu,now 5.0.0.50100-36 amd64 [installed,automatic]
    rocm-utils/Ubuntu,now 5.1.0.50100-36 amd64 [installed,automatic]
    rocminfo/Ubuntu,now 1.0.0.50100-36 amd64 [installed,automatic]
    rocprofiler-dev/Ubuntu,now 1.0.0.50100-36 amd64 [installed,automatic]
    roctracer-dev/Ubuntu,now 1.0.0.50100-36 amd64 [installed,automatic]
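
In the listing above, most packages carry a `50100` field in their version, while `hsakmt-roct` carries `40301` and `rock-dkms` reports `1:4.3-59`. That may indicate packages from more than one release installed side by side. The sketch below groups packages by that field so a mix is easy to spot. Treating the five-digit field as a release code is an assumption, and `group_by_release` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   rocm-core/Ubuntu,now 5.1.0.50100-36 amd64 [installed,automatic]
LINE_RE = re.compile(r"^\s*(?P<name>[^/\s]+)/\S+\s+(?P<version>\S+)\s")
# Assumption: a trailing five-digit field (50100, 40301) encodes the release.
RELEASE_RE = re.compile(r"\.(\d{5})-\d+$")

SAMPLE = [
    "rocm-core/Ubuntu,now 5.1.0.50100-36 amd64 [installed,automatic]",
    "hsakmt-roct/Ubuntu,now 20210520.3.071986.40301-59 amd64 [installed,automatic]",
    "rock-dkms/Ubuntu,now 1:4.3-59 all [installed,automatic]",
]


def group_by_release(apt_lines):
    """Group package names by the five-digit release field in their version."""
    groups = defaultdict(list)
    for line in apt_lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # Skip warnings and anything that is not a package line.
        release = RELEASE_RE.search(match.group("version"))
        key = release.group(1) if release else "unknown"
        groups[key].append(match.group("name"))
    return dict(groups)


if __name__ == "__main__":
    for release, names in sorted(group_by_release(SAMPLE).items()):
        print(release, ", ".join(names))
```

The same function can be fed the full output of `apt list --installed | grep -i roc`, one line per list element.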

Trying to find GPU

EDIT: Just ran the following and got a little more information about the current state of my system.

$ glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Microsoft Corporation (0xffffffff)
    Device: D3D12 (AMD Radeon RX 6600 XT) (0xffffffff)
    Version: 22.2.0
    Accelerated: yes
    Video memory: 24485MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.2
    Max compat profile version: 4.2
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.1
OpenGL vendor string: Microsoft Corporation
OpenGL renderer string: D3D12 (AMD Radeon RX 6600 XT)
OpenGL core profile version string: 4.2 (Core Profile) Mesa 22.2.0-devel (git-cbcdcc4 2022-06-11 focal-oibaf-ppa)
OpenGL core profile shading language version string: 4.20
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.2 (Compatibility Profile) Mesa 22.2.0-devel (git-cbcdcc4 2022-06-11 focal-oibaf-ppa)
OpenGL shading language version string: 4.20
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.1 Mesa 22.2.0-devel (git-cbcdcc4 2022-06-11 focal-oibaf-ppa)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10

Adding to the confusion even more, glmark2 seems to be able to use my GPU fine. Is this possibly an issue with the DALL-E Playground program and not WSL?

GLmark2 running successfully despite issues with dalle playground

Joel Gray
  • Have you tried https://learn.microsoft.com/en-us/windows/ai/directml/gpu-pytorch-wsl? Microsoft recommends setting up GPU accelerated PyTorch with this method here from inside WSL. Hope this helps. – Ethan Jun 12 '22 at 01:00
  • @Ethan yeah I have done that and also completed the test to ensure it's working at the end of that page, but doesn't seem to help me get this working – Joel Gray Jun 12 '22 at 01:12
  • Please take a look at my answer and let me know what you think. If you agree then please mark it as correct, else drop a comment with what you believe to be a better solution. – Ethan Jun 17 '22 at 17:52
  • What a stupid rabbit hole. I'm having the same issue, and having flashbacks to the last time I tried using a GPU for scientific computing after buying a fancy gaming MSI laptop 10 years ago. – geneorama Jun 19 '22 at 05:10

2 Answers


The issue eventually came down to the fact that AMD GPUs don't work with CUDA, and the DALL-E Playground project only supports CUDA. Basically, to run DALL-E Playground on a GPU you must be using an Nvidia GPU. Alternatively, you can run the project on your CPU.

I hope this covers any questions anyone may have.

Joel Gray
  • Question: can I do something similar to run Stable Diffusion using WSL? I am on Windows 11 too (but Stable Diffusion with DirectML, which I use because I have an AMD GPU, is shit for me). I would like to use it in WSL. Can I get the same results using WSL as on real Linux with ROCm? My most important question: is this possible from Windows with WSL or WSL2? Can you help me please? – Milor123 Jul 05 '23 at 11:10
  • To be honest, I'm not too sure about the AI stuff. Don't use WSL as it's the old version; WSL2 has a lot more support and is more like a native Linux experience. But I've wasted hours trying to get things to work on WSL, so I'd advise just doing a dual-boot setup and running everything off a native Linux install. TLDR: Nothing beats a native Linux install. Do that if you can. – Joel Gray Jul 06 '23 at 13:23

Looks like this is a known bug at the moment with WSL / Torch: https://github.com/pytorch/pytorch/issues/73487

From what I can see (and tested):

I can confirm the same. If you're stuck, try uninstalling and reinstalling your wsl2 distribution (e.g., Ubuntu). Make sure you've installed the Nvidia driver on the Windows side (follow the official wsl2 setup docs). Let conda manage cudatoolkit for you; don't follow Nvidia's guide for installing cudatoolkit system-wide.

That seems to be all you can do. I would suggest reinstalling your WSL distribution and seeing if that helps, then setting everything up again exactly as per the docs (as that is what Microsoft is most likely to test). If this doesn't help, my final suggestion would be to dual boot, if that is an option for you.

(Note that the Nvidia part of the quote does not apply to you; however, CUDA should still be available to you.)

Ethan
  • Hi Ethan, this doesn't solve my problem, and CUDA is only available on Nvidia hardware. I installed a fresh Ubuntu instance on a spare SSD for dual boot, tried everything again, and still had no luck, so the issue isn't WSL. I think the issue was that ROCm was not installed correctly. My AMD GPU now works with Blender, for example, using OpenGL. But the issue now is that there's no support in the DALL-E Playground program for AMD GPUs. So it's not a system issue; it's an issue with the DALL-E Playground program/repo. Thanks for your help anyway, though! – Joel Gray Jun 18 '22 at 18:38
  • https://math.dartmouth.edu/~sarunas/amdgpu.html – Ethan Jun 19 '22 at 01:12