Using Keras & Tensorflow with AMD GPU

Question

I'm starting to learn Keras, which I believe is a layer on top of Tensorflow and Theano. However, I only have access to AMD GPUs such as the AMD R9 280X.

How can I set up my Python environment so that I can use my AMD GPUs through Keras/Tensorflow support for OpenCL?

I'm running on OSX.

I believe the new [Theano backend](http://deeplearning.net/software/theano/tutorial/using_gpu.html) will support OpenCL as well as NVIDIA cards. As far as Tensorflow goes, there is an [open issue](https://github.com/tensorflow/tensorflow/issues/22) for OpenCL support; doesn't look like much progress has been made. — o-90, Jun 18 '16 at 04:12
There's no support for AMD GPUs in TensorFlow or most other neural network packages. The reason is that NVidia invested in fast free implementation of neural network blocks (CuDNN) which all fast implementations of GPU neural networks rely on (Torch/Theano/TF) while AMD doesn't seem to care about this market. — Yaroslav Bulatov, Jun 18 '16 at 21:51
Recently, Google announced that they would buy AMD GPU's for use in their data centers presumably for machine learning applications as well. Such a move does not make sense if there is not a roadmap to support gpus more generically. — Thornhale, Mar 25 '17 at 17:36
On most platforms (Mac/Win/Linux currently) you can run Keras on top of PlaidML. PlaidML is open source and includes an alternative to cuDNN that works on most GPUs: https://github.com/plaidml/plaidml — Choong Ng, Feb 19 '18 at 19:15
Easy way to install Opencl on Linux https://gist.github.com/kytulendu/3351b5d0b4f947e19df36b1ea3c95cbe — user1462442, Jul 26 '20 at 14:43
Check out plaidML mentioned below. I have it running on a 2010 Mac Pro with a 4 GB AMD GPU, a 2012 MacBook Pro with a 1.5 GB Nvidia GPU, and a 2019 MacBook Pro with a 4 GB AMD GPU. — Bryan Butler, Jul 30 '20 at 21:01

Hugh Perkins · Answer 1 · 2017-05-14T15:13:55.653

I'm writing an OpenCL 1.2 backend for Tensorflow at https://github.com/hughperkins/tensorflow-cl

This fork of tensorflow for OpenCL has the following characteristics:

it targets any/all OpenCL 1.2 devices. It doesn't need OpenCL 2.0, SPIR-V, or SPIR, and it doesn't need Shared Virtual Memory. And so on ...
it's based on an underlying library called 'cuda-on-cl', https://github.com/hughperkins/cuda-on-cl
- cuda-on-cl aims to take any NVIDIA® CUDA™ source code and compile it for OpenCL 1.2 devices. It's a very general goal, and a very general compiler
for now, the following functionalities are implemented:
- per-element operations, using Eigen over OpenCL, (more info at https://bitbucket.org/hughperkins/eigen/src/eigen-cl/unsupported/test/cuda-on-cl/?at=eigen-cl )
- blas / matrix-multiplication, using Cedric Nugteren's CLBlast https://github.com/cnugteren/CLBlast
- reductions, argmin, argmax, again using Eigen, as per earlier info and links
- learning, trainers, gradients. At least the StochasticGradientDescent trainer is working; the others are committed but not yet tested
it is developed on Ubuntu 16.04 (using Intel HD5500, and NVIDIA GPUs) and Mac Sierra (using Intel HD 530, and Radeon Pro 450)

This is not the only OpenCL fork of Tensorflow available. There is also a fork being developed by Codeplay https://www.codeplay.com , using Computecpp, https://www.codeplay.com/products/computesuite/computecpp As far as I know, their fork has stronger requirements than my own in terms of which specific GPU devices it works on. You would need to check the Platform Support Notes (at the bottom of the computecpp page) to determine whether your device is supported. The Codeplay fork is actually an official Google fork, which is here: https://github.com/benoitsteiner/tensorflow-opencl

I am wondering: what is the rationale for building support for OpenCL 1.2 only? There appear to be many features in OpenCL 2.0 that could be useful for deep learning: http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/opencl-2-0-samples/ — Thornhale, Mar 25 '17 at 17:28
Has anyone without a dedicated GPU tested how much faster tensor flow gets when an integrated GPU (Intel or AMD) is used instead of only the CPU? — Thornhale, Mar 25 '17 at 17:43
@Thornhale the rationale is portability. For example, the Mac Sierra Radeon Pro 450 driver only supports OpenCL 1.2, as does the Intel HD 530 driver on the same platform (and this is basically a brand-new MacBook Pro). — Hugh Perkins, May 14 '17 at 15:15

Thornhale · Answer 2 · 2017-05-19T21:02:03.230

The original question on this post was: How to get Keras and Tensorflow to run with an AMD GPU.

The answer to this question is as follows:

1.) Keras will work if you can make Tensorflow work correctly (optionally within your virtual/conda environment).

2.) To get Tensorflow to work on an AMD GPU, as others have stated, one way this could work is to compile Tensorflow to use OpenCl. To do so read the link below. But for brevity I will summarize the required steps here:

You will need AMD's proprietary drivers. These are currently only available on Ubuntu 14.04 (the version before Ubuntu decided to change the way the UI is rendered). At the time of writing, support for Ubuntu 16.04 is limited to a few GPUs through AMDProDrivers. Readers who want to do deep learning on AMD GPUs should be aware of this!
Compiling Tensorflow with OpenCL support also requires you to obtain and install the following prerequisites: OpenCL headers, ComputeCpp.
After the prerequisites are fulfilled, configure your build. Note that there are 3 options for compiling Tensorflow: Std Tensorflow (stable), Benoit Steiner's Tensorflow-opencl (developmental), and Luke Iwanski's Tensorflow-opencl (highly experimental), which you can pull from github. Also note that if you build from either of the opencl versions, the configure script will not ask whether to use OpenCL, because it is assumed that you are using it. Conversely, if you configure from the standard Tensorflow, you will need to select "Yes" when the configure script asks you to use OpenCL and "NO" for CUDA.
Then run tests like so:

$ bazel test --config=sycl -k --test_timeout 1600 -- //tensorflow/... -//tensorflow/contrib/... -//tensorflow/java/... -//tensorflow/compiler/...

Update: Running this takes exceedingly long on my setup. The slow part is the test suite. I am not sure what this means, but a lot of my tests are timing out at 1600 seconds. The timeout can probably be shortened at the expense of more tests timing out. Alternatively, you can just build Tensorflow without running the tests. At the time of this writing, the tests have already been running for 2 days.

Or just build the pip package like so:

bazel build --local_resources 2048,.5,1.0 -c opt --config=sycl //tensorflow/tools/pip_package:build_pip_package

Please actually read the blog post over at Codeplay: Luke Iwanski posted a comprehensive tutorial on how to get Tensorflow to work with OpenCL on March 30th, 2017, so it is very recent. It also covers some details which I did not write about here.

As indicated in the many posts above, little bits of information are spread throughout the interwebs. What Luke's post adds in terms of value is that all the information is put together in one place, which should make setting up Tensorflow and OpenCL a bit less daunting. I will only provide a link here:

https://www.codeplay.com/portal/03-30-17-setting-up-tensorflow-with-opencl-using-sycl

A slightly more complete walk-through has been posted here:

http://deep-beta.co.uk/setting-up-tensorflow-with-opencl-using-sycl/

It differs mainly by explicitly telling the user that he/she needs to:

create symlinks to a subfolder
and then actually install tensorflow via "python setup.py develop" command.
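Once the package is installed, one quick sanity check is to ask Tensorflow which devices it can see. The snippet below is only a sketch: `device_lib.list_local_devices()` is the TensorFlow 1.x helper for this, `non_cpu_devices` is a hypothetical helper written for this example, and the exact device type a SYCL build reports (something like `SYCL`) is an assumption rather than something confirmed above.

```python
import importlib.util


def non_cpu_devices(devices):
    """Return (name, device_type) for every device that is not a plain CPU."""
    return [(d.name, d.device_type) for d in devices if d.device_type != "CPU"]


if importlib.util.find_spec("tensorflow") is not None:
    # Only touch TensorFlow when it is actually installed in this environment.
    from tensorflow.python.client import device_lib

    found = non_cpu_devices(device_lib.list_local_devices())
    print(found or "Only CPU devices are visible to TensorFlow")
else:
    print("TensorFlow is not installed in this environment")
```

If only CPU devices show up, the build most likely fell back to the CPU and the OpenCL/ComputeCpp setup is worth rechecking.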

Note an alternative approach was mentioned above using tensorflow-cl:

https://github.com/hughperkins/tensorflow-cl

I am unable to discern which approach is better at this time, though it appears that tensorflow-cl is less active. Fewer issues are posted, and fewer conversations to resolve those issues are happening. There was a major push last year. Activity has ebbed since November 2016, although Hugh seems to have pushed some updates a few days ago as of the writing of this post. (Update: according to the readme, this version of tensorflow now relies only on community support, as the main developer is busy with life.)

UPDATE (2017-04-25): I have some notes based on testing tensorflow-opencl below.

The future user of this package should note that using opencl means that all the heavy-lifting in terms of computing is shifted to the GPU. I mention this because I was personally thinking that the compute work-load would be shared between my CPU and iGPU. This means that the power of your GPU is very important (specifically, bandwidth, and available VRAM).

Following are some numbers for calculating 1 epoch using the CIFAR10 data set for MY SETUP (A10-7850 with iGPU). Your mileage will almost certainly vary!

Tensorflow (via pip install): ~ 1700 s/epoch
Tensorflow (w/ SSE + AVX): ~ 1100 s/epoch
Tensorflow (w/ opencl & iGPU): ~ 5800 s/epoch
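To put the three figures side by side, the ratios can be worked out directly. This is just arithmetic on the approximate numbers quoted above; the dictionary keys are labels made up for this example.

```python
# Approximate seconds per epoch quoted above (CIFAR10, A10-7850 with iGPU).
seconds_per_epoch = {
    "pip install": 1700,
    "SSE + AVX": 1100,
    "opencl & iGPU": 5800,
}

baseline = seconds_per_epoch["pip install"]

# Time per epoch relative to the stock pip build (values above 1 are slower).
relative = {name: secs / baseline for name, secs in seconds_per_epoch.items()}

for name, ratio in relative.items():
    print(f"{name:>14}: {ratio:.2f}x the epoch time of the pip build")
```

On these numbers the OpenCL/iGPU run takes roughly 3.4 times as long per epoch as the stock pip build, and roughly 5.3 times as long as the SSE + AVX build.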

You can see that in this particular case performance is worse. I attribute this to the following factors:

The iGPU only has 1GB. This leads to a lot of copying back and forth between CPU and GPU. (OpenCL 1.2 does not yet have the ability to pass data via pointers; instead data has to be copied back and forth.)
The iGPU only has 512 stream processors (and 32 Gb/s memory bandwidth), which in this case is slower than 4 CPUs using SSE4 + AVX instruction sets.
The development of tensorflow-opencl is in its beginning stages, and a lot of optimizations in SYCL etc. have not been done yet.

If you are using an AMD GPU with more VRAM and more stream processors, you are certain to get much better performance numbers. I would be interested to read what numbers people are achieving to know what's possible.

I will continue to maintain this answer if/when updates get pushed.

3.) An alternative that is currently being hinted at is AMD's ROCm initiative and the MIOpen library (a cuDNN equivalent). These are/will be open-source libraries that enable deep learning. The caveat is that ROCm support currently exists only for Linux, and that MIOpen has not been released to the wild yet, but Raja (AMD GPU head) has said in an AMA that using the above, it should be possible to do deep learning on AMD GPUs. In fact, support is planned not only for Tensorflow, but also for Caffe2, Caffe, Torch7 and MxNet.

I'd love to see AMD get in on action - personally I have all AMD cards - but we've been told _Soon_ too long now (not that AMD has all the control here as to whether TensorFlow, etc implement it). Does AMD have any 'issue tracker' equivalent that you know of by chance? — ehiller, Aug 26 '17 at 21:42
Do you know how the _scene_ changed since you wrote that answer? — Maxie Berkmann, Sep 13 '20 at 08:56

score 28 · Answer 3 · answered Jan 01 '19 at 21:13

One can use an AMD GPU via the PlaidML Keras backend.

Fastest: PlaidML is often 10x faster (or more) than popular platforms (like TensorFlow CPU) because it supports all GPUs, independent of make and model. PlaidML accelerates deep learning on AMD, Intel, NVIDIA, ARM, and embedded GPUs.

Easiest: PlaidML is simple to install and supports multiple frontends (Keras and ONNX currently)

Free: PlaidML is completely open source and doesn't rely on any vendor libraries with proprietary and restrictive licenses.

For most platforms, getting started with accelerated deep learning is as easy as running a few commands (assuming you have Python (v2 or v3) installed):

virtualenv plaidml
source plaidml/bin/activate
pip install plaidml-keras plaidbench

Choose which accelerator you'd like to use (many computers, especially laptops, have multiple):

plaidml-setup

Next, try benchmarking MobileNet inference performance:

plaidbench keras mobilenet

Or, try training MobileNet:

plaidbench --batch-size 16 keras --train mobilenet

To use it with Keras, set the backend before Keras is imported:

import os
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"
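The order matters here, because Keras reads `KERAS_BACKEND` once, when it is first imported. A minimal sketch of that ordering follows; `backend_available` is a hypothetical helper for this example, and the guard simply avoids importing Keras on a machine where `plaidml-keras` has not been installed.

```python
import importlib.util
import os

# Value taken from the answer above. Set it before the first `import keras`,
# since Keras picks the backend at import time.
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"


def backend_available():
    """True only if both keras and plaidml can be found in this environment."""
    return all(
        importlib.util.find_spec(module) is not None
        for module in ("keras", "plaidml")
    )


if backend_available():
    import keras

    print("Keras is using backend:", keras.backend.backend())
else:
    print("plaidml-keras is not installed; only KERAS_BACKEND was set")
```

Setting the variable in the shell before starting Python has the same effect and keeps scripts unchanged.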

For more information

https://github.com/plaidml/plaidml

https://github.com/rstudio/keras/issues/205#issuecomment-348336284

I use plaidML all the time and have done some benchmarking depending on GPU. It works quite well on all of my Macs. — Bryan Butler, Jul 30 '20 at 20:51
These three lines work at the top of every script: `import plaidml.keras` `plaidml.keras.install_backend()` `from keras import backend as K` — Bryan Butler, Jul 30 '20 at 20:54
I get an error, fixed via https://github.com/plaidml/plaidml/issues/1027#issuecomment-617032218 — jtlz2, Jul 16 '21 at 09:29

score 7 · Answer 4 · answered Aug 31 '17 at 06:17

This is an old question, but since I spent the last few weeks trying to figure it out on my own:

OpenCL support for Theano is hit and miss. They added a libgpuarray back-end which appears to still be buggy (i.e., the process runs on the GPU but the answer is wrong--like 8% accuracy on MNIST for a DL model that gets ~95+% accuracy on CPU or nVidia CUDA). Also because ~50-80% of the performance boost on the nVidia stack comes from the CUDNN libraries now, OpenCL will just be left in the dust. (SEE BELOW!) :)
ROCM appears to be very cool, but the documentation (and even a clear declaration of what ROCM is/what it does) is hard to understand. They're doing their best, but they're 4+ years behind. It does NOT NOT NOT work on an RX550 (as of this writing). So don't waste your time (this is where 1 of the weeks went :) ). At first, it appears ROCM is a new addition to the driver set (replacing AMDGPU-Pro, or augmenting it), but it is in fact a kernel module and set of libraries that essentially replace AMDGPU-Pro. (Think of this as roughly the equivalent of the Nvidia-381 driver plus CUDA and some libraries.) https://rocm.github.io/dl.html (Honestly, I still haven't tested the performance or tried to get it to work with more recent Mesa drivers. I will do that sometime.)
Add MIOpen to ROCM, and that is essentially CUDNN. They also have some pretty clear guides for migrating. But better yet:
They created "HIP", which is an automagical translator from CUDA/CUDNN to MIOpen. It seems to work pretty well since they lined the APIs up directly to be translatable. There are concepts that aren't perfect maps, but in general it looks good.

Now, finally, after 3-4 weeks of trying to figure out OpenCL, etc, I found this tutorial to help you get started quickly. It is a step-by-step for getting hipCaffe up and running. Unlike nVidia though, please ensure you have supported hardware!!!! https://rocm.github.io/hardware.html. Think you can get it working without their supported hardware? Good luck. You've been warned. Once you have ROCM up and running (AND RUN THE VERIFICATION TESTS), here is the hipCaffe tutorial--if you got ROCM up you'll be doing an MNIST validation test within 10 minutes--sweet! https://rocm.github.io/ROCmHipCaffeQuickstart.html

A link to a solution is welcome, but please ensure your answer is useful without it: [add context around the link](//meta.stackexchange.com/a/8259) so your fellow users will have some idea what it is and why it’s there, then quote the most relevant part of the page you're linking to in case the target page is unavailable. [Answers that are little more than a link may be deleted.](//stackoverflow.com/help/deleted-answers) — Papershine, Aug 31 '17 at 08:21
FYI, the pages you linked don't exist anymore and redirect somewhere else which don't fully contain the answers. — Maxie Berkmann, Sep 13 '20 at 09:00
Let me review the latest status over the next week to update the answer. — Thornhale, Sep 29 '20 at 18:03

score 5 · Answer 5 · answered Jun 19 '16 at 03:26

Theano does have support for OpenCL but it is still in its early stages. Theano itself is not interested in OpenCL and relies on community support.

Most of the operations are already implemented and it is mostly a matter of tuning and optimizing the given operations.

To use the OpenCL backend you have to build libgpuarray yourself.
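With libgpuarray built, the device is selected through Theano's `device` flag. The sketch below assumes the `openclN:M` device naming used by the gpuarray backend and assumes that platform 0, device 0 is the AMD GPU, which may not hold on a given machine; `theano_flags` is a hypothetical helper for this example.

```python
import os


def theano_flags(device="opencl0:0", floatX="float32"):
    """Build a THEANO_FLAGS string; the default device is an assumption."""
    return f"device={device},floatX={floatX}"


# THEANO_FLAGS has to be in the environment before Theano is imported.
os.environ["THEANO_FLAGS"] = theano_flags()
print(os.environ["THEANO_FLAGS"])

# import theano  # would now try to initialise the OpenCL device via libgpuarray
```

If the wrong device is picked up, changing the platform and device indices (for example `opencl0:1` or `opencl1:0`) is the first thing to try.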

From personal experience I can tell you that you will get CPU performance if you are lucky. The memory allocation seems to be very naively implemented (therefore computation will be slow) and will crash when it runs out of memory. But I encourage you to try and maybe even optimize the code or help reporting bugs.

answered Jun 19 '16 at 03:26

nemo


Has anything changed over the last 6 months in this regard? – Thornhale Mar 31 '17 at 13:49
Theano was discontinued – Oct 19 '17 at 07:11
@ErikAigner Officially. Bugs are still fixed and the community is able to contribute. – nemo Oct 19 '17 at 12:42
Indeed, [Theano was discontinued](https://groups.google.com/forum/#!topic/theano-users/7Poq8BZutbY). – Josiah Yoder Jun 26 '18 at 15:31

score 5 · Answer 6 · answered Mar 25 '18 at 05:01

Tensorflow 1.3 is supported on the AMD ROCm stack:

https://github.com/ROCmSoftwarePlatform/tensorflow

A pre-built docker image has also been posted publicly:

https://hub.docker.com/r/rocm/tensorflow/

answered Mar 25 '18 at 05:01

user1917768


Kruft Industries · Answer 7 · 2017-10-30T08:03:43.737

If you have access to other AMD GPUs, please see here: https://github.com/ROCmSoftwarePlatform/hiptensorflow/tree/hip/rocm_docs

This should get you going in the right direction for Tensorflow on the ROCm platform, but the hardware list from Selly's post (https://rocm.github.io/hardware.html) is the catch with this route. That page is not an exhaustive list: I found out on my own that the Xeon E5 v2 Ivy Bridge works fine with ROCm even though they list v3 or newer. Graphics cards, however, are a bit more picky: gfx8 or newer with a few small exceptions, Polaris, and maybe others as time goes on.

UPDATE - It looks like hiptensorflow has an option for OpenCL support during configure. If the OpenCL implementation works, the link is worth investigating even if you don't have a gfx8+ or Polaris GPU. It is a long-winded process, but an hour or three (depending on hardware) following well-written instructions isn't too much to lose to find out.

score -2 · Answer 8 · answered Nov 20 '20 at 13:04

Technically you can if you use something like OpenCL, but Nvidia's CUDA is much better and OpenCL requires other steps that may or may not work. I would recommend if you have an AMD gpu, use something like Google Colab where they provide a free Nvidia GPU you can use when coding.
