The original question on this post was: How to get Keras and Tensorflow to run with an AMD GPU.
The answer to this question is as follows:
1.) Keras will work if you can make Tensorflow work correctly (optionally within your virtual/conda environment).
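Since Keras only needs a working backend, it can help to confirm which backend Keras is configured to use before debugging anything GPU related. The following is a minimal sketch, assuming the multi-backend Keras convention of a `KERAS_BACKEND` environment variable that overrides the "backend" key in `~/.keras/keras.json`; the function name `configured_keras_backend` is made up for illustration and is not part of Keras.

```python
import json
import os


def configured_keras_backend(config_path=None, environ=None):
    """Return the backend name Keras is configured to use.

    Assumption: KERAS_BACKEND overrides the "backend" key in
    ~/.keras/keras.json, and "tensorflow" is the default.
    """
    environ = os.environ if environ is None else environ
    # The environment variable takes precedence over the config file.
    if environ.get("KERAS_BACKEND"):
        return environ["KERAS_BACKEND"]
    if config_path is None:
        config_path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
    try:
        with open(config_path) as handle:
            return json.load(handle).get("backend", "tensorflow")
    except (OSError, ValueError, AttributeError):
        # Missing or malformed config: fall back to the assumed default.
        return "tensorflow"


if __name__ == "__main__":
    print("Keras backend:", configured_keras_backend())
```

If this prints anything other than tensorflow, Keras will not use your Tensorflow build no matter how it was compiled.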
2.) To get Tensorflow to work on an AMD GPU, one option, as others have stated, is to compile Tensorflow to use OpenCL. The links further down explain how in detail, but for brevity I will summarize the required steps here:
You will need AMD's proprietary drivers. These are currently only available on Ubuntu 14.04 (the version before Ubuntu decided to change the way the UI is rendered). Support for Ubuntu 16.04 is, at the time of writing, limited to a few GPUs through the AMDGPU-PRO drivers. Readers who want to do deep learning on AMD GPUs should be aware of this!
Compiling Tensorflow with OpenCL support also requires you to obtain and install the following prerequisites: OpenCL headers, ComputeCpp.
After the prerequisites are fulfilled, configure your build. Note that there are 3 options for compiling Tensorflow: standard Tensorflow (stable), Benoit Steiner's Tensorflow-opencl (developmental), and Luke Iwanski's Tensorflow-opencl (highly experimental), all of which you can pull from github. Also note that if you build from either of the opencl versions, the configure script will not ask whether to use OpenCL, because it is assumed that you are using it. Conversely, if you configure from the standard Tensorflow, you will need to answer "Yes" when the configure script asks whether to use OpenCL and "No" for CUDA.
Then run tests like so:
$ bazel test --config=sycl -k --test_timeout 1600 -- //tensorflow/... -//tensorflow/contrib/... -//tensorflow/java/... -//tensorflow/compiler/...
Update: Doing this takes exceedingly long on my setup. The part that takes long is running all the tests. I am not sure what this means, but a lot of my tests are timing out at 1600 seconds. The duration can probably be shortened by lowering the timeout, at the expense of more tests timing out. Alternatively, you can just build Tensorflow without running the tests. At the time of this writing, running the tests has already taken 2 days.
Or just build the pip package like so:
bazel build --local_resources 2048,.5,1.0 -c opt --config=sycl //tensorflow/tools/pip_package:build_pip_package
Please actually read the blog post over at Codeplay: Luke Iwanski posted a comprehensive tutorial on how to get Tensorflow to work with OpenCL on March 30th 2017, so it is very recent. It also covers some details which I did not write about here.
As indicated in the many posts above, little bits of information are spread throughout the interwebs. The value Luke's post adds is that all the information is put together in one place, which should make setting up Tensorflow and OpenCL a bit less daunting. I will only provide a link here:
https://www.codeplay.com/portal/03-30-17-setting-up-tensorflow-with-opencl-using-sycl
A slightly more complete walk-through has been posted here:
http://deep-beta.co.uk/setting-up-tensorflow-with-opencl-using-sycl/
It differs mainly by explicitly telling the user that he/she needs to:
- create symlinks to a subfolder
- and then actually install tensorflow via "python setup.py develop" command.
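Once the build is installed, one way to check what Tensorflow actually sees is to list its local devices. This is only a sketch: it relies on `device_lib.list_local_devices()` from `tensorflow.python.client`, the helper name `list_tensorflow_devices` is made up for illustration, and I am not asserting what device type string an OpenCL/SYCL build reports, so the code prints every device rather than filtering.

```python
def list_tensorflow_devices():
    """Return (name, device_type) pairs for every device Tensorflow reports.

    Returns an empty list if Tensorflow cannot be imported at all.
    """
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return []
    return [(d.name, d.device_type) for d in device_lib.list_local_devices()]


if __name__ == "__main__":
    devices = list_tensorflow_devices()
    if not devices:
        print("Tensorflow is not importable or reported no devices")
    for name, device_type in devices:
        # With a working GPU build you would expect an entry besides the CPU.
        print(device_type, name)
```

If only a CPU entry shows up, the build most likely did not pick up the OpenCL device and Keras will silently run on the CPU.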
Note an alternative approach was mentioned above using tensorflow-cl:
https://github.com/hughperkins/tensorflow-cl
I am unable to discern which approach is better at this time, though tensorflow-cl appears to be less active: fewer issues are posted, and fewer conversations to resolve those issues are happening. There was a major push last year. Activity has ebbed since November 2016, although Hugh seems to have pushed some updates a few days ago as of the writing of this post. (Update: According to the readme, this version of Tensorflow now relies only on community support, as the main developer is busy with life.)
UPDATE (2017-04-25): I have some notes based on testing tensorflow-opencl below.
- Future users of this package should note that using opencl means that all the heavy lifting in terms of computing is shifted to the GPU. I mention this because I had assumed that the compute workload would be shared between my CPU and iGPU. It is not, so the power of your GPU is very important (specifically, bandwidth and available VRAM).
Following are some numbers for calculating 1 epoch using the CIFAR10 data set for MY SETUP (A10-7850 with iGPU). Your mileage will almost certainly vary!
- Tensorflow (via pip install): ~ 1700 s/epoch
- Tensorflow (w/ SSE + AVX): ~ 1100 s/epoch
- Tensorflow (w/ opencl & iGPU): ~ 5800 s/epoch
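To put those timings in relative terms, here is a small sketch that divides each epoch time by a chosen baseline. The numbers are the ones listed above; the names `EPOCH_SECONDS` and `relative_to` are just illustrative. On these figures the opencl build comes out at roughly 5.3 times the SSE + AVX epoch time and roughly 3.4 times the pip build.

```python
# Seconds per CIFAR10 epoch, copied from the measurements above.
EPOCH_SECONDS = {
    "pip install": 1700,
    "SSE + AVX": 1100,
    "opencl & iGPU": 5800,
}


def relative_to(baseline, timings):
    """Express every timing as a multiple of the baseline's timing."""
    base = float(timings[baseline])
    return {name: seconds / base for name, seconds in timings.items()}


if __name__ == "__main__":
    for name, ratio in relative_to("SSE + AVX", EPOCH_SECONDS).items():
        print("{:<15} {:.1f}x the SSE + AVX epoch time".format(name, ratio))
```

Swapping in your own measurements makes it easy to compare setups on the same footing.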
You can see that in this particular case performance is worse. I attribute this to the following factors:
- The iGPU only has 1GB. This leads to a lot of copying back and forth between CPU and GPU. (Opencl 1.2 does not yet have the ability to pass data via pointers, so data has to be copied back and forth instead.)
- The iGPU only has 512 stream processors (and 32 Gb/s memory bandwidth), which in this case is slower than 4 CPUs using the SSE4 + AVX instruction sets.
- The development of tensorflow-opencl is in its beginning stages, and a lot of optimizations in SYCL etc. have not been done yet.
If you are using an AMD GPU with more VRAM and more stream processors, you should get much better performance numbers. I would be interested to read what numbers people are achieving to know what's possible.
I will continue to maintain this answer if/when updates get pushed.
3.) An alternative way is currently being hinted at, which is to use AMD's ROCm initiative and the MIOpen (cuDNN equivalent) library. These are/will be open-source libraries that enable deep learning. The caveat is that ROCm support currently only exists for Linux, and that MIOpen has not been released to the wild yet, but Raja (AMD GPU head) has said in an AMA that, using the above, it should be possible to do deep learning on AMD GPUs. In fact, support is planned not only for Tensorflow, but also for Caffe2, Caffe, Torch7 and MXNet.