
After a lot of struggle:

  1. I successfully built TensorFlow with OpenCL on a fresh Ubuntu 16.04 with amdgpu 17.50.

  2. I have 5 identical GPUs (RX 580) installed, and all of them are reported by clinfo and computecpp_info as expected.

  3. When running the MNIST convnet example, TF works but employs just GPU0 without seeing the other GPUs.

There are no errors reported in dmesg about the cards, and they all seem to be ready at the lowest layer, so I don't know why SYCL seems to be ignoring some of them.

Here is the computecpp_info output:

********************************************************************************

ComputeCpp Info (CE 1.0.1)

SYCL 1.2.1 revision 3

********************************************************************************

Toolchain information:

GLIBC version: 2.23
GLIBCXX: 20160609
This version of libstdc++ is supported.

********************************************************************************


Device Info:

Discovered 5 devices matching:
  platform  : <any>
  device type : <any>

--------------------------------------------------------------------------------
Device 0:

  Device is supported                   : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                        : Ellesmere
  CL_DEVICE_VENDOR                      : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                     : 2527.3
  CL_DEVICE_TYPE                        : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 1:

  Device is supported                   : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                        : Ellesmere
  CL_DEVICE_VENDOR                      : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                     : 2527.3
  CL_DEVICE_TYPE                        : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 2:

  Device is supported                   : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                        : Ellesmere
  CL_DEVICE_VENDOR                      : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                     : 2527.3
  CL_DEVICE_TYPE                        : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 3:

  Device is supported                   : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                        : Ellesmere
  CL_DEVICE_VENDOR                      : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                     : 2527.3
  CL_DEVICE_TYPE                        : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 4:

  Device is supported                   : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                        : Ellesmere
  CL_DEVICE_VENDOR                      : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                     : 2527.3
  CL_DEVICE_TYPE                        : CL_DEVICE_TYPE_GPU

If you encounter problems when using any of these OpenCL devices, please consult
this website for known issues:
https://computecpp.codeplay.com/releases/v1.0.1/platform-support-notes

********************************************************************************
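
For anyone who wants to compare this report against what TensorFlow sees, here is a minimal sketch that counts the GPU entries in computecpp_info output. It assumes the text layout shown above (`Device N:` headers followed by `key : value` lines); the function names `parse_devices` and `count_gpus` are made up for this example and are not part of ComputeCpp.

```python
import re

def parse_devices(report):
    """Return one dict of 'key : value' fields per 'Device N:' block."""
    devices = []
    current = None
    for line in report.splitlines():
        if re.fullmatch(r"Device \d+:", line.strip()):
            # A new device block starts here.
            current = {}
            devices.append(current)
            continue
        # Fields look like "  CL_DEVICE_NAME      : Ellesmere".
        match = re.match(r"^\s*(.+?)\s+:\s+(.+)$", line)
        if match and current is not None:
            current[match.group(1)] = match.group(2).strip()
    return devices

def count_gpus(devices):
    """Count the parsed devices whose type is reported as a GPU."""
    return sum(d.get("CL_DEVICE_TYPE") == "CL_DEVICE_TYPE_GPU" for d in devices)
```

Passing the saved report text to `parse_devices` and then calling `count_gpus` on the result gives the number of GPUs that ComputeCpp itself discovers, which can be compared with the number of SYCL devices TensorFlow lists.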

Here is the device list from TensorFlow:

$ python3 list_gpus.py
2018-10-17 23:52:44.268968: I ./tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-10-17 23:52:44.385308: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:70] Found following OpenCL devices:
2018-10-17 23:52:44.385342: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:72] id: 0, type: GPU, name: Ellesmere, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 5429869323017416982
, name: "/device:SYCL:0"
device_type: "SYCL"
memory_limit: 268435456
locality {
}
incarnation: 7347791393919061653
physical_device_desc: "id: 0, type: GPU, name: Ellesmere, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE"
]
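
The script itself is not shown here. A minimal sketch of what list_gpus.py does, assuming it follows the `device_lib.list_local_devices()` approach from the Stack Overflow answer linked in the comments, could look like this. The helper names `list_local_devices` and `names_of_type` are made up for this example, and the TensorFlow import sits inside the function so the filter can be used without TensorFlow installed.

```python
def list_local_devices():
    """Ask TensorFlow for the devices it can see (needs a TensorFlow 1.x build)."""
    # Imported lazily: only this function depends on TensorFlow.
    from tensorflow.python.client import device_lib
    return device_lib.list_local_devices()

def names_of_type(devices, device_type):
    """Return the names of all devices with the given device_type, e.g. "SYCL"."""
    return [d.name for d in devices if d.device_type == device_type]
```

With the OpenCL build described above, `names_of_type(list_local_devices(), "SYCL")` would be expected to return one entry per usable OpenCL device; in the output shown it would contain only `/device:SYCL:0`.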

EDIT: After a reboot

I don't really know if these warnings are relevant, because they go away after the first run.

$ python3 list_gpus.py
2018-10-18 00:47:13.943021: I ./tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-10-18 00:47:13.952909: W ./tensorflow/core/common_runtime/sycl/sycl_device.h:45] No OpenCL accelerator nor GPU found that is supported by ComputeCpp/triSYCL trying OpenCL CPU
2018-10-18 00:47:13.952930: W ./tensorflow/core/common_runtime/sycl/sycl_device.h:52] No OpenCL CPU found that is supported by ComputeCpp/triSYCL, checking for host sycl device
2018-10-18 00:47:13.952936: W ./tensorflow/core/common_runtime/sycl/sycl_device.h:59] Found SYCL host device
2018-10-18 00:47:13.953004: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:70] Found following OpenCL devices:
2018-10-18 00:47:13.953014: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:72] id: 0, type: Host, name: Host Device, vendor: Codeplay Software Ltd., profile: FULL_PROFILE

EDIT: dmesg details

[    0.000000] Linux version 4.15.0-36-generic (buildd@lcy01-amd64-017) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)) #39~16.04.1-Ubuntu SMP Tue Sep 25 08:59:23 UTC 2018 (Ubuntu 4.15.0-36.39~16.04.1-generic 4.15.18)
[    0.688885] pcie_mp2_amd: AMD(R) PCI-E MP2 Communication Driver Version: 1.0
[    1.143085] [drm] amdgpu kernel modesetting enabled.
[    1.173931] amdgpu 0000:03:00.0: enabling device (0000 -> 0003)
[    1.564757] amdgpu 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[    2.280211] amdgpu 0000:03:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    2.280212] amdgpu 0000:03:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[    2.280322] [drm] amdgpu: 4096M of VRAM memory ready
[    2.280323] [drm] amdgpu: 4096M of GTT memory ready.
[    2.280427] amdgpu 0000:03:00.0: amdgpu: using MSI.
[    2.280439] [drm] amdgpu: irq initialized.
[    2.280452] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[    2.280690] amdgpu 0000:03:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x        (ptrval)
[    2.280758] amdgpu 0000:03:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x        (ptrval)
[    2.280784] amdgpu 0000:03:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x        (ptrval)
[    2.280842] amdgpu 0000:03:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x        (ptrval)
[    2.280903] amdgpu 0000:03:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x        (ptrval)
[    2.280965] amdgpu 0000:03:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x        (ptrval)
[    2.280985] amdgpu 0000:03:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x        (ptrval)
[    2.281001] amdgpu 0000:03:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x        (ptrval)
[    2.281015] amdgpu 0000:03:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x        (ptrval)
[    2.281028] amdgpu 0000:03:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x        (ptrval)
[    2.281332] amdgpu 0000:03:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x        (ptrval)
[    2.281348] amdgpu 0000:03:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x        (ptrval)
[    2.285039] amdgpu 0000:03:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x        (ptrval)
[    2.285056] amdgpu 0000:03:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x        (ptrval)
[    2.285069] amdgpu 0000:03:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x        (ptrval)
[    2.285578] amdgpu 0000:03:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x        (ptrval)
[    2.285594] amdgpu 0000:03:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x        (ptrval)
[    2.980155] amdgpu 0000:03:00.0: kfd not supported on this ASIC
[    2.980163] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:03:00.0 on minor 0
[    2.980215] amdgpu 0000:06:00.0: enabling device (0000 -> 0003)
[    4.068205] amdgpu 0000:06:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    4.068206] amdgpu 0000:06:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[    4.068220] [drm] amdgpu: 4096M of VRAM memory ready
[    4.068221] [drm] amdgpu: 4096M of GTT memory ready.
[    4.068331] amdgpu 0000:06:00.0: amdgpu: using MSI.
[    4.068344] [drm] amdgpu: irq initialized.
[    4.068357] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[    4.068444] amdgpu 0000:06:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x        (ptrval)
[    4.068509] amdgpu 0000:06:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x        (ptrval)
[    4.068571] amdgpu 0000:06:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x        (ptrval)
[    4.068639] amdgpu 0000:06:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x        (ptrval)
[    4.068665] amdgpu 0000:06:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x        (ptrval)
[    4.068718] amdgpu 0000:06:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x        (ptrval)
[    4.068740] amdgpu 0000:06:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x        (ptrval)
[    4.068759] amdgpu 0000:06:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x        (ptrval)
[    4.068774] amdgpu 0000:06:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x        (ptrval)
[    4.068787] amdgpu 0000:06:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x        (ptrval)
[    4.069074] amdgpu 0000:06:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x        (ptrval)
[    4.069094] amdgpu 0000:06:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x        (ptrval)
[    4.072854] amdgpu 0000:06:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x        (ptrval)
[    4.072868] amdgpu 0000:06:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x        (ptrval)
[    4.072881] amdgpu 0000:06:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x        (ptrval)
[    4.073362] amdgpu 0000:06:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x        (ptrval)
[    4.073376] amdgpu 0000:06:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x        (ptrval)
[    4.771466] amdgpu 0000:06:00.0: kfd not supported on this ASIC
[    4.771476] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:06:00.0 on minor 2
[    4.771515] amdgpu 0000:07:00.0: enabling device (0000 -> 0003)
[    5.856168] amdgpu 0000:07:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    5.856169] amdgpu 0000:07:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[    5.856178] [drm] amdgpu: 4096M of VRAM memory ready
[    5.856179] [drm] amdgpu: 4096M of GTT memory ready.
[    5.856284] amdgpu 0000:07:00.0: amdgpu: using MSI.
[    5.856297] [drm] amdgpu: irq initialized.
[    5.856311] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[    5.856402] amdgpu 0000:07:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x        (ptrval)
[    5.856441] amdgpu 0000:07:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x        (ptrval)
[    5.856464] amdgpu 0000:07:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x        (ptrval)
[    5.856541] amdgpu 0000:07:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x        (ptrval)
[    5.856569] amdgpu 0000:07:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x        (ptrval)
[    5.856641] amdgpu 0000:07:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x        (ptrval)
[    5.856668] amdgpu 0000:07:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x        (ptrval)
[    5.856690] amdgpu 0000:07:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x        (ptrval)
[    5.856707] amdgpu 0000:07:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x        (ptrval)
[    5.856722] amdgpu 0000:07:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x        (ptrval)
[    5.857007] amdgpu 0000:07:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x        (ptrval)
[    5.857027] amdgpu 0000:07:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x        (ptrval)
[    5.860789] amdgpu 0000:07:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x        (ptrval)
[    5.860803] amdgpu 0000:07:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x        (ptrval)
[    5.860817] amdgpu 0000:07:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x        (ptrval)
[    5.861298] amdgpu 0000:07:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x        (ptrval)
[    5.861313] amdgpu 0000:07:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x        (ptrval)
[    6.563837] amdgpu 0000:07:00.0: kfd not supported on this ASIC
[    6.563845] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:07:00.0 on minor 3
[    6.563887] amdgpu 0000:08:00.0: enabling device (0000 -> 0003)
[    7.648177] amdgpu 0000:08:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    7.648178] amdgpu 0000:08:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[    7.648188] [drm] amdgpu: 4096M of VRAM memory ready
[    7.648188] [drm] amdgpu: 4096M of GTT memory ready.
[    7.648292] amdgpu 0000:08:00.0: amdgpu: using MSI.
[    7.648306] [drm] amdgpu: irq initialized.
[    7.648322] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[    7.648406] amdgpu 0000:08:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x        (ptrval)
[    7.648470] amdgpu 0000:08:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x        (ptrval)
[    7.648530] amdgpu 0000:08:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x        (ptrval)
[    7.648593] amdgpu 0000:08:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x        (ptrval)
[    7.648649] amdgpu 0000:08:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x        (ptrval)
[    7.648707] amdgpu 0000:08:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x        (ptrval)
[    7.648733] amdgpu 0000:08:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x        (ptrval)
[    7.648751] amdgpu 0000:08:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x        (ptrval)
[    7.648769] amdgpu 0000:08:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x        (ptrval)
[    7.648782] amdgpu 0000:08:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x        (ptrval)
[    7.649069] amdgpu 0000:08:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x        (ptrval)
[    7.649087] amdgpu 0000:08:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x        (ptrval)
[    7.652849] amdgpu 0000:08:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x        (ptrval)
[    7.652862] amdgpu 0000:08:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x        (ptrval)
[    7.652874] amdgpu 0000:08:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x        (ptrval)
[    7.653353] amdgpu 0000:08:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x        (ptrval)
[    7.653366] amdgpu 0000:08:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x        (ptrval)
[    8.355909] amdgpu 0000:08:00.0: kfd not supported on this ASIC
[    8.355916] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:08:00.0 on minor 4
[    8.355957] amdgpu 0000:09:00.0: enabling device (0000 -> 0003)
[    9.440257] amdgpu 0000:09:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    9.440258] amdgpu 0000:09:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[    9.440268] [drm] amdgpu: 4096M of VRAM memory ready
[    9.440268] [drm] amdgpu: 4096M of GTT memory ready.
[    9.440376] amdgpu 0000:09:00.0: amdgpu: using MSI.
[    9.440390] [drm] amdgpu: irq initialized.
[    9.440406] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[    9.440499] amdgpu 0000:09:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x        (ptrval)
[    9.440563] amdgpu 0000:09:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x        (ptrval)
[    9.440625] amdgpu 0000:09:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x        (ptrval)
[    9.440690] amdgpu 0000:09:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x        (ptrval)
[    9.440753] amdgpu 0000:09:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x        (ptrval)
[    9.440808] amdgpu 0000:09:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x        (ptrval)
[    9.440831] amdgpu 0000:09:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x        (ptrval)
[    9.440849] amdgpu 0000:09:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x        (ptrval)
[    9.440865] amdgpu 0000:09:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x        (ptrval)
[    9.440880] amdgpu 0000:09:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x        (ptrval)
[    9.441167] amdgpu 0000:09:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x        (ptrval)
[    9.441184] amdgpu 0000:09:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x        (ptrval)
[    9.444946] amdgpu 0000:09:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x        (ptrval)
[    9.444964] amdgpu 0000:09:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x        (ptrval)
[    9.444976] amdgpu 0000:09:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x        (ptrval)
[    9.445456] amdgpu 0000:09:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x        (ptrval)
[    9.445469] amdgpu 0000:09:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x        (ptrval)
[   10.147558] amdgpu 0000:09:00.0: kfd not supported on this ASIC
[   10.147564] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:09:00.0 on minor 5
[   10.147606] amdgpu 0000:0a:00.0: enabling device (0000 -> 0003)
[   11.232197] amdgpu 0000:0a:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[   11.232198] amdgpu 0000:0a:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[   11.232207] [drm] amdgpu: 4096M of VRAM memory ready
[   11.232207] [drm] amdgpu: 4096M of GTT memory ready.
[   11.232309] amdgpu 0000:0a:00.0: amdgpu: using MSI.
[   11.232322] [drm] amdgpu: irq initialized.
[   11.232337] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[   11.232427] amdgpu 0000:0a:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x        (ptrval)
[   11.232488] amdgpu 0000:0a:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x        (ptrval)
[   11.232551] amdgpu 0000:0a:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x        (ptrval)
[   11.232615] amdgpu 0000:0a:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x        (ptrval)
[   11.232675] amdgpu 0000:0a:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x        (ptrval)
[   11.232699] amdgpu 0000:0a:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x        (ptrval)
[   11.232717] amdgpu 0000:0a:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x        (ptrval)
[   11.232735] amdgpu 0000:0a:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x        (ptrval)
[   11.232749] amdgpu 0000:0a:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x        (ptrval)
[   11.232763] amdgpu 0000:0a:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x        (ptrval)
[   11.233048] amdgpu 0000:0a:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x        (ptrval)
[   11.233067] amdgpu 0000:0a:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x        (ptrval)
[   11.236830] amdgpu 0000:0a:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x        (ptrval)
[   11.236848] amdgpu 0000:0a:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x        (ptrval)
[   11.236860] amdgpu 0000:0a:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x        (ptrval)
[   11.237341] amdgpu 0000:0a:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x        (ptrval)
[   11.237355] amdgpu 0000:0a:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x        (ptrval)
[   11.939330] amdgpu 0000:0a:00.0: kfd not supported on this ASIC
[   11.939336] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:0a:00.0 on minor 6
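
To cross-check which cards the kernel driver brought up, the `Initialized amdgpu` lines can be pulled out of a log like the one above. This is a minimal sketch that assumes the exact message format shown here; `initialized_cards` is a name made up for this example.

```python
import re

# Matches lines such as:
# [    2.980163] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:03:00.0 on minor 0
INIT_RE = re.compile(
    r"\[drm\] Initialized amdgpu \S+ \d+ for (?P<pci>[0-9a-f:.]+) on minor (?P<minor>\d+)"
)

def initialized_cards(dmesg_text):
    """Return (pci_address, drm_minor) pairs for every initialized amdgpu card."""
    return [
        (m.group("pci"), int(m.group("minor")))
        for m in INIT_RE.finditer(dmesg_text)
    ]
```

Running it over the saved dmesg text gives the PCI addresses in bus order, which makes it easy to see which card sits on the lowest bus number.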

EDIT: It is not related to any specific card; it is just the first one available in bus order.

I tried disconnecting some cards, and after all the tests it seems clear that SYCL always lists just the first GPU, whichever card that is: always the one with the lowest available bus number.

This also confirms that there are no differences among the cards and that all of them can be used (at least individually), so I think the OS is fine and I would guess the problem is in SYCL.

Please help!

Gianks
  • Have you seen: https://www.tensorflow.org/guide/using_gpu – Morrison Chang Oct 17 '18 at 22:56
  • I don't see this as a hardware problem at all, since this setup works fine with OpenCL. Frankly, it seems more probable that the experimental OpenCL support in TensorFlow is broken. – Gianks Oct 17 '18 at 23:25
  • Yes, I looked at it, but I don't understand why it is relevant. To list the available GPUs I used the code from this other answer: https://stackoverflow.com/questions/38559755/how-to-get-current-available-gpus-in-tensorflow – Gianks Oct 17 '18 at 23:27
  • @Rob And just for clarity, this question is a simple follow-up to this one, which is not even mine: https://stackoverflow.com/questions/51688885/building-tensorflow-with-opencl-support-fails-on-ubuntu-18-04 – Gianks Oct 17 '18 at 23:33
  • This is not a Linux issue! This issue is related to TensorFlow with OpenCL/SYCL. The hardware works fine. The drivers are fine. What else has to be fine for you to accept that it is a software issue, or a potential hardware limitation connected to that software, and not a generic OS problem? – Gianks Oct 17 '18 at 23:39

1 Answer


As of today, multiple GPUs are not supported by TensorFlow with OpenCL, even though this is not clearly stated in the documentation.

You can track the details of the problem here; I opened an issue on GitHub: https://github.com/codeplaysoftware/tensorflow/issues/16

I'll update this answer if something changes, but as the developer said, this is not a priority for them!

Gianks