I was interested in testing my neural net (an autoencoder that serves as a generator + a CNN as a discriminator) that uses 3D conv/deconv layers on the new Volta architecture, to benefit from mixed-precision training. I compiled the most recent source code of TensorFlow 1.4 with CUDA 9 and cuDNN 7.0, and cast all the trainable variables used by my conv/deconv layers to tf.float16. Also, all my input and output tensors have sizes that are multiples of 8.
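For context, here is a minimal numpy sketch of the fp32-master-weights pattern that mixed-precision training relies on. This is purely illustrative (not the TF API, and the loss-scale constant and learning rate are arbitrary): weights are stored in float32, the forward/backward math runs in float16 (where Tensor Cores would kick in), and a loss scale keeps small gradients from flushing to zero.

```python
import numpy as np

LOSS_SCALE = 128.0  # illustrative constant, not tuned

# fp32 master copy of the weights; shapes are multiples of 8,
# matching the Tensor Core alignment requirement mentioned above.
master_w = np.random.randn(8, 8).astype(np.float32)
x = np.random.randn(8, 8).astype(np.float16)  # fp16 activations

# Forward pass in fp16 (the part Tensor Cores would accelerate).
w16 = master_w.astype(np.float16)
y = x @ w16

# Backward pass: scale the upstream gradient so tiny fp16 values
# stay representable instead of underflowing to zero.
grad_y = np.ones_like(y) * np.float16(LOSS_SCALE)
grad_w16 = x.T @ grad_y

# Unscale in fp32 and update the fp32 master weights.
grad_w = grad_w16.astype(np.float32) / LOSS_SCALE
master_w -= 1e-3 * grad_w
```

The point of the pattern is that the expensive matmuls/convolutions run in fp16 while the accumulated state (master weights, unscaled gradients) stays in fp32.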
Unfortunately, I do not see any substantial speed improvement with this configuration; the training time is roughly the same as with tf.float32. My understanding is that with the Volta architecture and cuDNN 7.0, mixed precision should be automatically detected by TF, enabling the use of Tensor Core math. Am I wrong, or is there anything else I should do to enable it? I also tried the TF 1.5 nightly build, and it seems to be even slower than my custom 1.4 build.
I would appreciate it if any dev involved in TensorFlow could answer this.
EDIT: After talking with NVIDIA tech support, it seems that, while TF supports float16, it currently integrates mixed-precision acceleration only for 2D conv ops, not for 3D conv ops.