Short version: a TensorFlow Keras model trains much more slowly on a Tesla T4 than on a GTX 970 (both GPUs are being used, as checked with nvidia-smi).

Long version: I have two host machines. One is a PC with a GTX 970 running TensorFlow 2.1.0; the other is a GCP AI Platform Notebook with a Tesla T4 running TensorFlow 2.3.0. I run the same code on both. All training data is held in RAM as NumPy arrays, the dtype is the same ('float16'), and batch_size is the same (8; the GTX wouldn't work with anything above that, and trying 64 on the Tesla didn't make any difference). Why is the Tesla, with twice the computing power of the GTX, working about 2.5 times slower? How can I use my GPU correctly to train my models faster?

krzysztofs

1 Answer

First of all, for a fair comparison you have to use exactly the same version of TensorFlow on both machines, since performance differences between TensorFlow versions have been reported; see the post "Why is TensorFlow 2 much slower than TensorFlow 1?". Note that the post also contains comparisons between TF 2.1, TF 2.2 and TF 2.3.

The post above also includes recommendations on how to prepare and fit the model for good performance.
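Before comparing timings, it can help to record what each machine is actually running. The snippet below is only a minimal sketch: the helper name describe_environment is made up for illustration, and it assumes TensorFlow 2.1 or later, where tf.config.list_physical_devices is available. If TensorFlow is not installed, it simply reports None.

```python
import platform
import sys


def describe_environment():
    """Collect version info so the two machines can be compared side by side.

    Hypothetical helper, not part of TensorFlow.
    """
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "tensorflow": None,
        "gpus": [],
    }
    try:
        import tensorflow as tf  # optional: only reported if installed
    except ImportError:
        return info
    info["tensorflow"] = tf.__version__
    # Assumes TF >= 2.1, where list_physical_devices is a stable API.
    info["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run it on both hosts and compare the output; if the TensorFlow versions differ, align them before drawing conclusions about the GPUs.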

Second of all, the CPU-side preprocessing may be slower on one machine than on the other, which is another possible source of the difference.
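One way to check the second point is to time batch preparation and the training step separately on each machine. This is a rough sketch, not a definitive benchmark: time_stages, prepare_batch and train_step are hypothetical names, and the sketch assumes train_step blocks until the step has finished (otherwise GPU work may be attributed to the wrong stage).

```python
import time


def time_stages(prepare_batch, train_step, n_steps=20, warmup=2):
    """Return (mean seconds preparing a batch, mean seconds per training step).

    prepare_batch: callable with no arguments that returns one batch (CPU side).
    train_step: callable taking that batch; assumed to block until done.
    The first `warmup` iterations run but are not counted.
    """
    prep_total = 0.0
    step_total = 0.0
    for i in range(warmup + n_steps):
        t0 = time.perf_counter()
        batch = prepare_batch()
        t1 = time.perf_counter()
        train_step(batch)
        t2 = time.perf_counter()
        if i >= warmup:
            prep_total += t1 - t0
            step_total += t2 - t1
    return prep_total / n_steps, step_total / n_steps


if __name__ == "__main__":
    # Stand-in workloads; replace with real batch slicing and a real step.
    prep, step = time_stages(
        prepare_batch=lambda: [x * 0.5 for x in range(10_000)],
        train_step=lambda batch: sum(batch),
    )
    print(f"prepare: {prep * 1e3:.3f} ms/step, train: {step * 1e3:.3f} ms/step")
```

With a Keras model, train_step could wrap something like model.train_on_batch(x, y). If the preparation time dominates on the slower machine, the bottleneck is the CPU side rather than the GPU.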

Timbus Calin