
I have a PC (i5, 16 GB RAM) with Windows 10 and a 1080 Ti GPU. I've installed TF 1.4, Python 3.6 (Anaconda), CUDA 8.0 and cuDNN v6.0.

I'm training an SSD MobileNet object detector following dtran's instructions, and the training runs slower than I expected:

INFO:tensorflow:global step 14463: loss = 1.1131 (2.125 sec/step)
INFO:tensorflow:global step 14464: loss = 1.1103 (2.094 sec/step)
INFO:tensorflow:global step 14465: loss = 0.8764 (2.141 sec/step)
INFO:tensorflow:global step 14466: loss = 0.9378 (2.391 sec/step)
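To put numbers like these in context (for example, before and after changing the input pipeline or the OS), it can help to average the step times over many log lines instead of eyeballing a few. The following is a minimal sketch that assumes the log lines have exactly the format shown above; `parse_steps` and `summarize` are hypothetical helpers, not part of TensorFlow or the Object Detection API.

```python
import re
from statistics import mean

# Matches log lines of the form shown above, e.g.
# INFO:tensorflow:global step 14463: loss = 1.1131 (2.125 sec/step)
STEP_RE = re.compile(r"global step (\d+): loss = ([\d.]+) \(([\d.]+) sec/step\)")


def parse_steps(log_text):
    """Return a list of (step, loss, sec_per_step) tuples found in log_text."""
    return [
        (int(m.group(1)), float(m.group(2)), float(m.group(3)))
        for m in STEP_RE.finditer(log_text)
    ]


def summarize(log_text):
    """Return (average sec/step, implied steps per hour) for the parsed lines."""
    times = [sec for _, _, sec in parse_steps(log_text)]
    if not times:
        raise ValueError("no 'global step' lines found")
    avg = mean(times)
    return avg, 3600.0 / avg


log = """\
INFO:tensorflow:global step 14463: loss = 1.1131 (2.125 sec/step)
INFO:tensorflow:global step 14464: loss = 1.1103 (2.094 sec/step)
INFO:tensorflow:global step 14465: loss = 0.8764 (2.141 sec/step)
INFO:tensorflow:global step 14466: loss = 0.9378 (2.391 sec/step)
"""
avg, per_hour = summarize(log)
print(f"{avg:.3f} sec/step, about {per_hour:.0f} steps/hour")
```

For the four lines above this works out to roughly 2.19 sec/step, i.e. a little over 1,600 steps per hour; the sketch only measures throughput and says nothing by itself about what rate a 1080 Ti should reach.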

How can I tell whether everything is working well and this is the expected performance, or whether there's a problem? Is there a benchmark tool for TF that you can just download, run, and compare against expected results? Will migrating to Ubuntu improve the results?

talonmies
asaf oron
    Are you confident you’re running on your GPU? Do you have a gpu-enabled build, like tensorflow-gpu? Have you examined your device placement to make sure computational ops are on GPU? – Joshua R. Jan 23 '18 at 10:23
  • related: https://stackoverflow.com/a/43703735/4132383 – sladomic Jan 23 '18 at 10:33
  • First, check device placement. Second, check that CPU is not the bottleneck, e.g., in the data pipeline – Maxim Jan 23 '18 at 10:55
  • Check the GPU usage with tools such as GPU-Z. More often than not, the problem is your input data pipeline isn't fast enough to feed the GPU at maximum capacity – GPhilo Jan 23 '18 at 11:28
  • Maxim, I verified with GPU-Z that the training runs on the GPU. GPU utilization averages around 25%, while CPU usage is around 90-100% and CPU memory usage is around 6.5 GB. Does that mean the CPU is the bottleneck, and is there some way to tweak it? I have an i5-7600 (3.9 GHz). – asaf oron Jan 23 '18 at 20:45
  • I could not find the training script you are using, so here are some general suggestions. See this question and the references from there: https://stackoverflow.com/questions/48351883/gpu-under-utilization-using-tensorflow-dataset, https://stackoverflow.com/questions/46965098/how-does-one-move-data-to-multiple-gpu-towers-using-tensorflows-dataset-api. Also see https://www.tensorflow.org/performance/performance_guide – iga Feb 20 '18 at 05:42
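The comments suggest watching GPU utilization with GPU-Z; a scripted alternative is to poll `nvidia-smi`, which ships with the NVIDIA driver. The sketch below assumes `nvidia-smi` is on the PATH (on Windows it may need to be added) and uses its `--query-gpu=utilization.gpu --format=csv,noheader,nounits` query, which prints one integer percentage per GPU. `parse_utilization` and `sample_gpu_utilization` are hypothetical helper names. A consistently low average while the CPU is pegged, as reported above, points at the input pipeline rather than the GPU.

```python
import subprocess
import time
from statistics import mean


def parse_utilization(output):
    """Parse `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`
    output: one integer percentage per line, one line per GPU."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def sample_gpu_utilization(samples=10, interval=1.0, gpu_index=0):
    """Poll nvidia-smi `samples` times, `interval` seconds apart, and return
    the average utilization (percent) of the GPU at `gpu_index`."""
    readings = []
    for _ in range(samples):
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            universal_newlines=True,  # return str; `text=` needs Python 3.7+
        )
        readings.append(parse_utilization(out)[gpu_index])
        time.sleep(interval)
    return mean(readings)
```

Run `sample_gpu_utilization(samples=30)` from a second console while training is in progress; sampling over half a minute smooths out the spikes you see between steps.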

1 Answer


I hope your problem is solved by now; if not: I had a similar problem when I first set up and ran my programs on Windows, and later I found out that the TensorFlow GPU version on Windows requires Python 3.5, not 3.6. I guess your program is using the CPU rather than the GPU. If you are using an IDE like PyCharm, select the TensorFlow GPU environment as the Python interpreter. When you run your Python program, the output should show something like "/gpu:0": The GPU of your machine.

raghu
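To check the same thing from code rather than from the console output, here is a minimal sketch assuming the TF 1.x API that matches the TF 1.4 install in the question. `device_lib.list_local_devices()` (an internal but widely used module), `tf.ConfigProto(log_device_placement=True)` and `tf.Session` are TensorFlow calls; the three wrapper names are hypothetical.

```python
def gpu_device_names(devices):
    """Return the names of devices whose device_type is "GPU".

    `devices` is any iterable of objects with `name` and `device_type`
    attributes, such as the DeviceAttributes protos returned by
    tensorflow.python.client.device_lib.list_local_devices().
    """
    return [d.name for d in devices if d.device_type == "GPU"]


def list_local_gpus():
    """List GPUs visible to the installed TensorFlow build (empty on a CPU-only build)."""
    # Imported lazily so gpu_device_names() can be used without TensorFlow installed.
    from tensorflow.python.client import device_lib
    return gpu_device_names(device_lib.list_local_devices())


def make_placement_logging_session():
    """Create a session that logs the device each op is placed on when it runs."""
    # TF 1.x API; under TF 2.x these live under tf.compat.v1.
    import tensorflow as tf
    config = tf.ConfigProto(log_device_placement=True)
    return tf.Session(config=config)
```

For example, run `print(list_local_gpus())` in the same Anaconda environment the training uses. An empty list suggests that environment has the CPU-only `tensorflow` package rather than `tensorflow-gpu`, which would be consistent with the interpreter mix-up described in the answer.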