5

If I am using `--scale-tier BASIC_GPU` within a Google Cloud ML Engine job, how can I view the GPU utilization? I am able to view CPU utilization and memory utilization on the "Job details" tab, but I'm wondering how much the GPU is being utilized. Is GPU usage just folded into the CPU figure, or is there another tab that shows GPU utilization?

Additionally, is there any way to see which ops are taking up most of the CPU time? My CPU utilization is very high, my memory usage is very low, and my input producer is always full (100%), so I'm trying to get a better understanding of where the time is being spent so that I can try to optimize my model's performance.

reese0106
  • You could take a look at [gcloud ml-engine local](https://cloud.google.com/sdk/gcloud/reference/ml-engine/local/) to run your training on a local instance (perhaps with a subset of the data) and do the analysis there; that's more profiling-friendly than ML Engine. – amo-ej1 Aug 23 '17 at 06:08
  • Any suggestions on how to go about profiling? Any blog posts or relevant resources? – reese0106 Aug 23 '17 at 13:04

2 Answers

4

There is currently no way to see GPU utilization with Cloud ML Engine.

TensorFlow has a feature called timelines, which can be used to obtain per-op profiling data. Here's a blog post describing how to use it.
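For reference, a minimal sketch of the timeline approach using the TF 1.x Session API (the toy matmul graph is just a stand-in for your own training step; the output is a Chrome-trace JSON you can open at chrome://tracing):

```python
# Minimal sketch (TF 1.x Session API): run one step with full tracing
# enabled and dump a Chrome-trace JSON showing per-op wall time.
import tensorflow as tf
from tensorflow.python.client import timeline

# Stand-in graph; replace with your own train_op / loss.
x = tf.random_normal([1000, 1000])
y = tf.matmul(x, x)

with tf.Session() as sess:
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    sess.run(y, options=run_options, run_metadata=run_metadata)

    # Convert the collected step stats into Chrome trace format,
    # then open timeline.json at chrome://tracing to inspect per-op timing.
    tl = timeline.Timeline(run_metadata.step_stats)
    with open('timeline.json', 'w') as f:
        f.write(tl.generate_chrome_trace_format())
```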

Jeremy Lewi
2

`watch -n 0.5 nvidia-smi` can be used from the command line to see NVIDIA GPU usage.

Stephane Bersier