
I was using TensorFlow (CPU version) for my deep learning model, specifically the DNNRegressor Estimator for training with a given set of parameters (network structure, hidden layers, alpha, etc.). I was able to reduce the loss, but the model took a very long time to learn (approximately 3 days), at roughly 9 seconds per 100 steps.


I came across this article: https://medium.com/towards-data-science/how-to-traine-tensorflow-models-79426dabd304 and found that GPUs can be much faster for training. So I took a p2.xlarge GPU instance from AWS (a single GPU, 4 vCPUs, 12 ECUs, and 61 GiB of RAM).

But the training speed is the same: 9 seconds per 100 steps. I am using the same code I used for the Estimator on CPU, because I read that Estimators use the GPU on their own. Here is my "nvidia-smi" output.

  • It shows GPU memory being used, but my Volatile GPU-Util sits at 1%. I can't figure out what I am missing. Is it designed to behave the same, or am I missing something? The global steps per second are identical for the CPU and GPU runs of the Estimator (see the device-placement check sketched below).
  • Do I have to explicitly change something in the DNNRegressor Estimator code?
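One way to confirm whether the model's ops actually land on the GPU is to enable device placement logging. A minimal sketch, assuming the TF 1.x Estimator API (the feature column and hidden units here are placeholders, not the real model):

    import tensorflow as tf

    # Log every op's device placement so the training log shows whether the
    # dense layers run on /device:GPU:0 or silently fall back to the CPU.
    session_config = tf.ConfigProto(log_device_placement=True)
    run_config = tf.estimator.RunConfig(session_config=session_config)

    feature_columns = [tf.feature_column.numeric_column('x', shape=[10])]

    estimator = tf.estimator.DNNRegressor(
        feature_columns=feature_columns,
        hidden_units=[64, 32],
        config=run_config)

If the placement log shows the heavy ops on the GPU but steps are still slow, the bottleneck is most likely the input pipeline rather than the device.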
user3457384
  • It takes some time for TensorFlow to start; how long have you watched the process? – Kev1n91 Oct 09 '17 at 14:30
  • @Kev1n91 This time is constant across all the steps, so this is how long the steps actually take. – user3457384 Oct 10 '17 at 10:09
  • Looks like the GPU is waiting for slow CPU operations. How do you feed the data in? – Maxim Oct 10 '17 at 10:57
  • @Maxim, it's a one-time CSV read converted to pandas. After that conversion, I use tf.feature_column differently for numerical and categorical columns, and then give the array of those feature columns to the input_fn in the DNNRegressor estimator (a rough sketch of this pipeline follows these comments). – user3457384 Oct 10 '17 at 11:13
  • @user3457384 There are cases where input batch preparation (which runs on the CPU) is slower than the GPU ops. I can't say for sure without the complete code. Please try to profile your operations as suggested in the following question and provide a summary: https://stackoverflow.com/questions/34293714/can-i-measure-the-execution-time-of-individual-operations-with-tensorflow/37774470#37774470 – Maxim Oct 10 '17 at 12:19
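For context, a rough sketch of the pipeline described in the comment above, assuming TF 1.x; the file name, column names, vocabulary, and target are hypothetical:

    import pandas as pd
    import tensorflow as tf

    # One-time CSV read into a pandas DataFrame.
    df = pd.read_csv('train.csv')

    # Numeric columns are used directly; categorical columns are wrapped in an
    # indicator column so the DNN receives dense inputs.
    numeric_cols = [tf.feature_column.numeric_column(c) for c in ['num_a', 'num_b']]
    categorical_cols = [
        tf.feature_column.indicator_column(
            tf.feature_column.categorical_column_with_vocabulary_list(
                'cat_a', ['red', 'green', 'blue']))]

    # pandas_input_fn feeds batches from the in-memory DataFrame; this is the
    # part the answer below points to as the likely bottleneck.
    train_input_fn = tf.estimator.inputs.pandas_input_fn(
        x=df[['num_a', 'num_b', 'cat_a']],
        y=df['target'],
        batch_size=128,
        num_epochs=None,
        shuffle=True)

    estimator = tf.estimator.DNNRegressor(
        feature_columns=numeric_cols + categorical_cols,
        hidden_units=[128, 64])
    estimator.train(input_fn=train_input_fn, steps=10000)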

1 Answer


It sounds like you might be reading from CSV, converting to a pandas DataFrame, and then using TensorFlow's pandas_input_fn. This is a known issue with the implementation of pandas_input_fn; you can track it at https://github.com/tensorflow/tensorflow/issues/13530.

To work around this, you can use a different method for I/O (reading from TFRecords, for example). If you'd like to continue using pandas and increase your steps/second, you can reduce your batch_size, though this may hurt your estimator's ability to learn.
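A minimal sketch of a TFRecord-based input_fn, assuming TF 1.x with the tf.data API; the file name and feature spec are hypothetical and would need to match whatever is actually written into the TFRecord file:

    import tensorflow as tf

    def tfrecord_input_fn(filenames, batch_size=128):
        # Feature spec for the serialized examples; adjust to the features
        # actually stored in the TFRecord file.
        feature_spec = {
            'features': tf.FixedLenFeature([10], tf.float32),
            'label': tf.FixedLenFeature([], tf.float32),
        }

        def parse(serialized):
            parsed = tf.parse_single_example(serialized, feature_spec)
            return {'features': parsed['features']}, parsed['label']

        dataset = (tf.data.TFRecordDataset(filenames)
                   .map(parse, num_parallel_calls=4)
                   .shuffle(buffer_size=10000)
                   .repeat()
                   .batch(batch_size)
                   .prefetch(1))
        return dataset.make_one_shot_iterator().get_next()

    estimator = tf.estimator.DNNRegressor(
        feature_columns=[tf.feature_column.numeric_column('features', shape=[10])],
        hidden_units=[128, 64])
    estimator.train(input_fn=lambda: tfrecord_input_fn(['train.tfrecords']),
                    steps=10000)

Because parsing, shuffling, and batching now run inside the TensorFlow graph (and can be parallelized and prefetched), the CPU-side input preparation is much less likely to starve the GPU.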

Greg