
I am running the GPT-2 code for the large model (774M). I am using it to generate text samples through `interactive_conditional_samples.py`, link: here

I've given it an input file containing prompts, from which prompts are automatically selected to generate output; that output is then automatically copied into a file. In short, I'm not training the model, I'm only using it to generate text, and I'm using a single GPU.
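For reference, this is roughly how the stock script is launched (the flags are the ones `interactive_conditional_samples.py` exposes in the openai/gpt-2 repo; the prompt-file/output-file automation described above is my own modification on top of it):

```
python3 src/interactive_conditional_samples.py --model_name=774M --length=200 --top_k=40
```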

The problem I'm facing is that the code is not utilizing the GPU fully.

Using the `nvidia-smi` command, I can see the output in the image below:

https://i.stack.imgur.com/f02p7.jpg

  • Unable to see the code `interactive_conditional_samples.py`. It says page not found on GitHub. – Sharath Dec 03 '19 at 05:45
  • Hi, thanks for the comment. I have updated the post with the link. – amateur Dec 03 '19 at 05:51
  • Can you run nvidia-smi under the watch command, like so: `watch -n 0.25 nvidia-smi`? Just look whether the GPU usage shows any movement. – Sharath Dec 03 '19 at 09:53
  • There seems to be NO movement – amateur Dec 04 '19 at 04:14
  • Can you check that the tensorflow-gpu version is installed and not the CPU version (see the snippet after these comments)? https://stackoverflow.com/questions/38009682/how-to-tell-if-tensorflow-is-using-gpu-acceleration-from-inside-python-shell?answertab=votes#tab-top – Sharath Dec 04 '19 at 04:57
  • tensorflow-gpu v1.14 is installed, and a quick look at nvidia-smi shows that the GPU is being utilised. However, the utilisation is very low. – amateur Dec 04 '19 at 07:46
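A minimal way to run the check suggested in the comments, assuming TF 1.x APIs (tensorflow-gpu 1.14, as reported), is:

```python
# Probe whether TensorFlow can actually see a GPU.
# Assumes TF 1.x APIs (tensorflow-gpu 1.14, as reported above).
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.test.is_gpu_available())                          # True if a CUDA device is usable
print([d.name for d in device_lib.list_local_devices()])   # e.g. ['/device:CPU:0', '/device:GPU:0']
```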

1 Answer


It depends on your application. Low GPU utilization is not unusual when the batch size is small. Try increasing `batch_size` for higher GPU utilization.

In your case, you have set `batch_size=1` in your program. Increase `batch_size` to a larger number and verify the GPU utilization, for example:
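With the stock script, that would look roughly like this (a sketch, assuming the standard `--nsamples`/`--batch_size` flags from the openai/gpt-2 repo, which expects `nsamples` to be a multiple of `batch_size`):

```
python3 src/interactive_conditional_samples.py --model_name=774M --nsamples=8 --batch_size=8
```

Each sampling call then decodes 8 sequences on the GPU in parallel instead of 1.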

Let me explain using MNIST-size networks. They are tiny, and it's hard to achieve high GPU (or CPU) efficiency for them. With a larger batch size you get higher computational efficiency, meaning you can process more examples per second, but you also get lower statistical efficiency, meaning you need to process more examples in total to reach the target accuracy. So it's a trade-off. For tiny character models, statistical efficiency drops off very quickly beyond `batch_size=100`, so it's probably not worth trying to grow the batch size for training. For inference, you should use the largest batch size you can.
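To see the computational-efficiency half of that trade-off in isolation, here is a rough timing sketch (a synthetic matmul workload rather than GPT-2 itself, assuming TF 1.x as used in the question): examples per second should climb with batch size until the GPU saturates.

```python
# Rough throughput probe: time a fixed matmul at several batch sizes.
# Synthetic workload, not GPT-2 itself; assumes TF 1.x as in the question.
import time
import tensorflow as tf

for batch_size in [1, 8, 32, 128]:
    tf.reset_default_graph()
    x = tf.random_normal([batch_size, 1024])
    w = tf.random_normal([1024, 1024])
    y = tf.matmul(x, w)
    with tf.Session() as sess:
        sess.run(y)                      # warm-up: graph build + GPU init
        start = time.time()
        for _ in range(100):
            sess.run(y)
        elapsed = time.time() - start
        print("batch_size=%4d: %10.1f examples/sec"
              % (batch_size, 100 * batch_size / elapsed))
```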

Hope this answers your question. Happy Learning.

  • @amateur - Hope we have answered your question. Can you please accept and upvote the answer if you are satisfied with the answer. –  May 21 '20 at 08:15