
In my TensorFlow 2.0b program I get an error like this:

    ResourceExhaustedError: OOM when allocating tensor with shape[727272703] and type int8 on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:TopKV2]

The error occurs after a number of GPU-based operations within this program have been successfully executed.

I would like to release all GPU memory associated with these past operations in order to avoid the above error. How can I do this in TensorFlow 2.0b? And how can I check memory usage from within my program?

I was only able to find related information that uses tf.Session(), which is no longer available in TensorFlow 2.0.

Barden
  • Please take a look at this [issue](https://stackoverflow.com/questions/42495930/tensorflow-oom-on-gpu), which should help you solve the problem. I guess if you reduce the batch size, this error should be taken care of. –  Aug 20 '19 at 21:53

1 Answer


You might be interested in using these Python 3 bindings for the NVIDIA Management Library (importable as `nvidia_smi`).

I would try something like this:

    import nvidia_smi

    nvidia_smi.nvmlInit()

    # Card id 0 is hardcoded here; there is also a call to get the number of
    # available cards, so we could iterate over all of them.
    handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)

    info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)

    print("Total memory:", info.total)
    print("Free memory:", info.free)
    print("Used memory:", info.used)

    nvidia_smi.nvmlShutdown()
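
The fields returned by `nvmlDeviceGetMemoryInfo` are raw byte counts, which are hard to eyeball for multi-gigabyte GPUs. As a small convenience, you could add a formatter like the one below (my own helper, not part of the bindings) and pass `info.total`, `info.free`, and `info.used` through it:

```python
def format_bytes(n):
    """Render a byte count as a human-readable string using binary units."""
    n = float(n)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024:
            return f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} PiB"

# e.g. print("Used memory:", format_bytes(info.used))
print(format_bytes(2 * 1024**3))  # -> 2.0 GiB
```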
Fabien