
I'm using the Python libraries from rapids.ai, and one of the key things I'm starting to wonder is: how do I inspect memory allocation programmatically? I know I can use nvidia-smi to look at some overall high-level stats, but specifically I would like to know:

1) Is there an easy way to find the memory footprint of a cudf dataframe (and other rapids objects?)

2) Is there a way for me to determine device memory available?

I'm sure there are plenty of ways for a C++ programmer to get these details but I'm hoping to find an answer that allows me to stay in Python.


1 Answer


1) Usage

All cudf objects should have the .memory_usage() method:

import cudf
x = cudf.DataFrame({'x': [1, 2, 3]})
x_usage = x.memory_usage(deep=True)
print(x_usage)

Out:

x        24
Index     0
dtype: int64

These values reflect GPU memory used.
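
If you want the total footprint of a DataFrame as a single number, you can sum the per-column values (a minimal sketch; the two-column DataFrame here is just hypothetical example data):

import cudf

df = cudf.DataFrame({'x': [1, 2, 3], 'y': [0.1, 0.2, 0.3]})  # hypothetical example data
# memory_usage(deep=True) reports bytes per column (plus the index),
# so summing gives the DataFrame's total GPU footprint in bytes.
total_bytes = df.memory_usage(deep=True).sum()
print(total_bytes)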

2) Remaining

You can read the remaining available GPU memory with pynvml:

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # select GPU 0; change the index for other devices
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
mem.free, mem.used, mem.total
(33500299264, 557973504, 34058272768)

Most GPU operations require a scratch buffer that is O(N), so you may run into RMM_OUT_OF_MEMORY errors if you end up with DataFrames or Series that are larger than your remaining available memory.
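
As a rough sanity check (just a sketch, assuming an operation needs on the order of one extra copy of the data as scratch space), you can compare a DataFrame's footprint against the free memory reported by pynvml:

import cudf
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

df = cudf.DataFrame({'x': [1, 2, 3]})  # hypothetical DataFrame

# Total bytes the DataFrame currently occupies on the GPU.
df_bytes = df.memory_usage(deep=True).sum()

# Free device memory right now.
free_bytes = pynvml.nvmlDeviceGetMemoryInfo(handle).free

# Assumed rule of thumb: an O(N) operation wants roughly another copy's worth of room.
if df_bytes > free_bytes:
    print("This operation is likely to hit RMM_OUT_OF_MEMORY")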

  • Thank you Thomson! `memory_usage` doesn't seem to be well documented (I couldn't find it at all in 0.11). As a side note I also noticed that if you try to run bad numba kernels (I was developing) that all operations can start reporting "out of memory". Any wisdom on how to detect if memory is in a corrupt state? Thanks again. – Robert Jan 08 '20 at 00:24
  • Hey @Robert! I'm sorry, `memory_usage` is shipping with 0.12! You can use the `cudf` nightly conda package or containers using the Rapids Release Selector at https://rapids.ai/start.html until 0.12 ships in a few weeks. The pace of development is high and we're always shipping new features: the nightly builds will rarely let you down. I assume that you're working out of a jupyter notebook? A bad kernel can corrupt the context. You can try to clean it up with numba: https://numba.pydata.org/numba-doc/dev/cuda-reference/host.html, but I usually manually restart my notebook server. – Thomson Comer Jan 08 '20 at 18:23
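
A minimal sketch of the numba context cleanup mentioned in the comment above (it uses numba's documented CUDA host API; as noted there, a full notebook restart is often still the more reliable fix):

from numba import cuda

# Explicitly tear down the CUDA context(s) owned by this thread; the next
# CUDA call from numba will create a fresh context.
cuda.close()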