I have two GPUs on different computers. One (an NVIDIA A100) is in a server, the other (an NVIDIA Quadro RTX 3000) is in my laptop. I watched memory usage on both machines via nvidia-smi and noticed that the two GPUs use different amounts of memory when running the exact same process (same code, same data, same CUDA version, same PyTorch version, same drivers). I created a dummy script to verify this:
import torch
device = torch.device("cuda:0")
# dtype=float resolves to torch.float64, i.e. 8 bytes per element
a = torch.ones((10000, 10000), dtype=float).to(device)
In nvidia-smi I can see how much memory this specific Python script uses:
- A100: 1205 MiB
- RTX 3000: 1651 MiB
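For reference, this is how I read the per-process figure (I believe the query fields below are standard, but they may vary between driver versions):
nvidia-smi --query-compute-apps=pid,used_memory --format=csv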
However, when I query PyTorch about memory usage, I get the same values on both GPUs:
reserved = torch.cuda.memory_reserved(0)
allocated = torch.cuda.memory_allocated(0)
print(reserved, allocated)
Both systems report the same usage:
- reserved = 801112064 bytes (~764 MiB)
- allocated = 800000000 bytes (~763 MiB)
The allocated amount is considerably less than what nvidia-smi shows, even though it matches the tensor exactly: 100E6 float64 values × 8 bytes = 800,000,000 bytes ≈ 763 MiB.
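In case it helps for comparison, the driver-level number can also be read from inside the same script (just a sketch, assuming a PyTorch version that provides torch.cuda.mem_get_info; note that this reports usage for the whole GPU, including anything else running on it, not just this process):
# free/total device memory as the driver sees it, in bytes
free, total = torch.cuda.mem_get_info(0)
print(f"GPU-wide used memory: {(total - free) / 2**20:.0f} MiB")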
Why does nvidia-smi report different memory usage on these two systems?