I have two GPUs on different computers. One (an NVIDIA A100) is in a server, the other (an NVIDIA Quadro RTX 3000) is in my laptop. I watched memory usage on both machines via nvidia-smi and noticed that the two GPUs use different amounts of memory when running the exact same process (same code, same data, same CUDA version, same PyTorch version, same drivers). I created a dummy script to verify this:
import torch
device = torch.device("cuda:0")
# dtype=float resolves to torch.float64, i.e. 8 bytes per element
a = torch.ones((10000, 10000), dtype=float).to(device)
In nvidia-smi I can see how much memory this specific Python script uses:
- A100: 1205 MiB
- RTX 3000: 1651 MiB
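For reference, this is how I read the per-process figure (I believe the query fields below are standard, but they may vary between driver versions):
nvidia-smi --query-compute-apps=pid,used_memory --format=csv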
However, when I query PyTorch about memory usage, I get the same values on both GPUs:
reserved = torch.cuda.memory_reserved(0)
allocated = torch.cuda.memory_allocated(0)
print(reserved, allocated)
Both systems report the same usage:
- reserved = 801112064 bytes (~764 MiB)
- allocated = 800000000 bytes (~763 MiB)
The allocated amount is considerably less than what nvidia-smi shows, even though it matches the tensor exactly: 100E6 float64 values × 8 bytes = 800,000,000 bytes ≈ 763 MiB.
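In case it helps for comparison, the driver-level number can also be read from inside the same script (just a sketch, assuming a PyTorch version that provides torch.cuda.mem_get_info; note that this reports usage for the whole GPU, including anything else running on it, not just this process):
# free/total device memory as the driver sees it, in bytes
free, total = torch.cuda.mem_get_info(0)
print(f"GPU-wide used memory: {(total - free) / 2**20:.0f} MiB")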
Why does nvidia-smi report different memory usage on these two systems?