Questions tagged [nvidia-smi]
43 questions
11 votes, 5 answers
Failed to initialize NVML: Unknown Error in Docker after a few hours
I am having an interesting and weird issue.
When I start a Docker container with GPU support, it works fine and I can see all the GPUs inside the container. However, a few hours or a few days later, I can no longer use the GPUs in Docker.
When I run nvidia-smi inside the container, I see this…

Justin Song
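For the NVML question above, a minimal health-check sketch that can run periodically inside the container and report when nvidia-smi/NVML stops responding; the check interval is an arbitrary placeholder, and restarting or recreating the container is only one possible mitigation.

# Minimal GPU health check intended to run inside the container.
# If nvidia-smi starts failing (e.g. "Failed to initialize NVML"),
# the loop reports it so the container can be restarted or recreated.
import subprocess
import time

CHECK_INTERVAL_S = 300  # placeholder: check every 5 minutes

def gpu_visible() -> bool:
    """Return True if nvidia-smi can still enumerate the GPUs."""
    try:
        result = subprocess.run(["nvidia-smi", "-L"],
                                capture_output=True, text=True, timeout=30)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout

if __name__ == "__main__":
    while True:
        if not gpu_visible():
            print("nvidia-smi check failed; GPUs are no longer usable in this container.")
        time.sleep(CHECK_INTERVAL_S)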
7 votes, 3 answers
Cannot find NVIDIA driver after stopping and starting a deep learning VM
[TL;DR] First, wait a couple of minutes and check whether the NVIDIA driver starts working properly. If not, stop and start the VM instance again.
I created a Deep Learning VM (Google Click to Deploy) with an A100 GPU. After stopping and starting the…

zudi
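For the VM question above, a small sketch of the TL;DR's "wait a couple of minutes" step: poll nvidia-smi for a while after boot before concluding the driver really failed to come up. The timeout and poll interval are arbitrary placeholders.

# Poll nvidia-smi after the VM starts; on a Deep Learning VM the driver
# can take a few minutes to become usable after a stop/start.
import subprocess
import time

WAIT_LIMIT_S = 300   # placeholder: give the driver up to 5 minutes
POLL_EVERY_S = 15

def driver_ready() -> bool:
    try:
        proc = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    except FileNotFoundError:
        return False
    return proc.returncode == 0

deadline = time.time() + WAIT_LIMIT_S
while time.time() < deadline:
    if driver_ready():
        print("NVIDIA driver is up.")
        break
    time.sleep(POLL_EVERY_S)
else:
    print("Driver still not responding; stop and start the VM instance again.")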
6 votes, 0 answers
Given the number of parameters, how to estimate the VRAM needed by a PyTorch model?
I am trying to estimate the VRAM needed for a fully connected model without having to build or train the model in PyTorch.
I got pretty close with this formula:
# params = number of parameters
# 1 MiB = 1048576 bytes
estimate = params * 24 /…

RDlady
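For the VRAM question above, a short sketch of the same heuristic the asker uses: a fixed number of bytes per parameter, reported in MiB. The 24 bytes/parameter constant comes straight from the question; one possible reading is fp32 weights (4 B) + gradients (4 B) + Adam moments (8 B) plus ~8 B of slack, so adjust it for your dtype, optimizer, and activation sizes.

# Rough VRAM estimate from a parameter count, mirroring the question's formula.
MIB = 1024 ** 2  # 1 MiB = 1048576 bytes

def estimate_vram_mib(params: int, bytes_per_param: int = 24) -> float:
    """Return an approximate VRAM footprint in MiB for `params` parameters."""
    return params * bytes_per_param / MIB

# Example: a 10M-parameter fully connected model
print(f"{estimate_vram_mib(10_000_000):.1f} MiB")  # ~228.9 MiB at 24 bytes/param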
3 votes, 2 answers
What does the command "nvidia-smi --gpu-reset" do?
What does the command
sudo nvidia-smi --gpu-reset -i 0
do? Is it just freeing up the GPU's memory?
user14889957
2 votes, 0 answers
GPU is used by Xwayland in Docker image
I'm currently trying to use a Docker image for training a generative adversarial network. Unfortunately, when I try to run the script, I get the following error:
[2023-07-29 11:02:47 @__init__.py:80] Saving logging to file:…

Patschenkof
1 vote, 1 answer
nvidia-smi vs torch.cuda.memory_allocated
I am checking GPU memory usage during the training step.
To start with the main question: the GPU memory reported by torch.cuda.memory_allocated differs from what nvidia-smi reports, and I want to know why.
Actually, I measured…

core_not_dumped
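For the memory question above, a minimal sketch that prints the relevant numbers side by side. The gap is expected: torch.cuda.memory_allocated only counts live tensors in the current process, while nvidia-smi also counts the CUDA context, the caching allocator's reserved-but-unused blocks, and any other processes on the device.

# Compare PyTorch's allocator counters with the device-wide nvidia-smi figure.
import subprocess
import torch

assert torch.cuda.is_available()
x = torch.randn(1024, 1024, device="cuda")  # allocate something on the GPU

alloc_mib = torch.cuda.memory_allocated() / 2**20    # live tensors
reserved_mib = torch.cuda.memory_reserved() / 2**20  # held by the caching allocator
smi_used = subprocess.run(
    ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader"],
    capture_output=True, text=True,
).stdout.strip()

print(f"torch allocated: {alloc_mib:.1f} MiB")
print(f"torch reserved:  {reserved_mib:.1f} MiB")
print(f"nvidia-smi used: {smi_used}")  # larger: CUDA context + caches + other processes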
1 vote, 1 answer
Read GPU information from the console in C++
I want to create my own overclocking monitor, for which I need to read information like the current voltage, clock speeds, and more.
In C++ I can easily get the information from nvidia-smi by typing, for example:
console("nvidia-smi -q -i…

JackDerke
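For the monitoring question above, a sketch of nvidia-smi's query mode, which returns just the requested fields as CSV instead of the full -q dump; Python is used here only to illustrate the call, and the same command can be launched from C++ (e.g. via popen), or NVML's C API can be used directly. Note that core voltage is not among nvidia-smi's query fields.

# Ask nvidia-smi for specific metrics in machine-readable CSV form.
import subprocess

FIELDS = "clocks.sm,clocks.mem,power.draw,temperature.gpu"
out = subprocess.run(
    ["nvidia-smi", f"--query-gpu={FIELDS}",
     "--format=csv,noheader,nounits", "-i", "0"],
    capture_output=True, text=True, check=True,
).stdout.strip()

sm_clock, mem_clock, power, temp = [v.strip() for v in out.split(",")]
print(f"SM {sm_clock} MHz, MEM {mem_clock} MHz, {power} W, {temp} C")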
1 vote, 2 answers
Nvidia driver is not recognized properly
OS: Ubuntu 20.04 LTS, dual boot with Windows 10.
I get an error from the nvidia-smi command after installing the NVIDIA driver via apt.
$ nvidia-smi
Unable to determine the device handle for GPU 0000:0B:00.0: Not Found
$ dmesg | grep NVRM
[ 3.065144] NVRM: loading NVIDIA…

chess0000
1 vote, 0 answers
Why do different GPUs use different amounts of memory?
I have two GPUs on different computers. One (NVIDIA A100) is on a server; the other (NVIDIA Quadro RTX 3000) is in my laptop. I watch the performance on both machines via nvidia-smi and noticed that the two GPUs use different amounts of memory when…

tnknepp
1 vote, 0 answers
Is there a way to know which container is using which GPU device?
Let's say I have Docker containers A, B, C running and GPUs 1, 2, 3.
I can check the GPU process IDs with
nvidia-smi
Sometimes a container keeps holding GPU memory after it has finished with it,
so I want to find out which container is running on which GPU and…

jakeE
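For the container question above, a best-effort sketch that runs on the host: take the PIDs nvidia-smi attributes to each GPU and look up the owning container ID in /proc/<pid>/cgroup. The cgroup parsing is an assumption and varies between cgroup v1/v2 and container runtimes.

# List GPU compute processes, then map each PID back to a Docker container ID.
import re
import subprocess

query = subprocess.run(
    ["nvidia-smi", "--query-compute-apps=pid,gpu_uuid,used_memory",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout

for line in filter(None, query.splitlines()):
    pid, gpu_uuid, used_mem = [f.strip() for f in line.split(",")]
    container = "host / unknown"
    try:
        cgroup = open(f"/proc/{pid}/cgroup").read()
        match = re.search(r"[0-9a-f]{64}", cgroup)  # Docker container IDs are 64 hex chars
        if match:
            container = match.group(0)[:12]         # short ID, as shown by `docker ps`
    except OSError:
        pass
    print(f"PID {pid}  container {container}  GPU {gpu_uuid}  mem {used_mem}")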
1 vote, 0 answers
Technique to measure GPU utilization over a given period of time
We run an HPC cluster with GPUs. We would like to report the overall GPU utilization for the job. I know I can do it by periodically sampling in the background and doing the math. I was wondering if there was a tool where I could basically start…

William Allcock
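For the utilization question above, purpose-built tools such as NVIDIA DCGM are usually the cleaner answer for per-job accounting; in their absence, the "periodically sampling and doing the math" the asker mentions can look like the sketch below. The interval and duration are placeholders to be wrapped around the real job.

# Naive sampling: record utilization.gpu every few seconds, then report the mean.
import subprocess
import time

SAMPLE_EVERY_S = 10
DURATION_S = 120  # placeholder; in practice, sample until the job ends

samples = []
end = time.time() + DURATION_S
while time.time() < end:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    ).stdout
    samples.extend(int(v) for v in out.split() if v.strip().isdigit())
    time.sleep(SAMPLE_EVERY_S)

if samples:
    print(f"mean GPU utilization over {len(samples)} samples: "
          f"{sum(samples) / len(samples):.1f}%")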
1 vote, 1 answer
watch command not working with special characters and quotes
watch -n 1 "paste <(ssh ai02 'nvidia-smi pmon -s um -c 1') <(ssh ai03 'nvidia-smi pmon -s um -c 1' )"
The above command is used to horizontally stack the GPU stats of two servers. It works without the watch command, but with watch I get the following error:
sh:…

JimmyJ
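For the watch question above, one way to sidestep the sh quoting issue entirely is to do the polling and side-by-side pasting in a small script rather than inside watch. The sketch below assumes the two hostnames from the question and passwordless ssh.

# Fetch `nvidia-smi pmon` snapshots from two hosts and print them side by side.
import itertools
import subprocess
import time

HOSTS = ["ai02", "ai03"]
CMD = "nvidia-smi pmon -s um -c 1"

def snapshot(host: str) -> list[str]:
    out = subprocess.run(["ssh", host, CMD], capture_output=True, text=True)
    return out.stdout.splitlines()

while True:
    columns = [snapshot(h) for h in HOSTS]
    for row in itertools.zip_longest(*columns, fillvalue=""):
        print("   ".join(cell.ljust(60) for cell in row))
    print("-" * 120)
    time.sleep(1)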
1 vote, 1 answer
Most simplified form of the following regex / Extracting all values from nvidia-smi output
I am trying to analyze a very large text string in Python containing nvidia-smi output, but I really want to spend more time analyzing the data than working on my regex skills. My regex is as follows, but it takes forever on some rows (it might be…

elegantcomplexity
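For the regex question above, if the data can be re-collected, nvidia-smi's query mode avoids regex work entirely by emitting the chosen fields as CSV; the field list below is just an example (see nvidia-smi --help-query-gpu for the full set).

# Collect nvidia-smi data as CSV and parse it with the csv module instead of a regex.
import csv
import io
import subprocess

FIELDS = ["index", "name", "utilization.gpu", "memory.used", "memory.total"]
out = subprocess.run(
    ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}",
     "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
).stdout

rows = [dict(zip(FIELDS, (cell.strip() for cell in row)))
        for row in csv.reader(io.StringIO(out)) if row]
print(rows)  # one dict per GPU, keyed by field name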
0 votes, 1 answer
StableLM answers too slow on GCP VM with GPU
I installed StableLM on a GCP VM with these specs:
1 x NVIDIA Tesla P4, 8 vCPU, 30 GB memory.
I set the model parameter llm_int8_enable_fp32_cpu_offload=True, but it takes too long to answer questions (~8 minutes). It was faster even when using…

srls01
0 votes, 0 answers
NVIDIA SMI shows lower CUDA version than NVCC
On one installation, when I run nvidia-smi, it shows the CUDA version as 12.0.
After installing the CUDA Toolkit, nvcc --version reports version 12.2.
Is this a problem?
Based on this very comprehensive answer, I understood that NVIDIA…

ahron
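For the version question above, a tiny sketch that just surfaces the two numbers being compared: nvidia-smi reports the highest CUDA version the installed driver supports, while nvcc reports the installed toolkit, so the two can legitimately differ.

# Print the driver-supported CUDA version (nvidia-smi) and the toolkit version (nvcc).
import re
import subprocess

def run(cmd: list[str]) -> str:
    return subprocess.run(cmd, capture_output=True, text=True).stdout

driver_cuda = re.search(r"CUDA Version:\s*([\d.]+)", run(["nvidia-smi"]))
toolkit_cuda = re.search(r"release\s*([\d.]+)", run(["nvcc", "--version"]))

print("driver-supported CUDA:", driver_cuda.group(1) if driver_cuda else "unknown")
print("toolkit (nvcc) CUDA:  ", toolkit_cuda.group(1) if toolkit_cuda else "unknown")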