Highest Voted 'nvidia-smi' Questions

11 votes, 5 answers

Failed to initialize NVML: Unknown Error in Docker after Few hours

I am having an interesting and weird issue. When I start a Docker container with GPU access, it works fine and I see all the GPUs in Docker. However, a few hours or a few days later, I can't use the GPUs in Docker. When I run nvidia-smi in the Docker machine, I see this…

asked Jul 11 '22 at 01:28

Justin Song

111
1
4

7 votes, 3 answers

Can not find NVIDIA driver after stop and start a deep learning VM

[TL;DR] First, wait for a couple of minutes and check if the Nvidia driver starts to work properly. If not, stop and start the VM instance again. I created a Deep Learning VM (Google Click to Deploy) with an A100 GPU. After stopping and starting the…

google-cloud-platform debian nvidia nvidia-smi

asked Mar 24 '22 at 03:57

zudi

141
1
6

6 votes, 0 answers

Given the number of parameters, how to estimate the VRAM needed by a pytorch model?

I am trying to estimate the VRAM needed for a fully connected model without having to build/train the model in pytorch. I got pretty close with this formula: # params = number of parameters # 1 MiB = 1048576 bytes estimate = params * 24 /…

memory pytorch vram nvidia-smi

asked Oct 22 '21 at 18:29

RDlady

378
2
16

3 votes, 2 answers

What does the command "nvidia-smi --gpu-reset" do?

What does the command sudo nvidia-smi --gpu-reset -i 0 do? Does it just free up the memory of the GPU?

gpu nvidia nvidia-smi

asked Nov 16 '21 at 09:03

user14889957

2 votes, 0 answers

GPU is used by Xwayland in Docker image

I'm currently trying to use a Docker image to train a generative adversarial network. Unfortunately, when I try to run the script, I get the following error: [2023-07-29 11:02:47 @__init__.py:80] Saving logging to file:…

python linux docker nvidia-smi xwayland

asked Jul 29 '23 at 12:30

Patschenkof

21
1

1 vote, 1 answer

nvidia-smi vs torch.cuda.memory_allocated

I am checking the GPU memory usage in the training step. To start with the main question: checking the GPU memory with the torch.cuda.memory_allocated method gives a different result from checking with nvidia-smi, and I want to know why. Actually, I measured…

python pytorch gpu nvidia-smi vram

asked Jan 03 '23 at 07:01

core_not_dumped

759
2
22

1 vote, 1 answer

Read GPU Information from Console C++

I want to create my own overclocking monitor, for which I need to read information like the current voltage, clock speeds and others. In C++ I can easily get the information from nvidia-smi by typing, for example: console("nvidia-smi -q -i…

c++ nvidia nvidia-smi

asked Dec 13 '22 at 17:15

JackDerke

11
2

1 vote, 2 answers

Nvidia driver is not recognized properly

OS: Ubuntu 20.04 LTS, Windows 10 dual boot. Error with the nvidia-smi command after apt installation of the NVIDIA driver. $ nvidia-smi Unable to determine the device handle for GPU 0000:0B:00.0: Not Found $ dmesg |grep NVRM [ 3.065144] NVRM: loading NVIDIA…

ubuntu nvidia nvidia-smi

asked Oct 27 '22 at 10:16

chess0000

31
1
3

1 vote, 0 answers

Why different GPUs use different amounts of memory?

I have 2 GPUs on different computers. One (NVIDIA A100) is on a server, the other (NVIDIA Quadro RTX 3000) is on my laptop. I watch the performance on both machines via nvidia-smi and noticed that the 2 GPUs use different amounts of memory when…

pytorch gpu nvidia-smi

asked Sep 14 '22 at 15:29

tnknepp

5,888
6
43
57

1 vote, 0 answers

is there way to know which container is using which gpu device?

Let's say I have Docker containers A, B, C running and GPUs 1, 2, 3. I can check the GPU process ID with nvidia-smi. Sometimes a container itself holds GPU memory after it is used up, so I want to find which container is using which GPU and…

docker gpu nvidia-smi

asked Aug 29 '22 at 02:40

jakeE

11
2

1 vote, 0 answers

Technique to measure GPU utilization over a given period of time

We run an HPC cluster with GPUs. We would like to report the overall GPU utilization for the job. I know I can do it by periodically sampling in the background and doing the math. I was wondering if there was a tool where I could basically start…

gpu performance-measuring nvidia-smi

asked Aug 17 '22 at 13:37

William Allcock

134
2
9
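
The "periodically sampling in the background and doing the math" approach mentioned in the question above could be sketched roughly as follows. This is only an illustration, not a tool from the question: the helper names (`parse_utilization`, `sample_once`, `mean_utilization`, `monitor`) are hypothetical, and it assumes nvidia-smi is on the PATH and reports a numeric `utilization.gpu` value for every GPU.

```python
import subprocess
import time


def parse_utilization(csv_text):
    """Parse one nvidia-smi CSV sample: one utilization percentage per line, one line per GPU.

    Assumes every non-empty line is numeric (no "[N/A]" entries).
    """
    values = []
    for line in csv_text.splitlines():
        line = line.strip()
        if line:
            values.append(float(line))
    return values


def sample_once():
    """Take a single per-GPU utilization sample by calling nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_utilization(result.stdout)


def mean_utilization(samples):
    """Average a list of samples (each a per-GPU list) into one mean value per GPU."""
    if not samples:
        return []
    n_gpus = len(samples[0])
    return [sum(sample[i] for sample in samples) / len(samples) for i in range(n_gpus)]


def monitor(duration_s, interval_s=1.0):
    """Sample for roughly duration_s seconds and return the per-GPU mean utilization."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append(sample_once())
        time.sleep(interval_s)
    return mean_utilization(samples)
```

Calling `monitor(60)` at the start of a job step would return one average per GPU after about a minute; the result is only as fine-grained as the sampling interval, so short bursts between samples are missed.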

1 vote, 1 answer

watch command not working with special characters and quotes

watch -n 1 "paste <(ssh ai02 'nvidia-smi pmon -s um -c 1') <(ssh ai03 'nvidia-smi pmon -s um -c 1' )" The above command is used to horizontally stack the GPU stats of two servers. It works without the watch command, but with watch I get the following error: sh:…

bash awk pipe watch nvidia-smi

asked Sep 11 '21 at 19:01

JimmyJ

41
9

1 vote, 1 answer

Most simplified form of the following regex / Extracting all values from nvidia-smi output

I am trying to analyze a very large text string in Python containing nvidia-smi outputs, but I really want to spend more time analyzing the data than working on my regex skills. I got the regex as follows, but it takes forever on some rows (it might be…

python regex data-analysis nvidia nvidia-smi

asked Sep 04 '21 at 09:46

elegantcomplexity

13
7

0 votes, 1 answer

StableLM answers too slow on GCP VM with GPU

I installed StableLM on a GCP VM with these specs: 1 x NVIDIA Tesla P4, 8 vCPU - 30 GB memory. And I set the model params llm_int8_enable_fp32_cpu_offload=True. But it takes too long to answer questions, ~8 minutes. It was faster even when using…

google-cloud-platform gpu huggingface-transformers llm nvidia-smi

asked Aug 22 '23 at 13:20

srls01

425
2
4
12

0 votes, 0 answers

NVIDIA SMI shows lower CUDA version than NVCC

On an installation, when I run nvidia-smi, it shows the CUDA version as being 12.0. After installing the CUDA Toolkit, nvcc --version reports the version is 12.2. Is this a problem? Based on this very comprehensive answer, I understood that NVIDIA…

cuda nvidia nvcc nvidia-smi

asked Aug 21 '23 at 10:51

ahron

803
6
29
