Questions tagged [nsight-compute]

A GPU kernel and API call profiling tool for the CUDA environment, with both a command-line interface and a GUI; it replaces parts of the older nvprof and NVIDIA Visual Profiler tools and complements Nsight Systems.

34 questions
6
votes
1 answer

What are the "long" and "short" scoreboards w.r.t. MIO/L1TEX?

With recent NVIDIA micro-architectures, there's a new (?) taxonomy of warp stall reasons / warp scheduler states. Two of the items in this taxonomy are: Short scoreboard - scoreboard dependency on an MIO queue operation. Long scoreboard -…
einpoklum
  • 118,144
  • 57
  • 340
  • 684
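For the question above, here is a minimal sketch (hypothetical names, not the asker's code) of the two dependency kinds: the first use of a value returned by a global (L1TEX) load typically shows up as a Long Scoreboard wait, while the first use of a value coming back through the MIO path, such as a shared-memory load, shows up as a Short Scoreboard wait.

#include <cuda_runtime.h>

// Hypothetical kernel: where Nsight Compute's Warp State Statistics would
// typically attribute "Long Scoreboard" vs. "Short Scoreboard" stalls.
__global__ void scoreboard_demo(const float* __restrict__ gmem, float* out)
{
    __shared__ float smem[256];
    int tid = threadIdx.x;

    float g = gmem[tid];              // global (L1TEX) load
    smem[tid] = g;                    // first use of g: warp waits here on the long scoreboard
    __syncthreads();

    float s = smem[(tid + 1) % 256];  // shared-memory load goes through the MIO path
    out[tid] = g + s;                 // first use of s: warp waits here on the short scoreboard
}

int main()
{
    float *in, *out;
    cudaMalloc(&in, 256 * sizeof(float));
    cudaMalloc(&out, 256 * sizeof(float));
    scoreboard_demo<<<1, 256>>>(in, out);
    cudaDeviceSynchronize();
    cudaFree(in);
    cudaFree(out);
    return 0;
}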
5
votes
1 answer

When does MIO Throttle stall happen?

According to this link https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html: Warp was stalled waiting for the MIO (memory input/output) instruction queue to be not full. This stall reason is high in cases of extreme utilization of the…
rm95
  • 167
  • 6
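As a hedged illustration of the "extreme utilization of the MIO pipeline" case the Profiling Guide describes: a hypothetical kernel that issues shared-memory instructions back-to-back with little math between them, so the MIO instruction queue tends to stay full.

#include <cuda_runtime.h>

// Hypothetical stress kernel: a dense stream of shared-memory loads keeps the
// MIO instruction queue busy, the situation in which "MIO Throttle" stalls
// are said to appear.
__global__ void mio_stress(float* out)
{
    __shared__ float smem[1024];
    int tid = threadIdx.x;
    smem[tid] = tid;
    __syncthreads();

    float acc = 0.f;
    #pragma unroll
    for (int i = 0; i < 64; ++i)       // back-to-back shared loads, little math in between
        acc += smem[(tid + i) & 1023];
    out[tid] = acc;
}

int main()
{
    float* out;
    cudaMalloc(&out, 1024 * sizeof(float));
    mio_stress<<<256, 1024>>>(out);    // oversubscribe so many warps compete for MIO
    cudaDeviceSynchronize();
    cudaFree(out);
    return 0;
}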
5
votes
1 answer

How does Nsight Compute determine/display shared memory metrics?

I'm learning about __shared__ memory in CUDA, and I'm confused about how Nsight Compute shows shared memory statistics. I'm going through this article (code available on Nvidia's github here, but copied below for reference). #include…
Daniel A. Thompson
  • 1,904
  • 1
  • 17
  • 26
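A minimal placeholder (not the article's code) showing one way to query the shared-memory instruction counters directly from the command line; the metric names are assumptions based on common usage and may differ between Nsight Compute versions.

#include <cuda_runtime.h>

// Placeholder kernel plus an ncu invocation one might use to inspect
// shared-memory traffic (metric names are assumptions):
//
//   ncu --metrics smsp__inst_executed_op_shared_ld.sum,smsp__inst_executed_op_shared_st.sum ./a.out
//
__global__ void smem_copy(const float* in, float* out)
{
    __shared__ float tile[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = in[i];      // one shared store per thread
    __syncthreads();
    out[i] = tile[threadIdx.x];     // one shared load per thread
}

int main()
{
    const int n = 1 << 20;
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    smem_copy<<<n / 256, 256>>>(in, out);
    cudaDeviceSynchronize();
    cudaFree(in);
    cudaFree(out);
    return 0;
}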
4
votes
0 answers

L2 Fabric cache hit rate of CUDA kernels on A100

I am profiling a read-only kernel in Nsight Compute on A100. The kernel is very simple and the complete code is as below. #include #include #include #include const int BLOCK = 1024; const int…
Shulai
  • 41
  • 2
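A read-only, grid-stride kernel sketch in the spirit of the question (not the asker's exact code), together with the metric commonly used for L2 sector hit rate; the metric name is an assumption and may vary with the Nsight Compute version.

#include <cuda_runtime.h>

// Read-only kernel sketch; a possible command to read the L2 hit rate
// (metric name assumed):
//
//   ncu --metrics lts__t_sector_hit_rate.pct ./a.out
//
const int BLOCK = 1024;

__global__ void read_only(const float* __restrict__ in, float* sink, int n)
{
    float acc = 0.f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
        acc += in[i];                 // grid-stride read of the whole buffer
    if (acc == -1.f) *sink = acc;     // keep the loads from being optimized away
}

int main()
{
    const int n = 1 << 26;
    float *in, *sink;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&sink, sizeof(float));
    read_only<<<1024, BLOCK>>>(in, sink, n);
    cudaDeviceSynchronize();
    cudaFree(in);
    cudaFree(sink);
    return 0;
}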
4
votes
1 answer

Nsight Compute says: "Profiling is not supported on this device" - why?

I have a machine with an NVIDA GTX 1050 Ti GPU (compute capability 6.1), and am trying to profile a kernel in a program I built with CUDA 11.4. My OS distribution is Devuan GNU/Linux 4 Chimaera (~= Debian 11 Bullseye). NSight Compute starts my…
einpoklum
  • 118,144
  • 57
  • 340
  • 684
4
votes
1 answer

Interpreting compute workload analysis in Nsight Compute

Compute Workload Analysis displays the utilization of different compute pipelines. I know that in a modern GPU, integer and floating point pipelines are different hardware units and can execute in parallel. However, it is not very clear which…
heapoverflow
  • 264
  • 2
  • 12
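A hedged example of a kernel mixing FP32 and integer work; under Compute Workload Analysis one would expect the float math to be attributed to the FMA pipe and the integer math to the ALU pipe, though the exact pipeline breakdown is architecture-dependent.

#include <cuda_runtime.h>

// Hypothetical kernel issuing FP32 FMAs and integer ALU work in the same loop.
__global__ void mixed_pipes(float* fdst, int* idst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float f = i * 0.5f;
    int   v = i;
    for (int k = 0; k < 128; ++k) {
        f = f * 1.000001f + 0.5f;   // FP32 FMA work
        v = v * 3 + 1;              // integer ALU work
    }
    fdst[i] = f;
    idst[i] = v;
}

int main()
{
    const int n = 1 << 20;
    float* f; int* d;
    cudaMalloc(&f, n * sizeof(float));
    cudaMalloc(&d, n * sizeof(int));
    mixed_pipes<<<(n + 255) / 256, 256>>>(f, d, n);
    cudaDeviceSynchronize();
    cudaFree(f);
    cudaFree(d);
    return 0;
}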
4
votes
2 answers

CUDA: How to detect shared memory bank conflicts on devices with compute capability >= 7.2?

On devices with compute capability <= 7.2, I always use nvprof --events shared_st_bank_conflict, but when I run it on an RTX 2080 Ti with CUDA 10, it returns Warning: Skipping profiling on device 0 since profiling is not supported on devices with…
LiMou
  • 41
  • 3
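On Turing and newer, the usual replacement for the nvprof event is a pair of Nsight Compute metrics; the names below are to the best of my knowledge and may vary by version. The kernel deliberately provokes 32-way conflicts with a stride-32 access pattern.

#include <cuda_runtime.h>

// Deliberate shared-memory bank conflicts, with the metrics commonly
// suggested as the nvprof-event replacement (names assumed):
//
//   ncu --metrics l1tex__data_bank_conflicts_pipe_lsu_mem_shared_op_st.sum,l1tex__data_bank_conflicts_pipe_lsu_mem_shared_op_ld.sum ./a.out
//
__global__ void conflict_demo(float* out)
{
    __shared__ float smem[32 * 32];
    int tid = threadIdx.x;
    smem[tid * 32] = tid;          // stride-32 store: all 32 lanes hit the same bank
    __syncthreads();
    out[tid] = smem[tid * 32];     // stride-32 load: same conflict on the read
}

int main()
{
    float* out;
    cudaMalloc(&out, 32 * sizeof(float));
    conflict_demo<<<1, 32>>>(out);
    cudaDeviceSynchronize();
    cudaFree(out);
    return 0;
}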
2
votes
1 answer

Unbalanced Memory Read & Write in CUDA

I noticed an unbalanced memory read and write amount when profiling the CUDA kernel below using ncu. __global__ void kernel(void* mem, int n) { int* ptr = reinterpret_cast<int*>(mem); for (int offset = (threadIdx.x + blockIdx.x *…
Alex Chen
  • 33
  • 2
2
votes
0 answers

With the NSight Compute profiler, can I check cache hit rates for a specific region of memory?

My GPU kernel reads data from different input buffers. I want to check whether I manage to get cache hits for the reads from one of these buffers. Is it possible to limit the counting of cache hit/miss metrics to a particular range of memory…
einpoklum
  • 118,144
  • 57
  • 340
  • 684
2
votes
1 answer

How to profile a CUDA application with compute capability 7.x? Is the metric "dram_read_throughput" valid in Nsight Compute?

My setup environment: CUDA 10.2 Device: RTX 2080 OS: Ubuntu 16.04 When I try to use nvprof, I find that it doesn't support devices with compute capability 7.2 and higher. It is recommended that I should use Nsight Compute or Nsight Systems…
fishmingee
  • 21
  • 1
  • 2
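To the best of my knowledge, nvprof's dram_read_throughput maps to a dram__bytes_read-based throughput metric in Nsight Compute's nvprof metric-mapping table; treat the exact name below as an assumption. A trivial streaming kernel to profile:

#include <cuda_runtime.h>

// Streaming read + write to exercise DRAM; a possible Nsight Compute
// equivalent of nvprof's dram_read_throughput (name assumed):
//
//   ncu --metrics dram__bytes_read.sum.per_second ./a.out
//
__global__ void touch(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

int main()
{
    const int n = 1 << 24;
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    touch<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();
    cudaFree(in);
    cudaFree(out);
    return 0;
}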
1
vote
1 answer

CUDA math function register usage

I am trying to understand the significant register usage incurred when using a few of the built-in CUDA math ops like atan2() or division and how the register usage might be reduced/eliminated. I'm using the following program: #include…
Chris Uchytil
  • 140
  • 1
  • 11
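A hedged comparison sketch: the full-accuracy atan2f against a cheaper composition of intrinsics (atanf plus __fdividef), which gives up accuracy and quadrant handling. Register counts can be inspected with nvcc -Xptxas -v or in Nsight Compute's launch statistics; the sketch makes no claim about exact register numbers.

#include <cuda_runtime.h>

// Precise atan2f vs. a cheaper intrinsic-based path, for comparing register usage.
__global__ void precise_atan2(const float* y, const float* x, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = atan2f(y[i], x[i]);              // full-accuracy path
}

__global__ void fast_ratio(const float* y, const float* x, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = atanf(__fdividef(y[i], x[i]));   // cheaper; no quadrant handling
}

int main()
{
    const int n = 1 << 20;
    float *x, *y, *out;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    precise_atan2<<<(n + 255) / 256, 256>>>(y, x, out, n);
    fast_ratio<<<(n + 255) / 256, 256>>>(y, x, out, n);
    cudaDeviceSynchronize();
    cudaFree(x); cudaFree(y); cudaFree(out);
    return 0;
}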
1
vote
1 answer

Roofline Model with CUDA Manual vs. Nsight Compute

I have a very simple vector addition kernel written for CUDA. I want to calculate the arithmetic intensity as well as GFLOP/s for this Kernel. The values I calculate differ visibly from the values obtained by Nsight Compute's Roofline Analysis…
Cherry Toska
  • 131
  • 8
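As a sanity check, the hand calculation for a float vector add: each element costs 1 FLOP and moves 2 loads plus 1 store of 4 bytes, so the naive arithmetic intensity is 1/12 ≈ 0.083 FLOP/byte, and GFLOP/s is n divided by the kernel time. Nsight Compute's roofline counts bytes at a specific memory level (e.g. DRAM), so its figure can legitimately differ. A minimal kernel with the arithmetic in comments:

#include <cuda_runtime.h>

// Per element: 1 FLOP, 2 * 4 B read + 1 * 4 B written = 12 B of traffic,
// so arithmetic intensity ~ 1/12 FLOP/byte for the whole kernel.
__global__ void vec_add(const float* a, const float* b, float* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 24;
    float *a, *b, *c;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));
    cudaMalloc(&c, n * sizeof(float));
    vec_add<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}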
1
vote
2 answers

ncu-ui won't run: Could not load the Qt platform plugin "xcb" in "" even though it was found

I'm trying to run the ncu-ui profiler GUI on a CentOS 7 Linux system (using ncu-ui 2022.1), both as root and as a regular user. I'm getting the error: qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found. This…
einpoklum
  • 118,144
  • 57
  • 340
  • 684
1
vote
1 answer

Shared memory loads not registered when using Tensor Cores

I'm trying to multiply blocks of size 8x8 using Tensor Cores on a GPU designed with the Turing architecture. For that I'm using the WMMA API and fragments of size 16x16. My assumption was that shared memory bandwidth would be wasted since most data…
rm95
  • 167
  • 6
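A minimal WMMA sketch (standard nvcuda::wmma API, compiled for sm_75 or newer) of the pattern the question is about: fragments loaded from shared memory and fed to the Tensor Cores. It is illustrative only, not the asker's 8x8-in-16x16 packing code.

#include <cuda_runtime.h>
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp stages 16x16 operands in shared memory, then loads WMMA fragments
// from it; these are the shared-memory loads the profiler should attribute.
__global__ void wmma_from_smem(const half* a, const half* b, float* c)
{
    __shared__ half sa[16 * 16];
    __shared__ half sb[16 * 16];

    for (int i = threadIdx.x; i < 16 * 16; i += 32) {
        sa[i] = a[i];
        sb[i] = b[i];
    }
    __syncthreads();

    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;

    wmma::fill_fragment(fc, 0.0f);
    wmma::load_matrix_sync(fa, sa, 16);   // shared-memory fragment loads
    wmma::load_matrix_sync(fb, sb, 16);
    wmma::mma_sync(fc, fa, fb, fc);
    wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
}

int main()
{
    half *a, *b; float *c;
    cudaMalloc(&a, 16 * 16 * sizeof(half));
    cudaMalloc(&b, 16 * 16 * sizeof(half));
    cudaMalloc(&c, 16 * 16 * sizeof(float));
    wmma_from_smem<<<1, 32>>>(a, b, c);
    cudaDeviceSynchronize();
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}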
1
vote
2 answers

What does NSight Compute show for a stall reason that isn't "supported"?

The CUDA Profiling Guide lists various reasons for sampled warp stalls, e.g. Allocation, Barrier, LG Throttle etc. And - the NSight Compute profiler shows the distribution of these as part of the profiling results. The thing is, some of the stall…
einpoklum
  • 118,144
  • 57
  • 340
  • 684