Questions tagged [nvprof]

nvprof is a command-line profiler that enables you to collect and view CPU and GPU timing and event data for CUDA programs.
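A few representative invocations (a sketch only; `nvprof` ships with the CUDA toolkit, requires an NVIDIA GPU, and exact flag/metric availability varies by CUDA version and architecture):

```shell
# Summary of kernel and CUDA API time for an application:
nvprof ./a.out

# Per-invocation trace of every kernel launch and memcpy:
nvprof --print-gpu-trace ./a.out

# Collect specific hardware events and metrics
# (names vary by GPU architecture; list them with --query-events / --query-metrics):
nvprof --events inst_issued1 --metrics achieved_occupancy ./a.out
```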

89 questions
9
votes
1 answer

How to profile OpenCL application with CUDA 8.0 nvprof

I'm trying to profile an OpenCL application, a.out, on a system with an NVIDIA TITAN X and CUDA 8.0. If it were a CUDA application, nvprof ./a.out would be enough. But I found this does not work with an OpenCL application, with the message "No kernels were…
csehydrogen
  • 374
  • 5
  • 17
8
votes
2 answers

How to observe CUDA events and metrics for a subsection of an executable (e.g. only during a kernel execution time)?

I'm familiar with using nvprof to access the events and metrics of a benchmark, e.g., nvprof --system-profiling on --print-gpu-trace -o (file name) --events inst_issued1 ./benchmarkname The system-profiling on --print-gpu-trace -o (filename) …
travelingbones
  • 7,919
  • 6
  • 36
  • 43
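One common approach to scoping the profile to a subsection (a sketch, not taken from the question itself): bracket the region of interest with cudaProfilerStart()/cudaProfilerStop() from cuda_profiler_api.h in the application source, then tell nvprof not to start profiling at launch:

```shell
# Only the region between cudaProfilerStart() and cudaProfilerStop()
# in the application source is profiled:
nvprof --profile-from-start off --events inst_issued1 ./benchmarkname
```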
7
votes
1 answer

Nvidia's nvprof outputs for FLOPS

I see that nvprof can profile the number of floating-point operations in a kernel (using the parameters below). Also, when I browse through the documentation (here http://docs.nvidia.com/cuda... it says flop_count_sp is "Number of single-precision floating-point…
Amit
  • 85
  • 1
  • 9
6
votes
1 answer

What is the difference between 'GPU activities' and 'API calls' in the results of 'nvprof'?

What is the difference between 'GPU activities' and 'API calls' in the results of 'nvprof'? I don't know why there's a time difference in the same function. For example, [CUDA memcpy DtoH] and cuMemcpyDtoH. So I don't know what the right time is. I…
myabcc17
  • 83
  • 5
6
votes
1 answer

Export CUDA nvprof output to the Visual Profiler

I would like to extract the data from my GPU application in order to check its limits. I have to use nvprof because the application runs on a remote server, so I should create a file to import locally in the Visual Profiler. I've tried to create the…
Stefano Sandonà
  • 619
  • 3
  • 9
  • 18
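For the remote-server workflow described above, the usual pattern (a sketch; file names are placeholders) is to write a profile file on the server and import it locally:

```shell
# On the remote server: write a timeline profile the Visual Profiler can import.
nvprof --export-profile timeline.nvvp ./app

# Optionally also collect the metrics the Visual Profiler's guided analysis uses:
nvprof --analysis-metrics -o analysis.nvvp ./app
```

Copy the resulting .nvvp files to the local machine and load them via File > Import in the Visual Profiler (nvvp).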
6
votes
1 answer

Understanding CUDA profiler output (nvprof)

I'm just looking at the following output and trying to wrap my mind around the numbers: ==2906== Profiling result: Time(%) Time Calls Avg Min Max Name 23.04% 10.9573s 16436 666.67us 64.996us 1.5927ms …
Pavel
  • 7,436
  • 2
  • 29
  • 42
5
votes
2 answers

What is redzone_checker? Profiling my tensorflow application on a GPU

I am profiling a TensorFlow GPU application with NVIDIA's command-line profiler nvprof, and one of the kernels that was launched and is very active in the profile is something called redzone_checker. I cannot for the life of me find any…
jakemdaly
  • 67
  • 4
5
votes
1 answer

How can I access the numeric stream IDs seen in nvprof, using a cudaStream_t?

In nvprof I can see the stream IDs for each CUDA execution stream I am using (0, 13, 15, etc.). Given a stream variable, I'd like to be able to print out the stream ID. Currently I cannot find any API to do this, and casting the cudaStream_t to an int…
CPayne
  • 516
  • 2
  • 5
  • 20
5
votes
1 answer

How are the blocks scheduled into the SMs in CUDA when their number is lesser than the available SMs?

This question arises from the difference between the theoretical and achieved occupancy observed in a kernel. I'm aware of the different occupancy reported by the occupancy calculator and nvprof, and also of a question about the details of the distribution from…
pQB
  • 3,077
  • 3
  • 23
  • 49
5
votes
1 answer

nvprof option for bandwidth

What is the correct option for measuring bandwidth using nvprof --metrics from the command line? I am using flop_dp_efficiency to get the percentage of peak FLOPS, but there seem to be many options for bandwidth measurement in the manual that I…
danny
  • 1,101
  • 1
  • 12
  • 34
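For the bandwidth question above, a hedged sketch of the metrics typically used (availability depends on GPU architecture; check with `nvprof --query-metrics`):

```shell
# Device-memory (DRAM) bandwidth actually achieved by each kernel:
nvprof --metrics dram_read_throughput,dram_write_throughput ./app

# Global load/store throughput as requested by the SMs:
nvprof --metrics gld_throughput,gst_throughput ./app
```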
4
votes
1 answer

nvprof command error: cupti64_102.dll was not found

When I try to run the nvprof command in Command Prompt, a System Error pops up saying "The code execution cannot proceed because cupti64_102.dll was not found. Reinstalling the program may fix this problem." I have installed the CUDA Toolkit 10.2 but…
john mori
  • 41
  • 1
  • 2
4
votes
0 answers

Tensorflow - Profile Custom Op

I am interested in a way to measure the detailed performance of a custom Tensorflow Op when run on a GPU. So far I have tried the approach of this post using a Timeline, as well as the internal Tensorflow Profiler (tf.profiler.Profiler). Both…
Christoph Pohl
  • 325
  • 5
  • 19
4
votes
2 answers

Profiling arbitrary CUDA applications

I know of the existence of nvvp and nvprof, of course, but for various reasons nvprof does not want to work with my app, which involves lots of shared libraries. nvidia-smi can hook into the driver to find out what's running, but I cannot find a nice…
Ken Y-N
  • 14,644
  • 21
  • 71
  • 114
4
votes
2 answers

Is it possible to see that kernel execution happened on Tensor Cores or not via nvprof (or some other method)?

I'm trying to identify bottlenecks in GPU execution performance for deep learning models on Titan V / V100. I understand that certain requirements must be met for the underlying kernel execution to be performed on Tensor Cores based on…
n00b
  • 167
  • 1
  • 2
  • 8
3
votes
0 answers

cuda kernel 'volta_sgemm_128x32_nn' means what?

I am studying NVIDIA's kernels behind the torch matmul function. ### variable creation a = torch.randn(size=(1,128,3),dtype=torch.float32).to(cuda) b = torch.randn(size=(1,3,32),dtype=torch.float32).to(cuda) ### execution c = torch.matmul(a,b) I profiled this code…
김양곤
  • 41
  • 3