Questions tagged [nvprof]

nvprof is a command-line profiler that enables you to collect and view CPU and GPU timing and event data for CUDA programs.
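A few representative invocations (a sketch only; `nvprof` ships with the CUDA toolkit, requires an NVIDIA GPU, and exact flag/metric availability varies by CUDA version and architecture):

```shell
# Summary of kernel and CUDA API time for an application:
nvprof ./a.out

# Per-invocation trace of every kernel launch and memcpy:
nvprof --print-gpu-trace ./a.out

# Collect specific hardware events and metrics
# (names vary by GPU architecture; list them with --query-events / --query-metrics):
nvprof --events inst_issued1 --metrics achieved_occupancy ./a.out
```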

89 questions
9
votes
1 answer

How to profile OpenCL application with CUDA 8.0 nvprof

I'm trying to profile an OpenCL application, a.out, on a system with an NVIDIA TITAN X and CUDA 8.0. If it were a CUDA application, nvprof ./a.out would be enough. But I found this does not work with an OpenCL application, with the message "No kernels were…
csehydrogen
  • 374
  • 5
  • 17
8
votes
2 answers

How to observe CUDA events and metrics for a subsection of an executable (e.g. only during a kernel execution time)?

I'm familiar with using nvprof to access the events and metrics of a benchmark, e.g., nvprof --system-profiling on --print-gpu-trace -o (file name) --events inst_issued1 ./benchmarkname The system-profiling on --print-gpu-trace -o (filename) …
travelingbones
  • 7,919
  • 6
  • 36
  • 43
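One common approach to scoping the profile to a subsection (a sketch, not taken from the question itself): bracket the region of interest with cudaProfilerStart()/cudaProfilerStop() from cuda_profiler_api.h in the application source, then tell nvprof not to start profiling at launch:

```shell
# Only the region between cudaProfilerStart() and cudaProfilerStop()
# in the application source is profiled:
nvprof --profile-from-start off --events inst_issued1 ./benchmarkname
```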
7
votes
1 answer

Nvidia's nvprof outputs for FLOPS

I see that nvprof can profile the number of floating-point operations in a kernel (using the parameters below). Also, when I browse through the documentation (here http://docs.nvidia.com/cuda... it says flop_count_sp is "Number of single-precision floating-point…
Amit
  • 85
  • 1
  • 9
6
votes
1 answer

What is the difference between 'GPU activities' and 'API calls' in the results of 'nvprof'?

What is the difference between 'GPU activities' and 'API calls' in the results of 'nvprof'? I don't know why there's a time difference in the same function. For example, [CUDA memcpy DtoH] and cuMemcpyDtoH. So I don't know what the right time is. I…
myabcc17
  • 83
  • 5
6
votes
1 answer

Export CUDA nvprof output to the Visual Profiler

I would like to extract the data from my GPU application in order to check its limits. I have to use nvprof because the application runs on a remote server, so I should create a file to import locally in the Visual Profiler. I've tried to create the…
Stefano Sandonà
  • 619
  • 3
  • 9
  • 18
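For the remote-server workflow described above, the usual pattern (a sketch; file names are placeholders) is to write a profile file on the server and import it locally:

```shell
# On the remote server: write a timeline profile the Visual Profiler can import.
nvprof --export-profile timeline.nvvp ./app

# Optionally also collect the metrics the Visual Profiler's guided analysis uses:
nvprof --analysis-metrics -o analysis.nvvp ./app
```

Copy the resulting .nvvp files to the local machine and load them via File > Import in the Visual Profiler (nvvp).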
6
votes
1 answer

Understanding CUDA profiler output (nvprof)

I'm just looking at the following output and trying to wrap my mind around the numbers: ==2906== Profiling result: Time(%) Time Calls Avg Min Max Name 23.04% 10.9573s 16436 666.67us 64.996us 1.5927ms …
Pavel
  • 7,436
  • 2
  • 29
  • 42
5
votes
2 answers

What is redzone_checker? Profiling my tensorflow application on a GPU

I am profiling a TensorFlow GPU application with NVIDIA's command-line profiler nvprof, and one of the kernels that was launched and is very active in the profile is something called redzone_checker. I cannot for the life of me find any…
jakemdaly
  • 67
  • 4
5
votes
1 answer

How can I access the numeric stream IDs seen in nvprof, using a cudaStream_t?

In nvprof I can see the stream IDs for each CUDA execution stream I am using (0, 13, 15, etc.). Given a stream variable, I'd like to be able to print out the stream ID. Currently I cannot find any API to do this, and casting the cudaStream_t to an int…
CPayne
  • 516
  • 2
  • 5
  • 20
5
votes
1 answer

How are the blocks scheduled into the SMs in CUDA when their number is lesser than the available SMs?

This question arises from the difference between the theoretical and achieved occupancy observed in a kernel. I'm aware of the different occupancy reported by the occupancy calculator and nvprof, and also of a question about the details of the distribution from…
pQB
  • 3,077
  • 3
  • 23
  • 49
5
votes
1 answer

nvprof option for bandwidth

What is the correct option for measuring bandwidth using nvprof --metrics from the command line? I am using flop_dp_efficiency to get the percentage of peak FLOPS, but there seem to be many options for bandwidth measurement in the manual that I…
danny
  • 1,101
  • 1
  • 12
  • 34
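For the bandwidth question above, a hedged sketch of the metrics typically used (availability depends on GPU architecture; check with `nvprof --query-metrics`):

```shell
# Device-memory (DRAM) bandwidth actually achieved by each kernel:
nvprof --metrics dram_read_throughput,dram_write_throughput ./app

# Global load/store throughput as requested by the SMs:
nvprof --metrics gld_throughput,gst_throughput ./app
```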
4
votes
1 answer

nvprof command error: cupti64_102.dll was not found

When I try to run the nvprof command in Command Prompt, a System Error pops up saying "The code execution cannot proceed because cupti64_102.dll was not found. Reinstalling the program may fix this problem." I have installed the CUDA Toolkit 10.2 but…
john mori
  • 41
  • 1
  • 2
4
votes
0 answers

Tensorflow - Profile Custom Op

I am interested in a way to measure the detailed performance of a custom Tensorflow Op when run on a GPU. So far I have tried the approach of this post using a Timeline, as well as the internal Tensorflow Profiler (tf.profiler.Profiler). Both…
Christoph Pohl
  • 325
  • 5
  • 19
4
votes
2 answers

Profiling arbitrary CUDA applications

I know of the existence of nvvp and nvprof, of course, but for various reasons nvprof does not want to work with my app, which involves lots of shared libraries. nvidia-smi can hook into the driver to find out what's running, but I cannot find a nice…
Ken Y-N
  • 14,644
  • 21
  • 71
  • 114
4
votes
2 answers

Is it possible to see that kernel execution happened on Tensor Cores or not via nvprof (or some other method)?

I'm trying to identify bottlenecks in GPU execution performance for deep learning models on Titan V / V100. I understand that certain requirements must be met for the underlying kernel execution to be performed on Tensor Cores based on…
n00b
  • 167
  • 1
  • 2
  • 8
3
votes
0 answers

cuda kernel 'volta_sgemm_128x32_nn' means what?

I am studying NVIDIA's kernels behind the torch matmul function. ### variable creation a = torch.randn(size=(1,128,3),dtype=torch.float32).to(cuda) b = torch.randn(size=(1,3,32),dtype=torch.float32).to(cuda) ### execution c = torch.matmul(a,b) I profiled this code…
김양곤
  • 41
  • 3