
I'm familiar with using nvprof to access the events and metrics of a benchmark, e.g.,

nvprof --system-profiling on --print-gpu-trace -o (file name) --events inst_issued1 ./benchmarkname

The

--system-profiling on --print-gpu-trace -o (filename)

flags give timestamps for kernel start and end times, plus power and temperature samples, and save the info to an .nvvp file so we can view it in the Visual Profiler. This lets us see what's happening in any section of the code, in particular while a specific kernel is running. My question is this:

Is there a way to isolate the events counted for only a section of the benchmark run, for example during a kernel execution? In the command above,

--events inst_issued1    

just gives the instructions tallied for the whole executable. Thanks!


2 Answers


You may want to read the profiler documentation.

You can turn profiling on and off within an executable. The CUDA runtime API calls for this are:

cudaProfilerStart() 
cudaProfilerStop() 

So, if you wanted to collect profile information only for a specific kernel, you could do:

#include <cuda_profiler_api.h>
...

cudaProfilerStart();      // profile data collection begins here...
myKernel<<<...>>>(...);
cudaProfilerStop();       // ...and ends here

(Instead of a single kernel call, the code between the two calls could be a function or any region that launches kernels.) Excerpting from the documentation:

When using the start and stop functions, you also need to instruct the profiling tool to disable profiling at the start of the application. For nvprof you do this with the --profile-from-start off flag. For the Visual Profiler you use the Start execution with profiling enabled checkbox in the Settings View.
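
As a concrete illustration, here is a minimal end-to-end sketch of that workflow; the kernel name myKernel and the file name profile_region.cu are hypothetical placeholders:

// profile_region.cu -- minimal sketch; kernel and file names are placeholders
#include <cuda_profiler_api.h>

__global__ void myKernel(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;   // trivial work, just something to profile
}

int main()
{
    const int n = 1 << 20;
    float *d;
    cudaMalloc(&d, n * sizeof(float));

    cudaProfilerStart();                        // collection begins here...
    myKernel<<<(n + 255) / 256, 256>>>(d, n);
    cudaDeviceSynchronize();
    cudaProfilerStop();                         // ...and ends here

    cudaFree(d);
    return 0;
}

Compiled and run with the flag from the excerpt above, the inst_issued1 count should then cover only the bracketed launch:

nvcc -o benchmarkname profile_region.cu
nvprof --profile-from-start off --events inst_issued1 ./benchmarkname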

Also from the documentation for nvprof specifically, you can limit event/metric tabulation to a single kernel with a command line switch:

 --kernels <kernel name>
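
For example, restricting the event count from the question to a single (placeholder) kernel name:

nvprof --kernels myKernel --events inst_issued1 ./benchmarkname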

The documentation gives additional usage possibilities.

The same methodologies are possible with Nsight Systems and Nsight Compute.

The CUDA profiler start/stop functions work exactly the same way. The Nsight Systems documentation explains how to run the profiler with capture control managed by the profiler API:

nsys [global-options] start -c cudaProfilerApi

or for Nsight Compute:

ncu [options] --profile-from-start off
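
As a sketch of each, assuming a recent toolkit in which nsys profile accepts the capture-range option, and reusing the question's executable name:

nsys profile --capture-range=cudaProfilerApi ./benchmarkname
ncu --profile-from-start off ./benchmarkname

In both cases the tool defers collection until cudaProfilerStart() is reached, mirroring the nvprof --profile-from-start off behavior.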

Likewise, Nsight Compute can be conditioned via the command line to only profile specific kernels. The primary switch for this is -k to select kernels by name. In repetitive situations, the -c switch can be used to limit how many launches of the named kernel are profiled, and the -s switch can be used to skip a number of launches before profiling begins, as in the sketch below.
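
A sketch of what that can look like (the kernel name and counts are placeholders):

ncu -k myKernel -s 2 -c 1 ./benchmarkname

This would skip the first two launches of myKernel and profile only the third.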

These methodologies don't apply just to events and metrics, but to all profiling activity performed by the respective profilers.

The CUDA profiler API can be used in any executable, and does not require compilation with nvcc.

  • Thanks @Robert Crovella! Also, I've been looking into this, and noticed you can get kernel-level information for each kernel w/o calling --kernels. I'll make another answer. – travelingbones Sep 21 '15 at 18:41
  • Can you point me to a list of metrics and events for Fermi architectures? Judging from every NVIDIA GPU Profiling question I've seen, you're the first to answer! http://stackoverflow.com/questions/32994604/what-are-the-available-nvidia-fermi-architecture-gpus-counters-events-and-metri – travelingbones Oct 07 '15 at 18:23
  • You can actually get this from the hardware itself by running `nvprof --query-metrics` (maybe you should read the [profiler manual](http://docs.nvidia.com/cuda/profiler-users-guide/index.html#event-summary-mode) ??) You can also do `nvprof --query-events` (Should I mention that the profiler has command line help `nvprof --help`?) Anyway, at the end of the profiler documentation, there is a list of metrics broken down by compute architecture. Fermi is cc2.0. So look [here](http://docs.nvidia.com/cuda/profiler-users-guide/index.html#metrics-reference). – Robert Crovella Oct 07 '15 at 18:32
  • I don't know of a corresponding published list for events (there may be one somewhere, I just don't know where it is), so I ran the command on a Fermi device and posted it [here](http://pastebin.com/FF02GSC1) for you. – Robert Crovella Oct 07 '15 at 18:45
  • You're the best! Thank you! – travelingbones Oct 07 '15 at 21:06

After looking into this a bit more, it turns out that kernel-level information is also given for all kernels (without using --kernels to name them specifically) by using

nvprof --events <event names> --metrics <metric names> ./<cuda benchmark>   

In fact, it gives output of the form

"Device","Kernel","Invocations","Event Name","Min","Max","Avg"

If a kernel is called multiple times in the benchmark, this allows you to see the Min, Max, and Avg of the desired events across those kernel runs. Evidently the --kernels option in the CUDA 7.5 profiler allows each run of each kernel to be specified individually.
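
For instance, an invocation of this form (the metric name here is just an example alongside the question's event):

nvprof --events inst_issued1 --metrics achieved_occupancy ./benchmarkname

If the benchmark launches the same kernel many times, the summary collapses those launches into a single row per kernel, with the Min, Max, and Avg columns computed across the invocations.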
