Questions tagged [nvvp]

NVVP (NVIDIA Visual Profiler) is NVIDIA's proprietary GUI-based profiling tool for CUDA applications running on the GPU.

The NVIDIA Visual Profiler is a cross-platform performance profiling tool that delivers developers vital feedback for optimizing CUDA C/C++ applications. First introduced in 2008, Visual Profiler supports all 350 million+ CUDA capable NVIDIA GPUs shipped since 2006 on Linux, Mac OS X, and Windows. The NVIDIA Visual Profiler is available as part of the CUDA Toolkit. (source, official website)

The PGI Profiler (PGPROF) is closely based on NVVP.

The NVIDIA Visual Profiler offers both a GUI and command-line options (pgprof or nvprof). Some basic information can be found here: https://www.pgroup.com/resources/pgprof-quickstart.htm
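A common way to combine the two modes is to collect a profile with nvprof on a remote or headless machine and import the exported file into the Visual Profiler GUI later. The Python sketch below only assembles such an nvprof command line. It assumes a CUDA toolkit and GPU that nvprof still supports (nvprof and the Visual Profiler do not support devices of compute capability 8.0 and higher). The helper name build_nvprof_command and its parameters are hypothetical; the flags it emits (--export-profile, --profile-from-start off, --analysis-metrics, --metrics, --events) are nvprof options whose exact behavior should be checked against NVIDIA's profiler user's guide.

```python
import shlex
from typing import List, Sequence


def build_nvprof_command(
    app: Sequence[str],
    output_file: str = "profile.%p.nvvp",
    profile_from_start: bool = True,
    analysis_metrics: bool = False,
    metrics: Sequence[str] = (),
    events: Sequence[str] = (),
) -> List[str]:
    """Assemble an nvprof argument list (hypothetical helper, not an NVIDIA API)."""
    # --export-profile writes a file the Visual Profiler can import later;
    # nvprof replaces %p in the file name with the process ID.
    cmd = ["nvprof", "--export-profile", output_file]
    if not profile_from_start:
        # Collection begins only when the application calls cudaProfilerStart().
        cmd += ["--profile-from-start", "off"]
    if analysis_metrics:
        # Gathers the metrics used by the Visual Profiler's guided analysis (slower run).
        cmd.append("--analysis-metrics")
    if metrics:
        # nvprof takes a comma-separated list of metric names.
        cmd += ["--metrics", ",".join(metrics)]
    if events:
        # Likewise, events are passed as one comma-separated list.
        cmd += ["--events", ",".join(events)]
    # The application and its own arguments come last.
    cmd += list(app)
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command; actually running it requires nvprof on the PATH.
    cmd = build_nvprof_command(
        ["./benchmarkname"], profile_from_start=False, events=["inst_issued1"]
    )
    print(shlex.join(cmd))
```

With profile_from_start=False, nvprof records nothing until the application calls cudaProfilerStart() (declared in cuda_profiler_api.h) and stops recording at cudaProfilerStop(), which is one way to restrict collection to a region of interest such as a single kernel. The exported file can then be copied to a local machine and imported into the Visual Profiler.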

More detailed information:
http://docs.nvidia.com/cuda/profiler-users-guide/index.html

44 questions

8 votes, 2 answers

How to observe CUDA events and metrics for a subsection of an executable (e.g. only during a kernel execution time)?

I'm familiar with using nvprof to access the events and metrics of a benchmark, e.g.,
nvprof --system-profiling on --print-gpu-trace -o (file name) --events inst_issued1 ./benchmarkname
The system-profiling on --print-gpu-trace -o (filename) …
travelingbones

6 votes, 1 answer

Export CUDA nvprof output to the Visual Profiler

I would like to extract the data from my GPU application in order to check its limits. I have to use nvprof because the application runs on a remote server, so I should create a file to import locally in the Visual Profiler. I've tried to create the…
Stefano Sandonà

4 votes, 2 answers

Profiling arbitrary CUDA applications

I know of the existence of nvvp and nvprof, of course, but for various reasons nvprof does not want to work with my app that involves lots of shared libraries. nvidia-smi can hook into the driver to find out what's running, but I cannot find a nice…
Ken Y-N

4 votes, 2 answers

Excessive profiler overhead with NVIDIA Visual Profiler

I am getting a lot of profiling overhead when trying to profile my code using nvvp (or with nvprof): Overall time is 98 ms and I'm getting 85 ms of "Instrumentation" in the first kernel launch. How can I reduce this profiling overhead or otherwise…
KQS

4 votes, 1 answer

CUDA profiler shows strange gaps?

I am trying to figure out what a profile result means before I start to optimize. I am very new to CUDA and to profiling in general, and I am confused by the result. Specifically, I want to know what is happening during seemingly unoccupied chunks of…
Mikhail

3 votes, 1 answer

How to measure bank conflicts per warp using NVIDIA Visual Profiler?

I am doing a detailed code analysis for which I want to measure the total number of bank conflicts per warp. The nvvp documentation lists this metric, which was the only one I could find related to bank conflicts: shared_replay_overhead: Average…
Kajal

2 votes, 1 answer

Where are the start and end boundaries of the CPU launch and GPU launch in NVIDIA's nvprof?

What is the definition of the start and end of a kernel launch on the CPU and on the GPU (yellow blocks)? Where is the boundary between them? Please note that the start, end, and duration of those yellow blocks differ between the CPU and GPU. Why CPU invocation…
skytree

2 votes, 1 answer

What's the difference between DtoD and PtoP memory copies?

While profiling an application with nvprof, I found both PtoP and DtoD memcpy operations. I am not sure about the difference between the two.
Saiful

2 votes, 0 answers

nvprof shows error with TensorFlow

I am trying to run nvprof with cifar10_multi_gpu_train.py. I am using the following command:
/home/ibm/tensorflow/third_party/gpus/cuda/bin/nvprof python cifar10_multi_gpu_train.py
It starts the application, but after some time it shows the following errors…
Khayam Gondal

2 votes, 1 answer

Profile debug or release CUDA code?

I have been profiling an application with nvprof and nvvp (5.5) in order to optimize it. However, I get totally different results for some metrics/events like inst_replay_overhead, ipc or branch_efficiency, etc. when I'm profiling the debug (-G) and…
ScHuMi

2 votes, 1 answer

Can NVIDIA Visual Profiler display concurrent kernel execution?

I have read on many forums that the NVIDIA Visual Profiler serializes the program in order to collect timing information. However, under the context tab, the Visual Profiler offers advice such as "There is no time overlap between memory copies and…
shadow

1 vote, 1 answer

Profilers (nvvp and nvprof) not showing "Page Fault" information

I am profiling a test code presented in the Unified Memory for CUDA Beginners on NVIDIA's developer forum. Code: #include #include // CUDA kernel to add elements of two arrays __global__ void add(int n, float* x, float* y) { …
skm

1 vote, 1 answer

How to specify nvprof "devices" option for Nvidia Visual Profiler?

CUDA Toolkit 9.0, Windows 10, GTX 1060 & NVS 315, 385.54 Driver version. Nvidia Visual Profiler always fails to profile, returning the following two warning messages: "Warning: This version of nvprof doesn't support the underlying device, GPU…
Tyson Hilmer

1 vote, 1 answer

Dependency Analysis options in CUDA Profiler

I have implemented a single-GPU program that uses the cudaStreamWaitEvent() function to set a dependency between two streams using events. In order to verify this dependency, is it possible to use the "Dependency Analysis" view on the Nvidia…
BAdhi

1 vote, 1 answer

Why is there no activity on GPU between successive thrust sort and reduce commands?

Please refer to the two snapshots below showing an NVIDIA Visual Profiler session of my CUDA code: [Snapshot from nvprof session showing thrust::sort and thrust::reduce call execution timeline] Highlighted the sort and reduce calls to show the times…
Aman Yadav