CUDA optimization techniques

Question

I have written CUDA code to solve an NP-complete problem, but the performance was not what I expected.

I know about "some" optimization techniques (using shared memory, textures, zero-copy...)

What are the most important optimization techniques CUDA programmers should know about?

score 7 · Accepted Answer · answered Jun 22 '10 at 07:04

You should read NVIDIA's CUDA Programming Best Practices guide: http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/NVIDIA_CUDA_BestPracticesGuide.pdf

The guide lists many performance tips, each with an associated "priority". Here are some of the top-priority tips:

1. Measure the effective bandwidth of your kernel and compare it against the theoretical peak bandwidth of your device to work out what the upper bound on performance ought to be
2. Minimize memory transfers between host and device, even if that means doing calculations on the device which are not efficient there
3. Coalesce all global memory accesses
4. Prefer shared memory access to global memory access
5. Avoid divergent branches within a single warp, as the divergent paths are executed one after another
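
As a rough illustration of the first and third tips, here is a minimal sketch that times a coalesced copy and a strided copy and reports effective bandwidth as (bytes read + bytes written) / 10^9 / seconds. The kernel names, the array size, and the stride of 33 are hypothetical choices for this example, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Effective bandwidth in GB/s: (bytes read + bytes written) / 1e9 / seconds.
double effective_bandwidth_gbs(double bytes_read, double bytes_written, double seconds) {
    return (bytes_read + bytes_written) / 1e9 / seconds;
}

// Coalesced: neighbouring threads touch neighbouring addresses.
__global__ void copy_coalesced(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Uncoalesced: neighbouring threads touch addresses `stride` elements apart
// (wrapping around n). With an odd stride and a power-of-two n, every
// element is still copied exactly once.
__global__ void copy_strided(const float* in, float* out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int j = (int)(((long long)i * stride) % n);
        out[j] = in[j];
    }
}

int main() {
    const int n = 1 << 24;                       // hypothetical problem size
    const size_t bytes = (size_t)n * sizeof(float);
    float *in, *out;
    cudaMalloc(&in, bytes);
    cudaMalloc(&out, bytes);
    cudaMemset(in, 0, bytes);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float ms = 0.0f;

    // Warm-up launch so one-time startup cost is not included in the timing.
    copy_coalesced<<<blocks, threads>>>(in, out, n);
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    copy_coalesced<<<blocks, threads>>>(in, out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("coalesced: %.1f GB/s\n",
           effective_bandwidth_gbs((double)bytes, (double)bytes, ms / 1e3));

    cudaEventRecord(start);
    copy_strided<<<blocks, threads>>>(in, out, n, 33);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("strided:   %.1f GB/s\n",
           effective_bandwidth_gbs((double)bytes, (double)bytes, ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Both kernels move the same number of bytes, so any gap between the two printed figures comes from the access pattern alone. Comparing the coalesced figure with the theoretical peak of your device gives a feel for how much headroom a memory-bound kernel has.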

6. Avoid bank conflicts. PS: In my application, I found that statically allocated shared memory is faster than dynamically allocated shared memory (sized in the kernel<<<>>>() launch configuration). All of this is described in the Best Practices Guide. — LonliLokli, Jun 22 '10 at 09:38
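
To make the comment above concrete, here is a minimal sketch of a tiled matrix transpose written twice: once with statically allocated shared memory and once with dynamically allocated shared memory sized by the third launch parameter. Both pad each tile row by one element, a common way to avoid bank conflicts when a warp reads a column of the tile. The kernel names, TILE = 32, and the 1024 x 1024 matrix are hypothetical, and the sketch assumes a device with 32 shared memory banks that allows 1024 threads per block. It only checks correctness; which variant is faster is something you would have to time on your own hardware.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define TILE 32

// Host-side check: b must be the transpose of the n x n matrix a.
bool is_transpose(const std::vector<float>& a, const std::vector<float>& b, int n) {
    for (int r = 0; r < n; ++r)
        for (int c = 0; c < n; ++c)
            if (b[c * n + r] != a[r * n + c]) return false;
    return true;
}

// Static shared memory. The +1 padding shifts each row by one bank, so the
// column read tile[threadIdx.x][threadIdx.y] hits 32 different banks.
__global__ void transpose_static(const float* in, float* out, int n) {
    __shared__ float tile[TILE][TILE + 1];
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n) tile[threadIdx.y][threadIdx.x] = in[y * n + x];
    __syncthreads();
    x = blockIdx.y * TILE + threadIdx.x;   // swap block indices for the write
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < n && y < n) out[y * n + x] = tile[threadIdx.x][threadIdx.y];
}

// Dynamic shared memory: same layout, but the size is supplied at launch
// and the 2D indexing is done by hand with a padded pitch.
__global__ void transpose_dynamic(const float* in, float* out, int n) {
    extern __shared__ float tile[];
    const int pitch = TILE + 1;
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n) tile[threadIdx.y * pitch + threadIdx.x] = in[y * n + x];
    __syncthreads();
    x = blockIdx.y * TILE + threadIdx.x;
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < n && y < n) out[y * n + x] = tile[threadIdx.x * pitch + threadIdx.y];
}

int main() {
    const int n = 1024;                          // hypothetical matrix size
    const size_t bytes = (size_t)n * n * sizeof(float);
    std::vector<float> h_in(n * n), h_out(n * n);
    for (int i = 0; i < n * n; ++i) h_in[i] = (float)i;

    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);

    transpose_static<<<grid, block>>>(d_in, d_out, n);
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    printf("static:  %s\n", is_transpose(h_in, h_out, n) ? "ok" : "mismatch");

    cudaMemset(d_out, 0, bytes);
    const size_t smem = TILE * (TILE + 1) * sizeof(float);
    transpose_dynamic<<<grid, block, smem>>>(d_in, d_out, n);
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    printf("dynamic: %s\n", is_transpose(h_in, h_out, n) ? "ok" : "mismatch");

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Wrapping each launch in cudaEvent timing would let you test the static-versus-dynamic observation for your own kernel and device.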

score 2 · Answer 2 · answered Dec 06 '11 at 01:25

The new NVIDIA Visual Profiler (v4.1) supports automated performance analysis to identify performance improvement opportunities in your application. It also links directly to the most useful sections of the Best Practices Guide for the issues it detects. And the Visual Profiler is available for free as part of the CUDA Toolkit on NVIDIA's developer web site: http://www.nvidia.com/getcuda.
