
I'm developing with CUDA and have an arithmetic problem which I could implement with or without warp divergence. With warp divergence it would look like:

float v1;
float v2;
//calculate values of v1 and v2
if(v2 != 0) // only threads with v2 != 0 do the expensive call, so the warp may diverge
  v1 += v2*complicated_math();
//store v1

Without warp divergence, the version looks like:

float v1;
float v2;
//calculate values of v1 and v2
v1 += v2*complicated_math(); // every thread does the work; when v2 == 0 this just adds 0
//store v1

The question is: which version is faster?

In other words, how expensive is masking off (disabling) threads in a warp compared to every thread doing the extra multiplication and adding 0?
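
For context, the two variants might look roughly like the following as complete kernels. complicated_math() here is just a stand-in for the expensive part, and the one-float-per-thread layout is an assumption for illustration, not my real code:

__device__ float complicated_math()
{
    // stand-in for the expensive computation (placeholder only)
    float x = 1.0f;
    for (int k = 0; k < 64; ++k)
        x = sinf(x) + 1.0f;
    return x;
}

__global__ void with_branch(float* v1, const float* v2, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        float a = v1[i];
        float b = v2[i];
        if (b != 0.0f)                  // threads with b == 0 sit idle while the others work
            a += b * complicated_math();
        v1[i] = a;
    }
}

__global__ void without_branch(float* v1, const float* v2, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        float a = v1[i];
        float b = v2[i];
        a += b * complicated_math();    // always executed; adds 0 when b == 0
        v1[i] = a;
    }
}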

  • Have you checked the similar questions? http://stackoverflow.com/q/16739248/156811 – Havenard Aug 19 '15 at 17:01
  • Thanks for pointing out the other thread. I had checked it before, but the questions have a different core, so I decided to ask this one. – Melenor Aug 19 '15 at 17:25
  • It is impossible to say without some more context. The compiler is much smarter than you (and me) and without a complete example which could be compiled, disassembled and run (which is precisely what you should be doing), there is no way to answer this question. – talonmies Aug 19 '15 at 17:52

1 Answer


Your question has no single answer. It depends heavily on the amount of extra computation, the frequency of divergence, the hardware, the problem dimensions and many other factors. The best way is simply to implement both versions and profile them to determine the better solution for your particular case and situation.
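
If it helps as a starting point, a rough timing harness using CUDA events could look like the sketch below. It assumes the with_branch/without_branch kernels sketched in the question; the problem size, block size and repeat count are placeholders, and d_v2 should be filled with data representative of your real zero/non-zero distribution:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int n = 1 << 20;                           // placeholder problem size
    float *d_v1, *d_v2;
    cudaMalloc(&d_v1, n * sizeof(float));
    cudaMalloc(&d_v2, n * sizeof(float));
    // ... fill d_v1 and d_v2 with representative data ...

    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < 100; ++i)              // repeat to amortize launch overhead
        with_branch<<<grid, block>>>(d_v1, d_v2, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms_branch = 0.0f;
    cudaEventElapsedTime(&ms_branch, start, stop);

    cudaEventRecord(start);
    for (int i = 0; i < 100; ++i)
        without_branch<<<grid, block>>>(d_v1, d_v2, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms_nobranch = 0.0f;
    cudaEventElapsedTime(&ms_nobranch, start, stop);

    printf("with branch: %.3f ms   without branch: %.3f ms\n", ms_branch, ms_nobranch);

    cudaFree(d_v1);
    cudaFree(d_v2);
    return 0;
}

The result will depend heavily on the data: if whole warps have v2 == 0, the branch lets them skip complicated_math() entirely, while zeros scattered within a warp save much less. Looking at the generated SASS (e.g. nvcc -cubin followed by cuobjdump -sass), as suggested in the comments, also shows whether the compiler turned the branch into predication.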

abort
  • Well, I feared something like this, but shouldn't there be a general statement about the cost of disabling threads? – Melenor Aug 19 '15 at 17:33