
I am fairly certain that a warp is only defined in CUDA. But maybe I'm wrong. What is a warp in terms of OpenCL?

It's not the same as a work-group, is it? Any relevant feedback is highly appreciated. Thanks!

Luis B
  • I haven't used OpenCL much, but I've never heard that term. – nbro Sep 14 '16 at 22:38
  • A warp on NVIDIA is similar to a wavefront on AMD. For example, each compute unit of a new AMD GPU has 64 cores, but when you give it a 256-wide work-group it takes 4 wavefronts to complete that 256-work-item work-group. If a kernel uses very little memory, an AMD compute unit can have up to 40 wavefronts in flight. It takes 4 clocks to issue 4 wavefronts to a compute unit, but they can take longer to finish. – huseyin tugrul buyukisik Sep 14 '16 at 23:00
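
To put numbers on that comment: a minimal host-side sketch (not from the original thread) that asks OpenCL for the device's warp/wavefront width and works out how many of them a given work-group needs. CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE usually reports 32 on NVIDIA and 64 on AMD; the kernel and device arguments are placeholders you are assumed to have created already.

    #include <stdio.h>
    #include <CL/cl.h>

    /* Sketch only: assumes kernel and device were built/selected earlier. */
    void print_hardware_threads_per_group(cl_kernel kernel, cl_device_id device,
                                          size_t work_group_size)
    {
        size_t simd_width = 0; /* warp size on NVIDIA, wavefront size on AMD */

        clGetKernelWorkGroupInfo(kernel, device,
                                 CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
                                 sizeof(simd_width), &simd_width, NULL);

        /* e.g. a 256-wide work-group on a 64-wide wavefront needs 4 wavefronts */
        size_t hw_threads = (work_group_size + simd_width - 1) / simd_width;

        printf("work-group of %zu -> %zu hardware threads of width %zu\n",
               work_group_size, hw_threads, simd_width);
    }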

1 Answer


It isn't defined in the OpenCL standard. A warp is a thread as executed by the hardware (CUDA threads are not really threads; they map onto a warp as separate SIMD elements, with some clever hardware/software mapping). From OpenCL's point of view, a warp is a collection of work-items, and there can be multiple warps in a work-group.
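
As a rough illustration of that mapping (my sketch, not part of the original answer): on NVIDIA hardware consecutive work-items of a work-group are packed into warps of 32, so inside a kernel the warp and lane a work-item lands on can be estimated like this. The constant 32 is an assumption about the device, not something the OpenCL standard guarantees.

    /* OpenCL C kernel sketch. WARP_SIZE = 32 is an NVIDIA-specific assumption. */
    #define WARP_SIZE 32

    __kernel void show_warp_layout(__global uint *warp_id, __global uint *lane_id)
    {
        size_t lid = get_local_id(0);   /* position inside the work-group */
        size_t gid = get_global_id(0);

        warp_id[gid] = (uint)(lid / WARP_SIZE);  /* which warp in the group   */
        lane_id[gid] = (uint)(lid % WARP_SIZE);  /* SIMD lane inside the warp */
    }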

An OpenCL subgroup was designed to be compatible with a hardware thread, so it can represent a warp in an OpenCL kernel. However, it is entirely up to NVIDIA whether or not to implement subgroups, and an OpenCL subgroup cannot expose every feature NVIDIA exposes for warps: it is a standard, while NVIDIA can do anything they like on their own devices.
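
For what it's worth, here is a sketch of what the subgroup route looks like where a driver supports it (OpenCL 2.0 or the cl_khr_subgroups extension; whether a given NVIDIA driver exposes it is exactly the caveat above). The built-ins are from that extension; the kernel itself is just an illustrative placeholder.

    #pragma OPENCL EXTENSION cl_khr_subgroups : enable

    __kernel void subgroup_sums(__global const int *in, __global int *out)
    {
        /* On NVIDIA a subgroup normally corresponds to a warp, on AMD to a
           wavefront, but its size is whatever get_sub_group_size() reports. */
        int sum = sub_group_reduce_add(in[get_global_id(0)]);

        /* One result per subgroup, written by its first work-item ("lane 0"). */
        if (get_sub_group_local_id() == 0)
            out[get_group_id(0) * get_num_sub_groups() + get_sub_group_id()] = sum;
    }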

Lee
  • Maybe I should read up on hardware threads? I'm only familiar with software threads in the context of an operating system. Are CUDA threads (context: multiple threads per block) hardware threads, software threads, or perhaps CUDA's own idea of a thread? – Luis B Sep 15 '16 at 02:03
  • So a warp is a hardware thread, a CUDA thread is not a hardware thread, and a warp is a collection of work-items. A work-group can be defined either as (1) a collection of work-items or (2) a collection of warps. Is this correct? If so, is it safe to say there is a hierarchy: a work-group is a collection of warps, which is a collection of work-items? Or at least, is this a good way to think about it? – Luis B Sep 15 '16 at 02:07
  • Or is it better to just think about it like this: a work-group is a collection of work-items, and a warp just happens to also be a collection of work-items, but warp and work-group have different uses. Is this a better way to think of it? (I hope I am conveying my current understanding/confusion well.) – Luis B Sep 15 '16 at 02:09
  • I like mdma's answer to software threads vs hardware threads here: http://stackoverflow.com/questions/5593328/software-threads-vs-hardware-threads – Luis B Sep 15 '16 at 02:12
  • Just think of a warp as a collection of work-items that happen to be there on NVIDIA's hardware. It varies from device to device, so it will be hard to write portable code if you rely on the actual SIMD mapping of the work-items. – Lee Sep 15 '16 at 17:41
  • That answer on software/hardware threads is part of the way there. What I meant for OpenCL/CUDA is a little different, because it is a different sort of software-to-hardware mapping from the fiber-like concepts described in that answer. The concept of a thread in CUDA is really a single SIMD lane: if you map this to the CPU and SSE instructions, a thread is a bit like just one element of the __m128 SSE register (see the sketch after these comments). Chapter 3 of Heterogeneous Computing with OpenCL describes this in the best way I could think of at the time. You should be able to find the chapter online somewhere. – Lee Sep 15 '16 at 17:45
  • That warp explanation works for me. Thanks for talking me through it. – Luis B Sep 16 '16 at 01:45
  • I'm not really familiar with what SIMD or a SIMD lane is, and the intro paragraph of the Wikipedia entry doesn't help (all I understand from it is that SIMD is parallel hardware). But I do have that book, so I'll take a look at Chapter 3. (Thanks for helping write the book and for helping answer this question.) – Luis B Sep 16 '16 at 01:55
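
Picking up the SSE analogy in Lee's comment, here is a minimal, self-contained C sketch (mine, not from the thread). Each of the four float lanes of the __m128 register is updated by the same instruction, in the same way each work-item / CUDA "thread" in a warp is one lane of a wide SIMD operation.

    #include <stdio.h>
    #include <xmmintrin.h>   /* SSE intrinsics */

    int main(void)
    {
        /* Four independent values, one per SIMD lane -- think of each lane as
           one work-item / CUDA "thread" living inside the same hardware thread. */
        __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
        __m128 b = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);

        /* One instruction updates all four lanes at once, much as one warp
           instruction updates its 32 CUDA threads in lockstep. */
        __m128 c = _mm_add_ps(a, b);

        float out[4];
        _mm_storeu_ps(out, c);
        printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
        return 0;
    }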