
While reading the CUDA 5.0 Programming Guide, I stumbled on a feature called "funnel shift", which is present on devices of compute capability 3.5 but not 3.0. The guide annotates it with "see reference manual", but when I search the manual for the term "funnel shift", I don't find anything.

I tried googling it, but only found a mention on http://www.cudahandbook.com, in chapter 8:

8.2.3 Funnel Shift (SM 3.5)

GK110 added a 64-bit “funnel shift” instruction that may be accessed with the following intrinsics:

__funnelshift_lc(): returns most significant 32 bits of a left funnel shift.

__funnelshift_rc(): returns least significant 32 bits of a right funnel shift.

These intrinsics are implemented as inline device functions (using inline PTX assembler) in sm_35_intrinsics.h.

...but it still does not explain what the "left funnel shift" or "right funnel shift" is.

So, what is it and where does one need it?

einpoklum
CygnusX1
  • Funnel shifting is where two input words are concatenated and then shifted, and a word-sized output is extracted from the result of the concatenate/shift. – talonmies Oct 07 '12 at 08:11
  • Is it something different from __shfl_up(value, index)? – lashgar Oct 07 '12 at 12:07
  • As talonmies says, a funnel shifter extracts any contiguous n-bit group of bits from the concatenation of two n-bit words. Note that a funnel shifter provides for efficient implementation of rotates, by making both inputs the same n-bit word. Use of the term "funnel" alludes to the fact that the input is wider than the output. – njuffa Oct 07 '12 at 16:17
  • Ah, this is good feedback on the CUDA handbook. I need to add a bit of clarifying language there, it seems :-) – ArchaeaSoftware Oct 07 '12 at 22:06
  • @ahmad, yes, it is different than __shfl_up(). The shuffle instructions enable data interchange between threads within a warp. – ArchaeaSoftware Oct 07 '12 at 22:33
  • talonmies, njuffa, Archaea, want to write an answer? – harrism Oct 08 '12 at 01:44
  • Thank you for the comments. I think I understand what it does, but a proper answer (perhaps with an example for clarity) would be great, for me and perhaps for others who stumble on it. Also, nice to meet the handbook's author here. Didn't expect that :) – CygnusX1 Oct 08 '12 at 21:06

1 Answer


In CUDA, two 32-bit registers are concatenated into a 64-bit value (hi forms the upper 32 bits, lo the lower 32 bits). That value is shifted left or right, and the most significant 32 bits (for a left shift) or the least significant 32 bits (for a right shift) are returned.
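To make the description concrete, here is a plain C reference model of the two directions. It is a sketch that assumes 0 <= shift <= 32, and the helper names funnel_left and funnel_right are hypothetical (they are not CUDA API). The hi argument supplies the upper half of the 64-bit intermediate value and lo the lower half.

```c
#include <stdint.h>

/* Hypothetical reference model of a left funnel shift, valid for 0 <= shift <= 32:
   concatenate hi:lo into 64 bits, shift left, keep the most significant 32 bits. */
uint32_t funnel_left(uint32_t lo, uint32_t hi, unsigned shift)
{
    uint64_t wide = ((uint64_t)hi << 32) | lo;   /* hi:lo */
    return (uint32_t)((wide << shift) >> 32);    /* top 32 bits after the shift */
}

/* Hypothetical reference model of a right funnel shift, valid for 0 <= shift <= 32:
   concatenate hi:lo into 64 bits, shift right, keep the least significant 32 bits. */
uint32_t funnel_right(uint32_t lo, uint32_t hi, unsigned shift)
{
    uint64_t wide = ((uint64_t)hi << 32) | lo;   /* hi:lo */
    return (uint32_t)(wide >> shift);            /* bottom 32 bits after the shift */
}
```

For example, with lo = 0x89ABCDEF and hi = 0x01234567 the intermediate value is 0x0123456789ABCDEF, so funnel_left(lo, hi, 8) gives 0x23456789 and funnel_right(lo, hi, 8) gives 0x6789ABCD. At a shift of 32 the left shift returns lo and the right shift returns hi.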

The intrinsics from sm_35_intrinsics.h are as follows:

unsigned int __funnelshift_lc(unsigned int lo, unsigned int hi, unsigned int shift);
unsigned int __funnelshift_rc(unsigned int lo, unsigned int hi, unsigned int shift);
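The trailing "c" in these names stands for "clamp". The CUDA Math API documentation describes __funnelshift_lc and __funnelshift_rc as shifting by min(shift, 32). It also lists the non-clamping __funnelshift_l and __funnelshift_r, which shift by shift & 31. The sketch below models all four on the host, so the difference is easy to see. The emu_* names are hypothetical stand-ins, not the real intrinsics.

```c
#include <stdint.h>

/* Build the 64-bit hi:lo value that all funnel-shift variants operate on. */
static uint64_t concat_hi_lo(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Model of __funnelshift_lc: shift left by min(shift, 32), keep the top 32 bits. */
uint32_t emu_funnelshift_lc(uint32_t lo, uint32_t hi, uint32_t shift)
{
    uint32_t s = shift < 32u ? shift : 32u;              /* clamp */
    return (uint32_t)((concat_hi_lo(lo, hi) << s) >> 32);
}

/* Model of __funnelshift_rc: shift right by min(shift, 32), keep the bottom 32 bits. */
uint32_t emu_funnelshift_rc(uint32_t lo, uint32_t hi, uint32_t shift)
{
    uint32_t s = shift < 32u ? shift : 32u;              /* clamp */
    return (uint32_t)(concat_hi_lo(lo, hi) >> s);
}

/* Model of __funnelshift_l: shift left by shift & 31, keep the top 32 bits. */
uint32_t emu_funnelshift_l(uint32_t lo, uint32_t hi, uint32_t shift)
{
    uint32_t s = shift & 31u;                            /* wrap */
    return (uint32_t)((concat_hi_lo(lo, hi) << s) >> 32);
}

/* Model of __funnelshift_r: shift right by shift & 31, keep the bottom 32 bits. */
uint32_t emu_funnelshift_r(uint32_t lo, uint32_t hi, uint32_t shift)
{
    uint32_t s = shift & 31u;                            /* wrap */
    return (uint32_t)(concat_hi_lo(lo, hi) >> s);
}
```

With lo = 0x89ABCDEF, hi = 0x01234567 and shift = 40, the clamped variants behave as if shift were 32 and return lo (left) and hi (right). The wrapping variants behave as if shift were 8 and return 0x23456789 (left) and 0x6789ABCD (right).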

According to Andy Glew, applications for funnel shift include fast misaligned memcpy. As njuffa mentions in the comments above, it can also implement a rotate when both input words are the same.
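Both applications can be sketched with a host-side model of the right funnel shift. The helper names (funnel_right, rotate_right, copy_misaligned) are hypothetical. The copy loop also rests on a few assumptions of mine: little-endian byte order, a byte offset of 0 to 3, and a source buffer with one extra readable word at the end. On an SM 3.5 device, the funnel_right call would presumably be replaced by __funnelshift_r or __funnelshift_rc, since the shift amounts here never exceed 31.

```c
#include <stddef.h>
#include <stdint.h>

/* Right funnel shift by 0 <= s <= 32: hi:lo concatenated, low 32 bits kept. */
static uint32_t funnel_right(uint32_t lo, uint32_t hi, unsigned s)
{
    return (uint32_t)((((uint64_t)hi << 32) | lo) >> s);
}

/* Rotate right: feeding the same word into both inputs turns the shift into a rotate. */
uint32_t rotate_right(uint32_t x, unsigned s)
{
    return funnel_right(x, x, s & 31u);
}

/* Misaligned copy: each output word is assembled from two adjacent aligned
   source words, so the loop only ever performs aligned 32-bit loads.
   Assumes little-endian byte order, byte_offset in [0, 3], and that
   src holds at least n + 1 readable words. */
void copy_misaligned(uint32_t *dst, const uint32_t *src, size_t n, unsigned byte_offset)
{
    unsigned s = 8u * byte_offset;   /* byte offset -> bit shift */
    for (size_t i = 0; i < n; ++i)
        dst[i] = funnel_right(src[i], src[i + 1], s);
}
```

For instance, rotate_right(0x12345678, 8) yields 0x78123456. Copying two words from src = {0x33221100, 0x77665544, 0xBBAA9988} at a byte offset of 1 yields {0x44332211, 0x88776655}, which are the words that start one byte into the source buffer.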

ArchaeaSoftware