Are Kepler CC 3.0 GPUs not only pipelined, but also superscalar?

Question

The documentation for CUDA 6.5 states: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#ixzz3PIXMTktb

5.2.3. Multiprocessor Level

...

8L for devices of compute capability 3.x since a multiprocessor issues a pair of instructions per warp over one clock cycle for four warps at a time, as mentioned in Compute Capability 3.x.
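The 8L figure in the quoted passage is simple arithmetic: four warps issue at a time, each issuing a pair of instructions per clock, so keeping every issue slot busy across a latency of L clocks takes 4 × 2 × L independent instructions. A minimal Python sketch of that count follows. The function name is hypothetical, and L = 11 is only an illustrative latency, not a value taken from the quoted text.

```python
def instructions_to_hide_latency(latency_clocks: int,
                                 warps_issuing_per_clock: int = 4,
                                 instructions_per_warp: int = 2) -> int:
    """Independent instructions that must be ready to keep every issue slot
    of one multiprocessor busy for `latency_clocks` clock cycles."""
    # Each clock, `warps_issuing_per_clock` warps each issue a group of
    # `instructions_per_warp` instructions, so over L clocks the
    # multiprocessor can start warps * instructions_per_warp * L instructions.
    return warps_issuing_per_clock * instructions_per_warp * latency_clocks


# Compute capability 3.x: four warps, a pair of instructions each -> 8L.
# L = 11 is an illustrative latency only.
print(instructions_to_hide_latency(11))  # 88
```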

Does this mean that Kepler CC 3.0 GPUs are not only pipelined, but also superscalar?

Pipelining: these two sequences execute in parallel (different operations at one time):
- LOAD [addr1] -> ADD -> STORE [addr1] -> NOP
- NOP -> LOAD [addr2] -> ADD -> STORE [addr2]

Superscalar: these two sequences execute in parallel (the same operations at one time):
- LOAD [reg1] -> ADD -> STORE [reg1]
- LOAD [reg2] -> ADD -> STORE [reg2]
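To make the difference between the two definitions concrete, here is a toy timing model. It is an illustrative assumption, not a description of Kepler's actual hardware: every instruction passes through a fully pipelined unit of `depth` stages, and up to `issue_width` independent instructions can start per clock. A purely pipelined, single-issue machine has `issue_width = 1`; a dual-issue (superscalar) machine has `issue_width = 2`.

```python
import math


def clocks_to_finish(num_instructions: int, depth: int, issue_width: int) -> int:
    """Clock cycles, starting at clock 0, until the last of `num_instructions`
    independent instructions produces its result.

    Assumes a fully pipelined unit: a new group of up to `issue_width`
    instructions enters stage 1 on every clock, and each instruction spends
    `depth` clocks in the pipeline.
    """
    if num_instructions <= 0:
        return 0
    # 0-based clock on which the last group of instructions is issued.
    last_issue_clock = math.ceil(num_instructions / issue_width) - 1
    # That last group still needs `depth` clocks to drain through the pipeline.
    return last_issue_clock + depth


# 8 independent instructions on a hypothetical 4-stage pipeline.
for width in (1, 2):
    print(width, clocks_to_finish(8, depth=4, issue_width=width))
# 1 11   (pipelined, one instruction issued per clock)
# 2 7    (pipelined and dual-issue)
```

Pipelining alone lets a new instruction start every clock; dual issue additionally halves the number of clocks needed to get all instructions started.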

score 10 · Accepted Answer · answered Jan 19 '15 at 20:07

Yes, the warp schedulers in Kepler can schedule two instructions per clock, as long as:

- the instructions are independent
- the instructions come from the same warp
- there are sufficient execution resources in the SM for both instructions
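The three conditions listed above can be expressed as a simple predicate. This is a hypothetical software model for illustration only: the `Instr` record, the unit labels and the `free_units` map are assumptions, and the mechanism the real hardware uses to detect dependences is not modeled. The sketch conservatively treats any register overlap between the two instructions as a dependence.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    warp_id: int
    unit: str                    # hypothetical unit label, e.g. "FP32", "SFU"
    dest: str                    # destination register name
    srcs: tuple[str, ...] = ()   # source register names


def can_dual_issue(a: Instr, b: Instr, free_units: dict[str, int]) -> bool:
    """True if `a` and then `b` (program order) may issue in the same clock."""
    # Condition: both instructions come from the same warp.
    if a.warp_id != b.warp_id:
        return False
    # Condition: the instructions are independent. Conservatively reject
    # read-after-write, write-after-write and write-after-read overlaps.
    if a.dest in b.srcs or a.dest == b.dest or b.dest in a.srcs:
        return False
    # Condition: enough free execution units for both instructions.
    needed = Counter([a.unit, b.unit])
    return all(free_units.get(unit, 0) >= n for unit, n in needed.items())


mul = Instr(0, "FP32", "r1", ("r2", "r3"))
rcp = Instr(0, "SFU", "r8", ("r9",))
print(can_dual_issue(mul, rcp, {"FP32": 1, "SFU": 1}))  # True
```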

If that fits your definition of superscalar, then it is superscalar.

With respect to pipelining, I view it differently. The various execution units in a Kepler SM are pipelined. Let's take a floating-point multiply as an example.

In a given clock, a Kepler warp scheduler may schedule a floating-point multiply on a floating-point unit. The result of this operation does not appear until some number of clocks later (i.e. it is not available on the next clock cycle), but on the next clock cycle a new floating-point operation can be scheduled on the very same floating-point functional units, because the hardware (the floating-point units, in this case) is pipelined.

clock   operation      pipeline stage      result
0       MPY1      ->   PS1
1                      PS2
...                    ...
N-1                    PSN            ->   result1

On the very next clock after clock 0, a new multiply instruction can be scheduled on the same hardware, and its result will appear on the cycle immediately after result1 appears.
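The timing in the table can be reproduced with a small trace model. It is a sketch under assumed values: the pipeline depth N = 4 and the op names MPY1/MPY2 are chosen for illustration only, since real Kepler latencies vary by instruction and are not given here.

```python
def result_clock(issue_clock: int, depth: int) -> int:
    # An op entering stage PS1 at `issue_clock` is in stage PSk at
    # issue_clock + k - 1, so its result leaves stage PSN at:
    return issue_clock + depth - 1


def trace(ops: dict[str, int], depth: int) -> list[str]:
    """One line per clock listing the stage each in-flight op occupies.
    `ops` maps an op name to the clock on which it was issued."""
    last = max(result_clock(t, depth) for t in ops.values())
    lines = []
    for clock in range(last + 1):
        busy = [f"{name}@PS{clock - t + 1}"
                for name, t in ops.items()
                if t <= clock <= result_clock(t, depth)]
        lines.append(f"{clock}: " + " ".join(busy))
    return lines


# Two multiplies issued on consecutive clocks to the same pipelined unit.
for line in trace({"MPY1": 0, "MPY2": 1}, depth=4):
    print(line)
```

With N = 4 this prints:

```
0: MPY1@PS1
1: MPY1@PS2 MPY2@PS1
2: MPY1@PS3 MPY2@PS2
3: MPY1@PS4 MPY2@PS3
4: MPY2@PS4
```

MPY1's result leaves the pipeline at clock N-1 = 3 and MPY2's result one clock later, even though both occupy the same unit.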

I'm not sure if this is what you meant by "different operations at one time".

Thank you! Yes, I mean the same pipelining, but your description is more detailed. So Kepler can execute instructions not only in different stages of the same unit (FPU, IU, SFU, ...), which is pipelining, but also in the same stage of different units (in parallel: the 1st stage on the FPU and simultaneously the 1st stage on the SFU). Is that true? — Alex, Jan 19 '15 at 20:39
Yes, Kepler can have instructions in flight that are in different stages of the same pipeline/unit, as well as on different pipelines/units. Two instructions can be launched to different pipelines/units in the same clock. Two instructions can be launched into the same pipeline/unit on subsequent clocks. — Robert Crovella, Jan 19 '15 at 20:41
