The CUDA 6.5 documentation says the following: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#ixzz3PIXMTktb
5.2.3. Multiprocessor Level
...
- 8L for devices of compute capability 3.x since a multiprocessor issues a pair of instructions per warp over one clock cycle for four warps at a time, as mentioned in Compute Capability 3.x.
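
To check my reading of the 8L figure, here is a minimal sketch of the arithmetic. It assumes the quoted numbers (four warps per clock cycle, a pair of instructions per warp) and treats L as the latency to hide, in clock cycles. The function name and the sample latency of 11 cycles are my own illustrative choices, not values from the quoted passage.

```python
def instructions_to_hide_latency(latency_cycles, warps_per_cycle=4, instructions_per_warp=2):
    """Instructions that must be in flight to cover `latency_cycles`,
    assuming the multiprocessor issues at its maximum rate every cycle."""
    issued_per_cycle = warps_per_cycle * instructions_per_warp  # 4 * 2 = 8
    return issued_per_cycle * latency_cycles


# Hypothetical latency of 11 cycles, chosen only for illustration.
print(instructions_to_hide_latency(11))  # 88
```

With L = 1 this gives 8, which is where the factor in "8L" would come from if my reading is right.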
Does this mean that the multiprocessors of Kepler GPUs (CC 3.0) are not only pipelined, but also superscalar?
Pipelining - these two sequences execute in parallel, each at a different stage in any given clock cycle:
- LOAD [addr1] -> ADD -> STORE [addr1] -> NOP
- NOP -> LOAD [addr2] -> ADD -> STORE [addr2]
Superscalar - these two sequences execute in parallel, both at the same stage in any given clock cycle:
- LOAD [reg1] -> ADD -> STORE [reg1]
- LOAD [reg2] -> ADD -> STORE [reg2]
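
To make the distinction I am asking about concrete, here is a toy sketch of the two timelines above. It is not a model of the actual Kepler scheduler: the `schedule` helper and the one-stage-per-cycle assumption are mine, and it only reproduces the two diagrams by offsetting the start cycle of the second sequence.

```python
def schedule(ops, start_offsets):
    """Return, for each clock cycle, the operation every sequence performs.

    Each sequence runs the same `ops` list, one operation per cycle,
    starting at its own offset; idle slots are shown as "NOP".
    """
    total_cycles = max(start_offsets) + len(ops)
    timeline = []
    for cycle in range(total_cycles):
        row = []
        for offset in start_offsets:
            step = cycle - offset
            row.append(ops[step] if 0 <= step < len(ops) else "NOP")
        timeline.append(tuple(row))
    return timeline


OPS = ["LOAD", "ADD", "STORE"]

# Pipelining: the second sequence starts one cycle after the first,
# so the two sequences are always in different stages.
pipelined = schedule(OPS, [0, 1])

# Superscalar: both sequences start in the same cycle,
# so the same kind of operation is issued twice per cycle.
superscalar = schedule(OPS, [0, 0])

for name, timeline in (("pipelined", pipelined), ("superscalar", superscalar)):
    print(name)
    for cycle, row in enumerate(timeline):
        print(f"  cycle {cycle}: {row}")
```

In this toy model the pipelined case needs four cycles and the superscalar case three. My question is whether "issues a pair of instructions per warp" corresponds to the second pattern.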