
I wanted to get a feel for Kepler's architecture, but it doesn't make sense to me.

If a warp is 32 threads and 4 warps get scheduled/executed, that would mean 128 cores are in use and 64 are left idle. The whitepaper says something about independent instructions, so are the 64 remaining cores reserved for those instructions?

If so, can someone give me an example of when an independent instruction would be needed?

rene

1 Answer


Each SM in Kepler has 192 single-precision (SP) cores and 4 warp schedulers. Each warp scheduler is capable of dual issue, which means that under some circumstances it can issue 2 instructions from the same warp in a single issue slot.

One of these circumstances is that the instructions should be independent, which roughly speaking means that neither instruction depends on the output of the other instruction.
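As a rough illustration of what "independent" means here, the sketch below contrasts a dependent pair of operations with an independent one. It is plain C++ rather than a CUDA kernel, the function names are hypothetical, and whether real hardware would dual-issue the resulting instructions depends on the compiler and the scheduler, which this example does not model.

```cpp
// Plain C++ sketch of data dependence; function names are hypothetical.
// The same arithmetic could appear inside a CUDA kernel, where each thread
// of a warp would run it on its own operands.

// Dependent pair: the add reads the result of the multiply, so the second
// operation cannot start until the first has produced x.
inline float dependent_pair(float a, float b, float c) {
    float x = a * b;   // operation 1
    float y = x + c;   // operation 2 reads x, so it depends on operation 1
    return y;
}

// Independent pair: neither operation reads the other's result, so the two
// are candidates for being issued together.
inline float independent_pair(float a, float b, float c, float d) {
    float x = a * b;   // operation 1
    float y = c + d;   // operation 2 uses only c and d
    return x - y;      // combined afterwards, once both results exist
}
```

In the first function the multiply and the add form a chain; in the second, the multiply and the add could in principle be handed to the execution units in the same issue slot.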

With 4 warp schedulers, each capable of dual issue, it's theoretically possible to issue up to 8 warp instructions per issue slot. At 32 threads per warp instruction, that is more than the 6 warp instructions needed to keep 192 (SP) cores busy.
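The arithmetic behind that claim can be written out as a small sketch. The figures are the ones quoted in this answer and in the question; the constant and function names are hypothetical, and the model ignores everything except peak issue width.

```cpp
// Back-of-the-envelope issue arithmetic for one Kepler SM.
// Figures are taken from the answer text; all names are hypothetical.
const int kWarpSize         = 32;   // threads per warp
const int kSchedulersPerSM  = 4;    // warp schedulers per SM
const int kMaxIssuePerSched = 2;    // dual issue: up to 2 instructions per slot
const int kSpCoresPerSM     = 192;  // single-precision cores per SM

// Thread-level operations per slot if every scheduler issues one instruction.
inline int single_issue_thread_ops() { return kSchedulersPerSM * kWarpSize; }

// Peak warp instructions per slot if every scheduler dual-issues.
inline int peak_warp_instructions() { return kSchedulersPerSM * kMaxIssuePerSched; }

// Thread-level operations those peak warp instructions represent.
inline int peak_thread_ops() { return peak_warp_instructions() * kWarpSize; }

// Warp instructions needed to occupy every SP core once.
inline int warp_instructions_to_fill_sp() { return kSpCoresPerSM / kWarpSize; }
```

With single issue only, `single_issue_thread_ops()` gives the 128 from the question, which is short of 192. With dual issue the peak is 8 warp instructions, while `warp_instructions_to_fill_sp()` shows that 6 are enough to cover the SP cores.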

An SM has execution units besides the SP units that are commonly referred to as "cores", so the actual instruction mix will determine which execution units are scheduled in any given issue slot.

You can get a more detailed description in the GK110 whitepaper.

Robert Crovella
  • Thank you for the reply, but let's say that 8 (SP) warps were launched, 6 could execute together, and the other 2 would have to wait for the next cycle? –  Sep 28 '14 at 05:36
  • Yes, the schedulers would not pick 8 SP warp instructions to launch in a single cycle. – Robert Crovella Sep 28 '14 at 14:11
  • @Peezy - To clarify, on Kepler and Maxwell each warp scheduler (4 per SM) will pick 1 warp that it manages and issue 1 or 2 instructions from the selected warp. A warp scheduler cannot issue instructions from 2 different warps in the same cycle. This is why Robert keeps using the term "warp instructions". – Greg Smith Oct 02 '14 at 03:23
  • @GregSmith So basically each scheduler is in charge of a warp, but how do the schedulers decide whether to launch 2 instructions, and how does, for example, one scheduler know when another is going to launch 2 instructions? Can they communicate with each other? –  Oct 03 '14 at 21:44
  • A warp is allocated to a warp scheduler at launch. Each warp scheduler is responsible for a set of warps (16 for Kepler/Maxwell, 24 for Fermi). On each cycle the warp scheduler will pick an eligible warp (not stalled) and issue 1 or 2 instructions. Dual issue depends on the instruction mix and pipeline availability. Each warp scheduler has its own math pipelines. There is an undocumented arbitration scheme for shared pipelines such as LSU. – Greg Smith Oct 07 '14 at 14:59
  • A warp scheduler can only select one warp at a time and issue at most 2 instructions from the **same** warp. In this sense, how could it be enough to keep 192 cores (6 warps) busy with only 4 schedulers? To me it seems that there will always be 2 warps idle at any time. – hzh May 19 '18 at 03:56
  • On Kepler, each warp scheduler can issue 1 fp32 instruction per cycle plus a second fp32 instruction at half rate (every other cycle). (32 + 16) threads per cycle × 4 warp schedulers = 192 cores. In order to achieve this each warp scheduler only requires 1 warp with paired fp32 every 4 cycles and no dependency between instructions. – Greg Smith May 22 '18 at 19:05
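
The sum in the last comment can be checked with a few lines of C++. This is only a sketch of that arithmetic: it assumes the second fp32 issue is at half rate, which is what the 16 in the comment's sum implies, and all names are hypothetical.

```cpp
// Sustained fp32 rate per the last comment; all names are hypothetical.
const int kFullRateThreads = 32;  // one fp32 warp instruction every cycle
const int kHalfRateThreads = 16;  // a second one every other cycle: 32 / 2
const int kWarpSchedulers  = 4;   // warp schedulers per SM

// fp32 thread operations per cycle that one scheduler can feed.
inline int fp32_threads_per_cycle_per_scheduler() {
    return kFullRateThreads + kHalfRateThreads;
}

// The same figure summed over all schedulers in the SM.
inline int fp32_threads_per_cycle_per_sm() {
    return fp32_threads_per_cycle_per_scheduler() * kWarpSchedulers;
}
```

Under that reading each scheduler feeds 48 fp32 lanes per cycle, and four of them together cover the 192 SP cores.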