
I am learning GPU hardware (the AMD GCN architecture), and I am a little confused about how instructions are executed. Let me take an example:

for (i = 0; i < 64; i++) c[i] = a[i] + b[i];

Assume the wavefront (warp) for the code above has 64 threads, one per loop iteration. A single wavefront then contains these instructions:

  1. PC+0: vector load of a (a SIMD instruction)
  2. PC+4: vector load of b (a SIMD instruction)
  3. PC+8: vector add (a SIMD instruction)
  4. PC+12: vector store of c to memory (a SIMD instruction)
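To make the example concrete, here is a minimal C sketch of what those four instructions mean for the whole wavefront, assuming work-item i handles loop iteration i. The `vgpr` type and the `wave_load`/`wave_add`/`wave_store` functions are hypothetical stand-ins, not GCN ISA mnemonics or any real API; each one models a single vector instruction acting on all 64 lanes at once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not real GCN hardware or ISA:
 * one wavefront of 64 work-items, work-item i handles loop iteration i. */
#define WAVE_SIZE 64

/* A vector register holds one 32-bit value per work-item (lane). */
typedef struct {
    float lane[WAVE_SIZE];
} vgpr;

/* Models one vector load: lane i reads mem[i]. */
static void wave_load(vgpr *dst, const float *mem) {
    for (size_t i = 0; i < WAVE_SIZE; i++)
        dst->lane[i] = mem[i];
}

/* Models one vector add: lane i computes x[i] + y[i]. */
static void wave_add(vgpr *dst, const vgpr *x, const vgpr *y) {
    for (size_t i = 0; i < WAVE_SIZE; i++)
        dst->lane[i] = x->lane[i] + y->lane[i];
}

/* Models one vector store: lane i writes mem[i]. */
static void wave_store(float *mem, const vgpr *src) {
    for (size_t i = 0; i < WAVE_SIZE; i++)
        mem[i] = src->lane[i];
}

/* The whole loop c[i] = a[i] + b[i] becomes four wavefront-wide instructions. */
void run_wavefront(const float *a, const float *b, float *c) {
    assert(a != NULL && b != NULL && c != NULL);
    vgpr va, vb, vc;
    wave_load(&va, a);       /* PC+0:  vector load of a  */
    wave_load(&vb, b);       /* PC+4:  vector load of b  */
    wave_add(&vc, &va, &vb); /* PC+8:  vector add        */
    wave_store(c, &vc);      /* PC+12: vector store of c */
}
```

The point of the sketch is that the program counter advances once per instruction, not once per thread: all 64 iterations are covered by the same four instructions.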

Each GCN compute unit (CU) has 4 SIMDs. A wavefront is assigned to one SIMD and runs only on that SIMD; assume SIMD0 is executing the wavefront above. Each SIMD has 16 vector ALUs (lanes).
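Assuming the 16 ALUs of a SIMD process one group of 16 work-items per clock, a 64-wide instruction needs 64 / 16 = 4 clocks. The sketch below models that for the vector add; `simd_vadd` and `lane_cycle` are hypothetical names, the inner loop only stands in for 16 ALUs working in parallel, and the order in which the 16-lane groups are processed is an assumption of the model.

```c
#include <assert.h>

/* Simplified model (assumption): a 16-lane SIMD executes one 64-wide vector
 * instruction by processing one group of 16 work-items per clock. */
#define WAVE_SIZE 64
#define SIMD_LANES 16

/* Computes c = a + b for one wavefront on a 16-lane SIMD.
 * lane_cycle[i] records the clock (relative to issue) in which work-item i
 * was processed. Returns the number of clocks the instruction occupied. */
int simd_vadd(const float *a, const float *b, float *c, int lane_cycle[WAVE_SIZE]) {
    assert(WAVE_SIZE % SIMD_LANES == 0);
    int cycles = WAVE_SIZE / SIMD_LANES; /* 64 / 16 = 4 clocks */
    for (int cycle = 0; cycle < cycles; cycle++) {
        /* In hardware these 16 adds happen in parallel within one clock. */
        for (int alu = 0; alu < SIMD_LANES; alu++) {
            int wi = cycle * SIMD_LANES + alu; /* work-item index */
            c[wi] = a[wi] + b[wi];
            lane_cycle[wi] = cycle;
        }
    }
    return cycles;
}
```

In this model work-items 0 to 15 run on clock 0, 16 to 31 on clock 1, and so on, so the 4 clocks are a consequence of 64 work-items sharing 16 ALUs.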

According to the whitepaper, a 64-thread wavefront takes 4 cycles to execute one vector instruction. My question is: how is a single SIMD instruction pipelined? I don't have a clear picture of these concepts. Please also correct me if anything in my description of the example above is wrong.
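One way to picture the pipelining is the simplified timing model below. It assumes the CU's issue logic visits SIMD0 to SIMD3 round-robin, one SIMD per clock, and that each vector ALU instruction occupies its SIMD for 4 clocks. Memory instructions (the loads and the store in the example) and fixed pipeline depth are ignored, and `issue_clock`, `group_clock`, and `print_schedule` are hypothetical helpers; this is a sketch under those assumptions, not a cycle-accurate description of GCN.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified timing model (assumptions, not cycle-accurate GCN):
 *  - the CU issue logic visits SIMD0..SIMD3 round-robin, one SIMD per clock;
 *  - a vector ALU instruction on a 64-wide wavefront occupies its 16-lane
 *    SIMD for 4 clocks, one 16-work-item group per clock;
 *  - memory instructions and fixed pipeline depth are ignored. */
#define NUM_SIMDS 4
#define WAVE_SIZE 64
#define SIMD_LANES 16
#define CADENCE (WAVE_SIZE / SIMD_LANES) /* 4 clocks per vector instruction */

/* Each SIMD gets an issue slot exactly as often as an instruction lasts. */
_Static_assert(NUM_SIMDS == CADENCE, "issue period must equal instruction duration");

/* Clock at which SIMD `simd` issues its k-th back-to-back ALU instruction. */
int issue_clock(int simd, int k) {
    assert(simd >= 0 && simd < NUM_SIMDS && k >= 0);
    return k * NUM_SIMDS + simd;
}

/* Clock at which 16-work-item group `group` (0..3) of that instruction runs. */
int group_clock(int simd, int k, int group) {
    assert(group >= 0 && group < CADENCE);
    return issue_clock(simd, k) + group;
}

/* Prints which instruction and 16-lane group each SIMD works on per clock. */
void print_schedule(int num_instrs) {
    assert(num_instrs > 0);
    int last = group_clock(NUM_SIMDS - 1, num_instrs - 1, CADENCE - 1);
    for (int t = 0; t <= last; t++) {
        printf("clock %2d:", t);
        for (int s = 0; s < NUM_SIMDS; s++) {
            int rel = t - s; /* clocks since SIMD s first issued */
            if (rel < 0 || rel / CADENCE >= num_instrs)
                printf("  SIMD%d idle     ", s);
            else
                printf("  SIMD%d i%d grp%d  ", s, rel / CADENCE, rel % CADENCE);
        }
        printf("\n");
    }
}
```

In this model a SIMD never sits idle between back-to-back ALU instructions and never receives a new one while busy. A dependent instruction k+1 reads each 16-lane group 4 clocks after instruction k produced it, so as long as the ALU latency fits in those 4 clocks, consecutive dependent vector adds do not stall, while the four SIMDs overlap with each other one clock apart.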
