Instruction execution in GPGPU

Asked Feb 11 '23 at 03:19

I am learning about GPU hardware (the AMD GCN architecture) and I am a little confused about how instructions are executed. Let me take an example:

for (i = 0; i < 64; i++) c[i] = a[i] + b[i];

Assume the warp/wavefront has 64 threads. For the code above, a single warp/wavefront then contains these instructions:

PC+0 has a vector load of a (a SIMD instruction)
PC+4 has a vector load of b (a SIMD instruction)
PC+8 has a vector add (a SIMD instruction)
PC+12 has a vector store of c into memory (a SIMD instruction)
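
To make clear what I mean by these four instructions, here is a rough scalar model in C. It is only a sketch of my understanding, not real GCN code: the function name wavefront_add, the float element type, and the temporary arrays standing in for vector registers are all my own assumptions. Each inner loop stands for one vector instruction applied to all 64 lanes.

```c
#include <assert.h>
#include <stddef.h>

#define WAVEFRONT_SIZE 64 /* threads per wavefront, as assumed in the question */

/* Hypothetical scalar model of the four vector instructions.
 * Each loop iteration stands for one lane (thread) of the wavefront.
 * The float type and the va/vb/vc "registers" are assumptions. */
void wavefront_add(const float *a, const float *b, float *c)
{
    float va[WAVEFRONT_SIZE], vb[WAVEFRONT_SIZE], vc[WAVEFRONT_SIZE];

    assert(a != NULL && b != NULL && c != NULL);

    /* PC+0: vector load of a */
    for (size_t lane = 0; lane < WAVEFRONT_SIZE; lane++)
        va[lane] = a[lane];

    /* PC+4: vector load of b */
    for (size_t lane = 0; lane < WAVEFRONT_SIZE; lane++)
        vb[lane] = b[lane];

    /* PC+8: vector add */
    for (size_t lane = 0; lane < WAVEFRONT_SIZE; lane++)
        vc[lane] = va[lane] + vb[lane];

    /* PC+12: vector store of c */
    for (size_t lane = 0; lane < WAVEFRONT_SIZE; lane++)
        c[lane] = vc[lane];
}
```

The point of writing it this way is that every lane runs the same instruction at the same PC, which is how I understand a wavefront to work.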

Each AMD GCN compute unit has 4 SIMDs, and a given warp/wavefront is executed by one SIMD only. Assume SIMD0 is executing the wavefront above. Each SIMD has 16 vector ALUs.

According to the whitepaper, it takes 4 cycles for the 64 threads to execute. My question is: how is a single SIMD instruction pipelined? I am not clear on these concepts. Please also correct me if anything in my example above is wrong.
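
Here is how I currently read the 4-cycle figure, written as a small C sketch. The function names are hypothetical, and the in-order mapping of threads to cycles (threads 0 to 15 in the first cycle, 16 to 31 in the second, and so on) is my assumption, not something I have confirmed from the whitepaper.

```c
#include <assert.h>

/* Cycles needed to issue one vector instruction for a whole wavefront,
 * assuming `lanes` threads are processed per cycle (ceiling division). */
unsigned cycles_per_vector_instruction(unsigned wavefront_size, unsigned lanes)
{
    assert(lanes > 0);
    return (wavefront_size + lanes - 1) / lanes;
}

/* Assumed mapping: threads are issued in order, `lanes` of them per cycle,
 * so this returns the 0-based cycle in which a given thread is processed. */
unsigned issue_cycle_of_thread(unsigned thread_id, unsigned lanes)
{
    assert(lanes > 0);
    return thread_id / lanes;
}
```

With 64 threads and 16 vector ALUs, cycles_per_vector_instruction(64, 16) gives 4, which matches the number in the whitepaper. What I do not understand is how the pipeline stages of one such instruction overlap across those 4 cycles.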

asked Feb 11 '23 at 03:19

MGS

0 Answers