
How is the #pragma omp simd directive translated for a GPU target device?

Each GPU core executes a separate thread. Threads are grouped into warps of 32 and scheduled onto 32 cores, which execute a single instruction in lockstep (the SIMT model). SIMD, however, is a sub-thread concept: a single core would need a vector register so that one thread could process several data elements at once. A GPU core can't do this (each core executes its thread in a scalar manner).

Does this mean that the simd directive can't be translated for a GPU?

Or is each thread treated as if it had a single SIMD lane?

Or are the SIMD iterations spread across an entire warp of 32 threads (and if so, what about memory access)?

Marc Andreson
