How is the #pragma omp simd directive translated for a GPU target device?
On a GPU, each core handles a separate thread. Threads are grouped in sets of 32 (a warp) and assigned to 32 cores, which execute a single instruction in lockstep. But SIMD is a sub-thread concept: a single core would need a vector register and the ability to process several chunks of data within the context of a single thread. A GPU core can't do that; each core handles its one thread in a scalar manner.
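For concreteness, here is a minimal sketch of the kind of loop I have in mind (the array names and sizes are just illustrative). The question is what the simd level of this combined construct becomes on the device:

```c
#include <stdio.h>

#define N 1024

int main(void) {
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    /* Offload to the GPU: teams/parallel presumably map to
       blocks/threads, but what does the simd clause map to? */
    #pragma omp target teams distribute parallel for simd \
                map(to: a, b) map(from: c)
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[10] = %f\n", c[10]);
    return 0;
}
```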
Does this mean that the simd directive can't be translated for a GPU at all?
Or maybe each thread is treated as if it had a single SIMD lane?
Or maybe the SIMD iterations are spread across the entire warp of 32 threads (but then what about the memory access pattern)? See the sketch of this last option below.
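To make that last possibility concrete, here is a purely hypothetical sketch in plain host-side C (just enumerating the mapping, not what any compiler is documented to emit) of how 32 SIMD lanes could map onto the 32 threads of a warp:

```c
/* Hypothetical mapping: the 32 "SIMD lanes" become the 32 threads
 * of one warp. This loop only enumerates which iteration each
 * (warp, lane) pair would execute; on a real GPU the inner lane
 * loop would run in lockstep across 32 cores, not sequentially. */
for (int warp = 0; warp < n / 32; warp++) {
    for (int lane = 0; lane < 32; lane++) {
        int i = warp * 32 + lane;  /* iteration assigned to this lane */
        c[i] = a[i] + b[i];        /* consecutive lanes would touch
                                      consecutive elements, i.e. the
                                      accesses would be coalesced */
    }
}
```

If this is what happens, the memory access question might answer itself: consecutive lanes reading consecutive elements is exactly the coalesced pattern GPUs want. But I don't know whether compilers actually lower simd this way.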