I have been using the PGI compiler with OpenACC to run code on the GPU for about 3 years, but I still do not understand the difference between the terms "kernels" and "parallel". I read in the OpenACC Getting Started Guide:
Parallel Construct
Defines the region of the program that should be compiled for parallel execution on the accelerator device.
Kernels Construct
Defines the region of the program that should be compiled into a sequence of kernels for execution on the accelerator device.
I do not understand the difference between "parallel execution on the accelerator device" and "compiled into a sequence of kernels for execution on the accelerator device". If the accelerator device is a GPU, then all the code is compiled into CUDA kernels of some size (I mean the CUDA grid and block dimensions), and these CUDA kernels are executed on the GPU in CUDA threads, aren't they? What is a "sequence" of kernels? Does the "parallel" directive make one kernel, while "kernels" can make a sequence of kernels from the same piece of code?
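To make the question concrete, here is a minimal sketch of how I read the two descriptions (C99; the function names and loop bodies are hypothetical, chosen only for illustration). With "kernels", the compiler analyzes the whole region and typically turns each loop nest into its own kernel, so two loops would become two kernels launched one after the other. To get the same split with "parallel", each loop needs its own "parallel loop" construct, and each of those is launched as one kernel.

```c
/* Hypothetical example. Under "kernels" the compiler decides what to
   parallelize; each loop nest typically becomes a separate kernel, so this
   region would usually produce two kernels launched in sequence. */
void scale_then_shift_kernels(float *restrict a, const float *restrict b, int n)
{
    #pragma acc kernels copyin(b[0:n]) copyout(a[0:n])
    {
        for (int i = 0; i < n; ++i)   /* typically kernel #1 */
            a[i] = 2.0f * b[i];
        for (int i = 0; i < n; ++i)   /* typically kernel #2 */
            a[i] = a[i] + 1.0f;
    }
}

/* Same computation with "parallel": each "parallel loop" construct is one
   launch, so the programmer writes the two loops as two separate constructs.
   The data region keeps a and b on the device between them. */
void scale_then_shift_parallel(float *restrict a, const float *restrict b, int n)
{
    #pragma acc data copyin(b[0:n]) copyout(a[0:n])
    {
        #pragma acc parallel loop gang vector present(a[0:n], b[0:n])
        for (int i = 0; i < n; ++i)
            a[i] = 2.0f * b[i];

        #pragma acc parallel loop gang vector present(a[0:n])
        for (int i = 0; i < n; ++i)
            a[i] = a[i] + 1.0f;
    }
}
```

Compiled without OpenACC support, most C compilers simply ignore the "#pragma acc" lines, so both functions then run serially on the CPU and give the same result.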
Also, I use only the "parallel loop" directive everywhere. For example, to parallelize a for loop for execution on the GPU, I write
#pragma acc parallel loop gang vector copy(...) present(...)
for(int i=0; i<N; ++i)
{
...
}
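A self-contained instance of this pattern might look like the sketch below. The function name add_arrays, the loop body, and the enclosing data region are hypothetical, added only so that the present(...) clause has something to refer to: x is put on the device by the data region, while y is copied in and out by the compute construct itself.

```c
/* Hypothetical concrete version of the "parallel loop gang vector" pattern.
   The enclosing data region places x on the device, so the compute
   construct only checks that it is present; y is copied in and back out. */
void add_arrays(int n, const float *x, float *y)
{
    #pragma acc data copyin(x[0:n])
    {
        #pragma acc parallel loop gang vector copy(y[0:n]) present(x[0:n])
        for (int i = 0; i < n; ++i)
        {
            y[i] += x[i];
        }
    }
}
```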
Is this correct? When should "kernels" be used? Or is it a synonym for "parallel" that is now deprecated and no longer used?
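For comparison, this is the same kind of loop written both ways, as a sketch (function names are hypothetical). With "parallel loop" the programmer asserts that the iterations can run in parallel and the compiler does not have to prove it. With "kernels" the compiler must prove independence itself; because x and y are plain pointers that could alias, it may decide to run the loop sequentially, and the "independent" clause tells it that the iterations are safe to parallelize.

```c
/* Programmer-driven: "parallel loop" parallelizes as instructed. */
void add_arrays_parallel(int n, const float *x, float *y)
{
    #pragma acc parallel loop gang vector copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] += x[i];
}

/* Compiler-driven: "kernels" parallelizes only what it can prove safe.
   "independent" asserts that x and y do not overlap, so the iterations
   can run in parallel even though the pointers are not restrict. */
void add_arrays_kernels(int n, const float *x, float *y)
{
    #pragma acc kernels loop independent copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] += x[i];
}
```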