The NVIDIA documentation at https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf says:
1.1. AT A GLANCE
1.1.1. MPS
The Multi-Process Service (MPS) is an alternative, binary-compatible implementation of the CUDA Application Programming Interface (API). The MPS runtime architecture is designed to transparently enable co-operative multi-process CUDA applications, typically MPI jobs, to utilize Hyper-Q capabilities on the latest NVIDIA (Kepler-based) Tesla and Quadro GPUs. Hyper-Q allows CUDA kernels to be processed concurrently on the same GPU; this can benefit performance when the GPU compute capacity is underutilized by a single application process.
Do I have to use MPS (Multi-Process Service) when using CUDA 6.5 with MPI (OpenMPI / IntelMPI), or can I run without MPS, losing some performance but without getting any errors?
If I do not use MPS, does that mean that all my MPI processes on a single server will execute their GPU kernels sequentially (not concurrently) on a single GPU card, while all other behavior stays the same?
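To make the scenario concrete, here is a minimal sketch of the kind of program I mean: several MPI ranks on one server all select the same GPU and each launches its own kernel. The file name `mps_question.cu`, the kernel `scale`, and the helper `check` are hypothetical names made up for this illustration, and the build line in the comment is an assumption about the toolchain, not something taken from the NVIDIA document.

```cuda
// Assumed build: nvcc -ccbin mpicxx mps_question.cu -o mps_question
// Assumed run:   mpirun -np 4 ./mps_question
#include <cstdio>
#include <cstdlib>
#include <mpi.h>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real per-rank kernel: x[i] *= a.
__global__ void scale(float *x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

// Hypothetical helper: abort the whole MPI job if a CUDA call fails.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Every rank on the server deliberately shares GPU 0.
    check(cudaSetDevice(0), "cudaSetDevice");

    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d = NULL;
    check(cudaMalloc((void **)&d, bytes), "cudaMalloc");
    check(cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice), "copy to device");

    // Each rank scales its own buffer by (rank + 1).
    scale<<<(n + 255) / 256, 256>>>(d, (float)(rank + 1), n);
    check(cudaGetLastError(), "kernel launch");
    check(cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost), "copy to host");

    // Each rank verifies only its own result.
    const float expected = (float)(rank + 1);
    int ok = (h[0] == expected) && (h[n - 1] == expected);
    printf("rank %d: %s\n", rank, ok ? "ok" : "MISMATCH");

    check(cudaFree(d), "cudaFree");
    free(h);
    MPI_Finalize();
    return ok ? 0 : 1;
}
```

As far as I understand, MPS is enabled outside the program by starting its control daemon (`nvidia-cuda-mps-control -d`), so the source above would be identical in both cases. My question is whether the only difference without the daemon is how the kernels from different ranks are scheduled on the GPU.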