I found this Wikipedia resource, "Maximum number of resident grids per device (Concurrent Kernel Execution)", which lists a number of concurrent kernels for each compute capability. I assume that number is the maximum number of kernels that can execute concurrently.
I have a GTX 1060 on the way, which according to this NVIDIA CUDA resource has a compute capability of 6.1. From what I have learned about CUDA so far, you can specify the virtual compute capability of your code at compile time with the NVCC flag -arch=compute_XX.
So will my GPU be hardware-constrained to 32 concurrent kernels, or is it capable of 128 with the -arch=compute_60 flag?
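Once the card arrives, I assume I could at least confirm what the hardware itself reports with something like the minimal sketch below. It uses the CUDA runtime call cudaGetDeviceProperties; the device index 0 is an assumption on my part. Note that the concurrentKernels field only says whether concurrent kernel execution is supported at all, not how many kernels can be resident at once.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0;  // assumption: the GTX 1060 is the first (or only) CUDA device

    // Query the properties reported by the physical device.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // major.minor is the compute capability of the hardware,
    // independent of the -arch value the code was compiled with.
    std::printf("%s: compute capability %d.%d\n",
                prop.name, prop.major, prop.minor);

    // Non-zero means the device can run multiple kernels concurrently;
    // it is a yes/no flag, not the resident-grid limit.
    std::printf("concurrent kernel execution supported: %s\n",
                prop.concurrentKernels ? "yes" : "no");
    return 0;
}
```

This should build with a plain nvcc invocation on the source file and does not depend on which -arch value is passed.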