Nvidia introduced a new Independent Thread Scheduling for their GPGPUs since Volta. In case CUDA threads diverge, alternative code paths are not executed in blocks but instruction-wise. Still, divergent paths can not be executed at the same time since the GPUs are SIMT as well. This is the original article:
https://developer.nvidia.com/blog/inside-volta/ (scroll down to "Independent Thread Scheduling").
I understood what this means. What I don't understand is, in which way this new behavoir accelerates code. Even the before/after diagrams in the above article do not reflect an overall speed-up.
My question: Which kinds of divergent algorithms will run faster on Volta (and newer) due to the described new scheduling?