According to the Nsight Compute Profiling Guide (https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html), the MIO Throttle stall reason means:
Warp was stalled waiting for the MIO (memory input/output) instruction queue to be not full. This stall reason is high in cases of extreme utilization of the MIO pipelines, which include special math instructions, dynamic branches, as well as shared memory instructions.
The Nsight Graphics documentation (https://docs.nvidia.com/drive/drive_os_5.1.12.0L/nsight-graphics/activities/index.html) describes the same stall reason differently:
May be triggered by local, global, shared, attribute, IPA, indexed constant loads (LDC), and decoupled math.
My understanding is that all memory operations are executed by the load/store units (LSUs), so I would expect them to be placed in the same instruction queue before the LSUs execute them. If they are all queued together, the second description (which includes local and global memory accesses) makes more sense to me. The problem is that, if that were the case, a separate LG Throttle stall reason would seem redundant.
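To make the two cases concrete, here is a minimal sketch of a pair of kernels that could be profiled side by side. Everything in it is an assumption for illustration: the kernel names `sharedHeavy` and `globalHeavy`, the sizes, and the iteration count are hypothetical, and I am not claiming which stall reason either kernel will actually report. The first kernel issues mostly shared memory loads (listed under the MIO pipelines in the Profiling Guide); the second issues mostly global memory loads.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                      \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,           \
                    cudaGetErrorString(err_));                           \
            exit(EXIT_FAILURE);                                          \
        }                                                                \
    } while (0)

constexpr int TILE = 256;  // threads per block; must match the launch below

// Hypothetical kernel dominated by shared memory loads.
__global__ void sharedHeavy(float *out, int iters)
{
    __shared__ float tile[TILE];
    int t = threadIdx.x;
    tile[t] = static_cast<float>(t);
    __syncthreads();

    float acc = 0.0f;
    for (int i = 0; i < iters; ++i) {
        // The index changes every iteration, so the load stays in the loop.
        acc += tile[(t + i) & (TILE - 1)];
    }
    out[blockIdx.x * blockDim.x + t] = acc;
}

// Hypothetical kernel dominated by global memory loads.
__global__ void globalHeavy(const float *in, float *out, int n, int iters)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    float acc = 0.0f;
    for (int i = 0; i < iters; ++i) {
        // Strided, wrapping index into the global input array.
        acc += in[(gid + i * 32) % n];
    }
    out[gid] = acc;
}

int main()
{
    const int blocks = 64;
    const int n = blocks * TILE;
    const int iters = 1024;

    float *in = nullptr, *out = nullptr;
    CHECK(cudaMalloc(&in, n * sizeof(float)));
    CHECK(cudaMalloc(&out, n * sizeof(float)));
    CHECK(cudaMemset(in, 0, n * sizeof(float)));

    sharedHeavy<<<blocks, TILE>>>(out, iters);
    CHECK(cudaGetLastError());
    globalHeavy<<<blocks, TILE>>>(in, out, n, iters);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaFree(in));
    CHECK(cudaFree(out));
    printf("done\n");
    return 0;
}
```

Compiled with `nvcc` and profiled with something like `ncu --section WarpStateStats ./a.out`, the per-kernel warp state breakdown should show how stalls are split between MIO Throttle and LG Throttle for each kernel, which is the distinction I am trying to understand.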
What does MIO Throttle actually imply? Are all memory instructions stored on the same queue?