I was benchmarking a CUDA kernel, and the Global Memory Cache Replay metric showed 216.9%.
This doesn't quite make sense to me. The only way I can see cache misses exceeding 100% is if accesses are missing at multiple cache levels, but that doesn't seem like it should be the case here.
Any insight as to why this is the case?