I am doing a detailed code analysis for which I want to measure the total number of bank conflicts per warp.
The nvvp documentation lists this metric, which was the only one I could find related to bank conflicts:

shared_replay_overhead: Average number of replays due to shared memory conflicts for each instruction executed

When I profile the metric using nvprof (or nvvp) I get a result like this:
Invocations    Metric Name               Metric Description               Min         Max         Avg
Device "Tesla K20m (0)"
    Kernel: void matrixMulCUDA<int=32>(float*, float*, float*, int, int)
        301    shared_replay_overhead    Shared Memory Replay Overhead    0.089730    0.089730    0.089730
I need to use this value, 0.089730, or devise some other method to arrive at a measurement of the number of bank conflicts. I understand that this value is the average taken across all the warps that are executing. If I had to measure the total number of bank conflicts per warp, is there a way to do it using the nvprof results?
Possible approaches that came to mind:

- By using the shared_replay_overhead results in a formula to calculate the number of bank conflicts. I am guessing I have to apply some formula like shared_replay_overhead * total number of warps launched, where I know the total number of warps launched in advance, but I can't figure out the exact formula (see the first sketch below).
- By first detecting that it's a four-way bank conflict, eight-way bank conflict, etc., and then multiplying 4/8 by the number of times the shared memory operation takes place (how do I measure that?). This probably requires fairly good technical knowledge of the GPU architecture in addition to the nvprof results, which I don't think I have yet (see the second sketch below).

For the record, my GPU is of Kepler architecture, SM 3.5.
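
For the first approach, this is roughly the arithmetic I have in mind, written out as a small host-side sketch. The overhead value is the one nvprof reported above; the warp and instruction counts are placeholders I would fill in from my launch configuration and from a separate counter (for example the inst_executed metric), and I am not sure which of the two guesses below, if either, is the right normalization.

    // Sketch of the arithmetic for approach 1 (unverified -- I don't know
    // whether the metric should be scaled by warps launched or by
    // instructions executed).
    #include <cstdio>

    int main()
    {
        const double shared_replay_overhead = 0.089730; // value reported by nvprof above

        // Placeholders: these would come from the launch configuration and
        // from another profiling run (e.g. the inst_executed metric).
        const long long warps_launched = 1;
        const long long inst_executed  = 1;

        // Guess (a): the metric is an average per warp.
        const double total_if_per_warp = shared_replay_overhead * warps_launched;

        // Guess (b): the metric is an average per executed instruction, as
        // the description "for each instruction executed" suggests.
        const double total_if_per_inst = shared_replay_overhead * inst_executed;

        std::printf("total replays, guess (a): %f\n", total_if_per_warp);
        std::printf("total replays, guess (b): %f\n", total_if_per_inst);
        std::printf("replays per warp under guess (b): %f\n",
                    total_if_per_inst / warps_launched);
        return 0;
    }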
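
For the second approach, the toy kernel below is only meant to illustrate the kind of access pattern I would try to identify by inspection (it is not my actual code). Assuming Kepler's default 4-byte bank mode with 32 banks, a read stride of 4 words maps the 32 threads of a warp onto 8 banks with 4 threads per bank, i.e. a four-way conflict, and a stride of 8 words gives an eight-way conflict.

    // Toy CUDA kernel illustrating what I mean by an N-way bank conflict
    // (assumes Kepler's default 4-byte bank mode, i.e. 32 banks of 32-bit words).
    __global__ void strided_shared_read(float *out, int stride)
    {
        __shared__ float buf[32 * 8];

        // Fill shared memory; consecutive threads write consecutive words,
        // so this store is conflict-free.
        for (int i = threadIdx.x; i < 32 * 8; i += blockDim.x)
            buf[i] = static_cast<float>(i);
        __syncthreads();

        // Strided read: thread t of a warp touches word (t * stride), which
        // lives in bank (t * stride) % 32.  With stride == 4 a warp hits only
        // 8 distinct banks, 4 threads per bank -> four-way conflict; with
        // stride == 8 it is an eight-way conflict.
        float v = buf[(threadIdx.x * stride) % (32 * 8)];

        out[blockIdx.x * blockDim.x + threadIdx.x] = v;
    }

What I still cannot see is the second factor of that approach: how to count, from the nvprof results, how many times such a shared memory instruction is actually issued.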
Even if I can measure the number of bank conflicts per block instead of per warp, it will suffice. After that I can do the necessary calculations to get the value on a per-warp basis.
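
To be explicit about that last conversion, this is all I mean by the per-block to per-warp calculation (a trivial sketch, assuming the usual warp size of 32 and a hypothetical per-block count):

    // Per-block -> per-warp conversion (warp size 32 on Kepler).
    #include <cstdio>

    int main()
    {
        const double conflicts_per_block = 0.0;  // hypothetical per-block measurement
        const int threads_per_block = 32 * 32;   // e.g. a 32x32 block, as in the matrixMul sample
        const int warps_per_block = (threads_per_block + 31) / 32;

        std::printf("conflicts per warp: %f\n", conflicts_per_block / warps_per_block);
        return 0;
    }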