I would like to be sure that I correctly understand bank conflicts in shared memory.
I have 32 segments of data in shared memory, each consisting of 128 integers:
[[0, 1, ..., 126, 127], [128, 129, ..., 255], ..., [3968, 3969, ..., 4095]]
Each thread in a warp accesses only its own segment.
Thread 0 accesses position 0 of segment 0, corresponding to index 0.
Thread 1 accesses position 0 of segment 1, corresponding to index 128.
...
Thread 31 accesses position 0 of segment 31, corresponding to index 3968.
Does this mean that I have a 32-way bank conflict?
If so, and I add one element of padding to each segment (i.e. 129 elements per segment instead of 128), will each thread then access a unique bank? Am I right?
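
To make my reasoning concrete, here is a small host-side sketch of the bank arithmetic as I understand it. It assumes 32 banks that are each 4 bytes wide, so that the int at shared-memory index i falls into bank i % 32; that assumption, and the helper names bank_of and max_conflict_degree, are mine and not taken from any documentation. It is plain C++ rather than a kernel, so it only models the mapping and does not measure anything on the GPU.

```cpp
// Assumption: 32 banks, each 4 bytes wide, so consecutive ints
// map to consecutive banks and the mapping wraps every 32 ints.
const int kNumBanks = 32;
const int kWarpSize = 32;

// Bank that holds the int stored at the given shared-memory index.
inline int bank_of(int index) { return index % kNumBanks; }

// Largest number of threads in one warp that land in the same bank when
// thread t reads element `position` of its own segment of `stride` ints,
// i.e. index t * stride + position. A result of 1 means no conflict.
inline int max_conflict_degree(int stride, int position = 0) {
    int hits[kNumBanks] = {};  // how many threads hit each bank
    int worst = 0;
    for (int t = 0; t < kWarpSize; ++t) {
        int b = bank_of(t * stride + position);
        ++hits[b];
        if (hits[b] > worst) worst = hits[b];
    }
    return worst;
}
```

Under these assumptions, max_conflict_degree(128) gives 32, because 128 is a multiple of 32 and every thread lands in bank 0 at a different address. max_conflict_degree(129) gives 1, because 129 % 32 == 1 and thread t lands in bank t.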