I'm a little confused about bank conflicts, avoiding them with memory padding, and coalesced memory access. Here is what I've read so far: coalesced access to global memory is optimal. If it isn't achievable, shared memory can be used to reorder the data needed by the current block and thus make coalesced access possible. However, when using shared memory one has to watch out for bank conflicts. One strategy to avoid bank conflicts is to pad the arrays stored in shared memory by one element per row. Consider the example from this blog post, where each row of a 16x16 matrix is padded by 1, making it a 16x17 matrix in shared memory.
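To make the padding argument concrete, here is a small host-side sketch of how one column of a 16x16 tile maps onto shared-memory banks with and without the extra column. It assumes 32 banks of 4-byte words, with consecutive words in consecutive banks, and 16 threads that each read one element of the same column. These numbers and the helper names `bank_of` and `column_conflict_degree` are illustrative assumptions, not details from the blog post, and the model ignores how real hardware groups threads into warps.

```python
# Hypothetical model (an assumption, not taken from the blog post):
# shared memory is split into NUM_BANKS banks, each 4 bytes wide, and
# consecutive 4-byte words map to consecutive banks.
NUM_BANKS = 32
TILE = 16


def bank_of(row, col, row_stride):
    """Bank holding tile[row][col] when each row occupies row_stride words."""
    return (row * row_stride + col) % NUM_BANKS


def column_conflict_degree(col, row_stride, threads=TILE):
    """Worst-case number of threads hitting the same bank when thread t
    reads tile[t][col], i.e. when the threads walk down one column."""
    hits = {}
    for t in range(threads):
        b = bank_of(t, col, row_stride)
        hits[b] = hits.get(b, 0) + 1
    return max(hits.values())


if __name__ == "__main__":
    # Unpadded 16x16 tile: row stride of 16 words.
    print("16x16 column read:", column_conflict_degree(0, TILE), "threads per bank")
    # Padded 16x17 tile: row stride of 17 words.
    print("16x17 column read:", column_conflict_degree(0, TILE + 1), "threads per bank")
```

In this model, a row stride of 16 words puts every element of a column into one of only two banks (8 threads per bank), while a stride of 17 spreads the 16 elements over 16 different banks.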
Now I understand that memory padding can avoid bank conflicts, but doesn't it also mean the memory is no longer aligned? For example, if I shifted global memory by one element, thus misaligning it, one warp would need to access two memory segments instead of one, because the last element would no longer lie in the same segment as all the others. So, as I understand it, coalesced memory access and memory padding are contradictory concepts, aren't they? Any clarification is very much appreciated!
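The misalignment scenario described above can be sketched with the same kind of arithmetic. This hypothetical model assumes a 32-thread warp in which each thread loads one 4-byte float and global memory is served in aligned 128-byte segments; `segments_touched` is an illustrative helper, and real hardware details (caching, 32-byte sectors) vary by architecture.

```python
# Hypothetical model (assumption): global memory is served in aligned
# SEGMENT_BYTES-byte segments, a warp has WARP_SIZE threads, and each
# thread loads one ELEM_BYTES-byte float.
SEGMENT_BYTES = 128
WARP_SIZE = 32
ELEM_BYTES = 4


def segments_touched(base_byte_offset):
    """Number of distinct aligned segments a warp touches when thread i
    loads the float at base_byte_offset + i * ELEM_BYTES."""
    return len({(base_byte_offset + i * ELEM_BYTES) // SEGMENT_BYTES
                for i in range(WARP_SIZE)})


if __name__ == "__main__":
    print("aligned start:", segments_touched(0), "segment(s)")
    print("shifted by one float:", segments_touched(ELEM_BYTES), "segment(s)")
```

Starting at offset 0, the warp's 32 floats fill exactly one 128-byte segment; shifting the base by one float (4 bytes) pushes the last element into the next segment, so two segments are touched.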