The cudaMalloc() documentation says:
"The allocated memory is suitably aligned for any kind of variable."
But...
- What affects the actual alignment? Compute capability? CUDA driver version? The specific kind of card? The allocation size?
- Can I determine the minimum / typical allocation alignment as a function of these parameters?
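To make the second question concrete, here is a minimal host-side sketch of how the alignment could be probed empirically. It allocates a few buffers with cudaMalloc() and reports the largest power of two that divides each returned address. The helper names `pointer_alignment` and `min_cuda_malloc_alignment` are hypothetical and made up for illustration; only `cudaMalloc`, `cudaFree` and `cudaSuccess` come from the CUDA runtime. The CUDA part is compiled only when `cuda_runtime.h` can be found, which is an assumption made so the helper can also be tried on a host-only machine. Whatever this prints is an observation for one card, driver and set of sizes, not a documented guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Pull in the CUDA runtime only if its header is available (assumption:
// a C++17 compiler, so that __has_include can be used).
#if defined(__has_include)
#  if __has_include(<cuda_runtime.h>)
#    include <cuda_runtime.h>
#    define HAVE_CUDA_RUNTIME 1
#  endif
#endif

// Hypothetical helper: largest power of two that divides the address of p.
// Returns 0 for a null pointer.
inline std::size_t pointer_alignment(const void* p) {
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    if (addr == 0) {
        return 0;
    }
    // addr & (two's complement of addr) isolates the lowest set bit.
    return static_cast<std::size_t>(addr & (~addr + 1));
}

#ifdef HAVE_CUDA_RUNTIME
// Hypothetical helper: allocate one buffer per (non-zero) size, print the
// observed alignment of each, and return the smallest one seen.
// Returns 0 if no allocation succeeded.
inline std::size_t min_cuda_malloc_alignment(const std::vector<std::size_t>& sizes) {
    std::size_t min_align = 0;
    std::vector<void*> ptrs;  // keep buffers alive so addresses do not repeat
    for (std::size_t n : sizes) {
        void* p = nullptr;
        if (cudaMalloc(&p, n) != cudaSuccess) {
            break;  // stop probing on the first failed allocation
        }
        assert(p != nullptr);
        ptrs.push_back(p);
        const std::size_t a = pointer_alignment(p);
        std::printf("size %zu -> observed alignment %zu bytes\n", n, a);
        if (min_align == 0 || a < min_align) {
            min_align = a;
        }
    }
    for (void* p : ptrs) {
        cudaFree(p);
    }
    return min_align;
}
#endif
```

Calling `min_cuda_malloc_alignment({1, 3, 100, 4096, 1 << 20})` from `main` on different cards and driver versions would show whether the observed minimum changes with any of the parameters listed above. Note that an address can happen to be more aligned than the allocator guarantees, so the smallest value over many allocations is the more meaningful number.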