2D textures are a useful CUDA feature for image processing applications. To bind pitch-linear memory to a 2D texture, the memory has to be aligned, and cudaMallocPitch is a good option for aligned allocation. On my device, the pitch returned by cudaMallocPitch is a multiple of 512 bytes, i.e. every row starts on a 512-byte boundary.
The actual pitch alignment requirement for binding a texture is reported by cudaDeviceProp::texturePitchAlignment, which is 32 bytes on my device.
My question is:
If the actual alignment requirement for 2D textures is 32 bytes, then why does cudaMallocPitch return a pitch that is 512-byte aligned?
Isn't that a waste of memory? For example, if I create an 8-bit image of size 513 x 100, it will occupy 1024 x 100 bytes.
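To make the arithmetic concrete, here is a minimal host-only sketch of the rounding I am assuming. The helper name padded_pitch is hypothetical (it is not a CUDA API), and the assumption is that the pitch is simply the row width in bytes rounded up to the next multiple of the alignment; the driver's real allocation policy may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of CUDA: rounds a row width (in bytes)
// up to the next multiple of `alignment`. This models the assumption
// that the pitch is just the width padded to the alignment boundary.
inline std::size_t padded_pitch(std::size_t width_bytes, std::size_t alignment) {
    assert(alignment > 0);
    return (width_bytes + alignment - 1) / alignment * alignment;
}

// Total bytes occupied by an image with `height` rows at the padded pitch.
inline std::size_t padded_size(std::size_t width_bytes, std::size_t height,
                               std::size_t alignment) {
    return padded_pitch(width_bytes, alignment) * height;
}
```

For the 513 x 100 8-bit image, padded_size(513, 100, 512) gives 102400 bytes, whereas padded_size(513, 100, 32) would give 54400 bytes for 51300 bytes of actual pixel data.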
I get this behaviour on the following systems:
1: Asus G53JW + Windows 8 x64 + GeForce GTX 460M + CUDA 5 + Core i7 740QM + 4GB RAM
2: Dell Inspiron N5110 + Windows 7 x64 + GeForce GT 525M + CUDA 4.2 + Core i7 2630QM + 6GB RAM