CUDA C++ Pointer Typecasting

Question

I was reading the CUDA C++ documentation, but there is something about pointer typecasting that I don't understand. Below are the host and device code.

#include <cuda_runtime.h>

// Device code
__global__ void MyKernel(float* devPtr,
                         size_t pitch, int width, int height)
{
    for (int r = 0; r < height; ++r) {
        float* row = (float*)((char*)devPtr + r * pitch);
        for (int c = 0; c < width; ++c) {
            float element = row[c];
        }
    }
}

// Host code
int main()
{
    int width = 64, height = 64;
    float* devPtr;
    size_t pitch;
    cudaMallocPitch(&devPtr, &pitch,
                    width * sizeof(float), height);
    MyKernel<<<100, 512>>>(devPtr, pitch, width, height);
    cudaDeviceSynchronize();   // wait for the kernel before freeing
    cudaFree(devPtr);
    return 0;
}

As you can see, devPtr is cast to char*. I don't understand why it is cast to char* instead of simply being incremented as a float*.

Do you know what it means to add an integer to a pointer? – 463035818_is_not_an_ai, Dec 08 '22 at 20:38
Yes. @463035818_is_not_a_number – tekiz, Dec 08 '22 at 20:43

Robert Crovella · Accepted Answer · 2022-12-09T19:03:13.713

This is to handle a pitched allocation (the type created by cudaMallocPitch()).

A pitched allocation "rounds up" the requested width of the allocation to a particular pitch, and this pitch is specified in bytes:

cudaMallocPitch(&devPtr, &pitch,
                          ^
                          | 
               this value is returned by the function as the row width, or "pitch", in bytes

Because the pitch is specified in bytes, to get proper pointer arithmetic:

((char*)devPtr + r * pitch);
               ^
               |
           pointer arithmetic

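the pointer type must also be a byte type. Adding an integer n to a T* advances the address by n * sizeof(T) bytes; since sizeof(char) is 1, adding r * pitch to a char* advances it by exactly r * pitch bytes, whereas adding the same value to a float* would advance it by r * pitch * sizeof(float) bytes. The objective of that code snippet is to advance devPtr by r rows, where each row consists of pitch bytes; the result is then cast back to float* so that row[c] indexes elements within that row.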

AFAIK, in CUDA, nothing guarantees any particular granularity for the pitch returned by cudaMallocPitch(). It is theoretically possible for it to be an odd number of bytes, or a prime number of bytes, for example. So tricks that pre-convert the pitch value into an equivalent pointer-arithmetic offset for a wider type (for example, pitch / sizeof(float) for a float*) would be frowned upon.
