
The documentation of cudaMalloc3D says

The returned cudaPitchedPtr contains additional fields xsize and ysize, the logical width and height of the allocation, which are equivalent to the width and height extent parameters provided by the programmer during allocation.

However, if I run the following minimal example:

#include <stdio.h>
#include <stdlib.h>   // exit()
#include <cuda.h>
#include <cuda_runtime.h>
#include <device_launch_parameters.h>

#define Nrows 64
#define Ncols 64
#define Nslices 16

/********************/
/* CUDA ERROR CHECK */
/********************/
// --- Credit to http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api
void gpuAssert(cudaError_t code, const char *file, int line, bool abort = true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) { exit(code); }
    }
}

// --- Macro rather than function, so that __FILE__ and __LINE__ expand at the call site
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }

/********/
/* MAIN */
/********/
int main() {

    // --- 3D pitched allocation and host->device memcopy
    cudaExtent extent = make_cudaExtent(Ncols * sizeof(float), Nrows, Nslices);
    cudaPitchedPtr devPitchedPtr;
    gpuErrchk(cudaMalloc3D(&devPitchedPtr, extent));

    // --- xsize, pitch and ysize are size_t, so print them with %zu
    printf("xsize = %zu; xsize in bytes = %zu; ysize = %zu\n", devPitchedPtr.xsize, devPitchedPtr.pitch, devPitchedPtr.ysize);

    return 0;
}

I receive:

xsize = 256; xsize in bytes = 512; ysize = 64

So ysize is indeed equal to Nrows, but xsize equals neither Ncols nor (xsize in bytes) / sizeof(float), where "xsize in bytes" is the pitch field printed above.

Could you please help me understand the meaning of the xsize and ysize fields in the cudaPitchedPtr returned by cudaMalloc3D?

Thank you very much in advance for any help.

My system: Windows 10, CUDA 8.0, GT 920M, cc 3.5.

Vitality
  • xsize is the pitch width you requested in bytes. pitch is the actual pitch width in bytes. ysize is the number of rows you requested – talonmies May 08 '17 at 19:20
  • Note the sentences "Allocates *at least* width * height * depth bytes of linear memory" and "The function *may pad* the allocation..." in the doc. – BlameTheBits May 08 '17 at 19:24
  • @talonmies Thank you very much for your prompt comment. – Vitality May 08 '17 at 20:53

2 Answers


xsize = Ncols * sizeof(float)

xsize is the logical width (in bytes) of the allocation, as opposed to the pitched width:

logical width = 256 bytes

pitched width = 512 bytes

It is equivalent (identical) to the width parameter you provided during allocation, i.e. the first parameter you passed to make_cudaExtent.
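
To make the distinction concrete, here is a minimal host-only sketch of the arithmetic. PitchedLayout is a hypothetical stand-in for the pitch, xsize and ysize fields of cudaPitchedPtr (it is not a CUDA type), and the helper names are made up for illustration. It assumes the usual layout for pitched 3D allocations: rows are pitch bytes apart and slices are pitch * ysize bytes apart.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the relevant cudaPitchedPtr fields,
// so the arithmetic can be checked on the host without a GPU.
struct PitchedLayout {
    std::size_t pitch;  // allocated row width in bytes, padding included
    std::size_t xsize;  // logical row width in bytes, as requested
    std::size_t ysize;  // logical number of rows per slice
};

// Number of elements of type T in one logical row.
template <typename T>
constexpr std::size_t elementsPerRow(const PitchedLayout &p) {
    return p.xsize / sizeof(T);
}

// Unused padding bytes at the end of every row.
constexpr std::size_t paddingPerRow(const PitchedLayout &p) {
    return p.pitch - p.xsize;
}

// Byte offset of element (col, row, slice) from the base pointer.
// Rows are stepped by pitch, never by xsize.
template <typename T>
constexpr std::size_t byteOffset(const PitchedLayout &p, std::size_t col,
                                 std::size_t row, std::size_t slice) {
    return slice * p.pitch * p.ysize + row * p.pitch + col * sizeof(T);
}

// Sanity check with the numbers reported in the question.
inline void checkQuestionNumbers() {
    const PitchedLayout p{512, 256, 64};
    assert(elementsPerRow<float>(p) == 64);        // Ncols
    assert(paddingPerRow(p) == 256);               // half of each row is padding
    assert(byteOffset<float>(p, 0, 1, 0) == 512);  // row 1 starts one pitch in
}
```

With the values from the question, xsize / sizeof(float) gives back Ncols = 64, while pitch / sizeof(float) = 128 also counts the padding, which is why the two differ.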

Robert Crovella
  • Thank you Robert for your prompt answer. Now it is clear to me that `xsize` is the number of columns "measured" in `bytes`. – Vitality May 08 '17 at 20:49

A closely related working example (@JackOLantern, your own answer in another post) is here; it shows how to use cudaMalloc3D and related functions.

I have learnt a rule of thumb that more or less answers this question, and I want to share it with you: "In the context of the CUDA library, unless we are working with cudaArrays, width means nCols * sizeof(datatype) in bytes, and pitch means width plus zero or more bytes of padding (depending on the size of the array and the GPU hardware), also in bytes."
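
As a rough illustration of that rule, the sketch below computes a width in bytes and rounds it up to an alignment boundary. The helper names are hypothetical, and the 512-byte alignment is only an assumption chosen because it reproduces the numbers in the question. The real pitch is chosen by the CUDA runtime and should always be read from the returned cudaPitchedPtr, not computed.

```cpp
#include <cassert>
#include <cstddef>

// "width" in the rule of thumb: bytes in one row of nCols elements of type T.
template <typename T>
constexpr std::size_t widthInBytes(std::size_t nCols) {
    return nCols * sizeof(T);
}

// Round width up to the next multiple of alignment (alignment must be > 0).
// ASSUMPTION: this only models "width + some padding"; the actual padding
// policy belongs to the runtime and the hardware.
constexpr std::size_t paddedPitch(std::size_t width, std::size_t alignment) {
    return (width + alignment - 1) / alignment * alignment;
}

// Sanity check against the question's output (xsize = 256, pitch = 512).
inline void checkRuleOfThumb() {
    assert(widthInBytes<float>(64) == 256);
    assert(paddedPitch(256, 512) == 512);  // width + 256 bytes of padding
    assert(paddedPitch(512, 512) == 512);  // width + 0
}
```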

PS. When working with CUDA arrays, we define width as the number of elements in a row (nCols), not the number of bytes. That's because CUDA arrays take care of the internal memory layout, so we don't need to provide the width in bytes.

Mohsen