Multiplication of two cudaArrays in a kernel (using texture memory)?

Question

I have two cudaArrays, a1 and a2, of the same size, which represent two matrices.

Using texture memory, I want to multiply those two cudaArrays. Then I want to copy the result back into a normal array; let's name it *a1_h.

The thing is, I just don't know how to do it. I've managed to define and allocate my two cudaArrays and to put floats into them.

Now I want to write a kernel which does the multiplication.

Can somebody help me?

ROOM_X and ROOM_Y are ints; they define the width and height of the matrices. mytex_M1 and mytex_M2 are textures defined as: texture<float, 2, cudaReadModeElementType>.

Here is my main:

int main(int argc, char * argv[]) {

    int size = ROOM_X * ROOM_Y * sizeof(float);

    // creation of arrays on host; will be useful for filling the cudaArrays
    float *M1_h, *M2_h;

    // allocating memory on host
    M1_h = (float *)malloc(size);
    M2_h = (float *)malloc(size);

    // creation of channel descriptions for 2D textures
    cudaChannelFormatDesc channelDesc_M1 = cudaCreateChannelDesc<float>();
    cudaChannelFormatDesc channelDesc_M2 = cudaCreateChannelDesc<float>();

    // creation of 2 cudaArray *
    cudaArray *M1_array, *M2_array;

    // bind arrays and channels in order to allocate space
    cudaMallocArray(&M1_array, &channelDesc_M1, ROOM_X, ROOM_Y);
    cudaMallocArray(&M2_array, &channelDesc_M2, ROOM_X, ROOM_Y);

    // filling the matrices on host
    Matrix(M1_h);
    Matrix(M2_h);

    // copy from host to device (putting the initial values of M1 and M2 into the arrays)
    cudaMemcpyToArray(M1_array, 0, 0, M1_h, size, cudaMemcpyHostToDevice);
    cudaMemcpyToArray(M2_array, 0, 0, M2_h, size, cudaMemcpyHostToDevice);

    // set texture parameters
    mytex_M1.addressMode[0] = cudaAddressModeWrap;
    mytex_M1.addressMode[1] = cudaAddressModeWrap;
    mytex_M1.filterMode = cudaFilterModeLinear;
    mytex_M1.normalized = true; // NB coordinates in [0,1]

    mytex_M2.addressMode[0] = cudaAddressModeWrap;
    mytex_M2.addressMode[1] = cudaAddressModeWrap;
    mytex_M2.filterMode = cudaFilterModeLinear;
    mytex_M2.normalized = true; // NB coordinates in [0,1]

    // bind arrays to the textures
    cudaBindTextureToArray(mytex_M1, M1_array);
    cudaBindTextureToArray(mytex_M2, M2_array);

    // allocate device memory for result
    float* M1_d;
    cudaMalloc((void**)&M1_d, size);

    // dimensions of grid and blocks
    dim3 dimGrid(ROOM_X, ROOM_Y);
    dim3 dimBlock(1, 1);

    // execution of the kernel; the result of the multiplication has to be put in M1_d
    mul_texture<<<dimGrid, dimBlock>>>(M1_d);

    // copy result from device to host
    cudaMemcpy(M1_h, M1_d, size, cudaMemcpyDeviceToHost);

    // free memory on device
    cudaFreeArray(M1_array);
    cudaFreeArray(M2_array);
    cudaFree(M1_d);

    // free memory on host
    free(M1_h);
    free(M2_h);

    return 0;
}

Take a look at [this answer](http://stackoverflow.com/a/16900821/1387612) — janisz, Dec 03 '14 at 11:20
I don't understand this solution. It seems that it's not what I meant (if it is, I'll try to understand it better). I want to use texture memory. Actually, when I defined the arrays, I created two textures, then two channel descriptors, I used cudaMallocArray, and then I filled the cudaArrays using cudaMemcpyToArray. But I don't know how to write the kernel that is going to do the multiplication — Ptit Sualty, Dec 03 '14 at 11:32
Yeah, you're right. I confused myself. I'm going to edit the first post right now. I want to put the result of the multiplication in one simple array. — Ptit Sualty, Dec 03 '14 at 12:26
Textures have global scope, so you don't need to pass them as arguments. Could you show the code where you define and initialize the textures and point out where you are stuck? — pQB, Dec 03 '14 at 12:36
OK, I've done it. In fact I didn't say it right: the cudaArrays represent matrices, and I want to multiply those two matrices and put the result in some float *. I think that I managed to define all the stuff, but I can't figure out how the kernel will work. I think that just some pseudo-code could help me a bit :/ — Ptit Sualty, Dec 03 '14 at 13:13
So, you are looking for `M1_d[i][j] = mytex_M1[i][j] * mytex_M2[i][j]`? — pQB, Dec 03 '14 at 13:19

score 1 · Accepted Answer · edited May 23 '17 at 12:06

When you declare a texture reference, note this restriction from the CUDA C Programming Guide:

A texture reference can only be declared as a static global variable and cannot be passed as an argument to a function. http://docs.nvidia.com/cuda/cuda-c-programming-guide/#texture-reference-api

So, if you have successfully defined the texture references, initialized the arrays, copied them to the texture space and prepared the output buffer (all of which seems to be done according to your code), what you need to do is implement the kernel. For example:

__global__ void
mul_texture(float* M1_d, int w, int h)
{
    // map from threadIdx/BlockIdx to pixel position
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;

    // take care of the size of the image, it's a good practice
    if ( x < w && y < h )
    {
        // the output M1_d is actually represented as 1D array
        // so the offset of each value is related to their (x,y) position
        // in a row-major order
        int gid = x + y * w;

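        // As textures are declared at global scope,
        // we can access their content from any kernel.
        // The question sets normalized = true with linear filtering,
        // so sample at the texel centre, in [0,1) coordinates
        float u = (x + 0.5f) / w;
        float v = (y + 0.5f) / h;
        float M1_value = tex2D(mytex_M1, u, v);
        float M2_value = tex2D(mytex_M2, u, v);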

        // The final results is the pointwise multiplication
        M1_d[ gid ] = M1_value * M2_value;
    }
}

You need to change the kernel invocation to include the w and h values, corresponding to the width (number of columns in the matrix) and height (number of rows of the matrix).

mul_texture<<<dimGrid, dimBlock >>>(M1_d, ROOM_X, ROOM_Y);

Note that you are not doing any error checking, which would help you a lot now and in the future. I have not checked whether the kernel provided in this answer works, as your code did not compile.

Many thanks, it seems clearer now. Reading your code, I'm just wondering: what are M1_value and M2_value? float*? I'm going to try to fix all my bugs. In fact, I did check for errors, but I deleted the checks for this post because there was too much code :/ — Ptit Sualty, Dec 03 '14 at 13:48
Yes, they are float (not float*), I totally forgot the data type. I will update the answer. — pQB, Dec 03 '14 at 13:51
OK, thank you. If I have more questions about this, I will post again here. — Ptit Sualty, Dec 03 '14 at 13:53
Dear @PtitSualty, if the answer was helpful and it solved your question, I would appreciate if you accept it. — pQB, Dec 12 '14 at 17:27
