
When I pass my pointer to the array to the kernel, I get the following error: argument of type "float *" is incompatible with parameter of type "float (*)[32768]"

Relevant snippets of my code are:

#define N  32768

__global__ void op(float k_a[][N])
{
 //some operation
}
float *ptr_a=(float*)malloc(N*N*sizeof(float));

float *d_ptr_a;cudaMalloc((void**)&d_ptr_a,N*N*sizeof(float));

cudaMemcpy(d_ptr_a,ptr_a,N*N*sizeof(float),cudaMemcpyHostToDevice);

op<<<nblocks,nthreadsperblock>>>(d_ptr_a);

Can someone tell me what's going wrong? I am a beginner to CUDA.

talonmies
darkfall94
  • The error message is very explicit -- the type you are passing as an argument is not the same as the type the kernel requires. – talonmies Feb 19 '17 at 08:00
  • I should add, this has nothing to do with CUDA. If you just define a regular `void op(float k_a[][N])` and pass a `float*` argument to it, you get the same (or similar) error message. Clang gives me `error: cannot convert ‘float*’ to ‘float (*)[32768]’ for argument ‘1’ to ‘void op(float (*)[32768])’` – CygnusX1 Feb 19 '17 at 08:03
  • And why does `k_a` have to be this "odd" thing? Will `float *` not work? – Dori Feb 19 '17 at 08:04
  • If I use `float*`, then I am a bit confused about how to access the two dimensions of the array. – darkfall94 Feb 19 '17 at 08:11
  • Have a read http://stackoverflow.com/questions/42094465/correctly-allocating-multi-dimensional-arrays – StoryTeller - Unslander Monica Feb 19 '17 at 08:13
  • @CygnusX1: Everyone's life will be much easier if you don't tag this with C (which is why I removed it). Otherwise a certain troll from the C tag will appear from under his bridge and start a flame war in comments about "CUDA is not C", as he has done many times before – talonmies Feb 19 '17 at 08:20
  • @talonmies: The actual problem in this question is related to the use of C multi-dimensional arrays. I expect C people may have more experience with that than CUDA people. That's why I thought the 'c' tag may be beneficial. The link by StoryTeller, for example, adds to the discussion but has nothing to do with CUDA. – CygnusX1 Feb 19 '17 at 08:22
  • I fully understand the rationale, but this is the reality we live in. Self censor, or enjoy a flame war every time C and CUDA are mentioned in the same breath – talonmies Feb 19 '17 at 09:19
  • Although the answer suggests flattening/simulated 2D access (which is fine), in this particular case where the array width is known at compile time, I think this (what you are trying to express in your kernel prototype) is a perfect way to handle it with a slight modification to the types in use. [This answer](http://stackoverflow.com/questions/41050300/how-do-i-allocate-memory-and-copy-2d-arrays-between-cpu-gpu-in-cuda-without-fl/41053215#41053215) outlines a worked example of what I have in mind. – Robert Crovella Feb 19 '17 at 11:16
  • @RobertCrovella What if the array width was not known at compile time? How would we handle it in that case? – darkfall94 Feb 19 '17 at 18:17
  • In that case I would strongly suggest the flattening approach given in the answer. It is simplest to implement and avoids pointer chasing. If you really want to do doubly-subscripted access in the kernel code when the array width is not known at compile time, then the [cuda tag info page](http://stackoverflow.com/tags/cuda/info) has a canonical question ("arrays of pointers") handling that case, and there are literally dozens of similar questions here on the CUDA tag. – Robert Crovella Feb 19 '17 at 20:24
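
The compile-time-width idea raised in the comments above (keep the `float k_a[][N]` parameter and change the type of the pointer that is passed in) can be sketched on the host side roughly as follows. This is a minimal illustration in plain C++, not the code from the linked answer: `N` is shrunk to 4, and `Row`, `alloc_rows` and the extra `rows` parameter are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Small width for illustration only; the question uses N = 32768.
constexpr std::size_t N = 4;

// One row of the 2D array: N floats stored contiguously.
using Row = float[N];

// Same parameter form as in the question. The compiler treats
// float k_a[][N] as float (*k_a)[N], i.e. a pointer to rows.
// The row count is passed separately (an assumption of this sketch).
void op(float k_a[][N], std::size_t rows)
{
    assert(k_a != nullptr);
    for (std::size_t y = 0; y < rows; ++y)
        for (std::size_t x = 0; x < N; ++x)
            k_a[y][x] = static_cast<float>(y * N + x);
}

// Hypothetical helper: one contiguous allocation of `rows` rows, typed as
// pointer-to-row so that it matches op's parameter. Release with std::free.
Row* alloc_rows(std::size_t rows)
{
    return static_cast<Row*>(std::malloc(rows * sizeof(Row)));
}
```

With this typing, `Row* a = alloc_rows(3); op(a, 3);` compiles and `a[y][x]` works with two subscripts. On the device side the same idea would presumably apply by declaring the device pointer as `float (*d_a)[N]` before calling `cudaMalloc`, so that it matches the kernel parameter; that part is not shown here.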

1 Answer


What we have here is an incorrect use of pointers and arrays. According to its definition, the routine op requires a pointer to an array of N floats, i.e. `float (*)[N]`. When it is called, however, a pointer to float (`float *`) is passed as the argument, so the compiler reports that the types are incompatible.
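
The mismatch can be reproduced without any CUDA at all. The following is a small compile-time check in plain C++, offered as a sketch: `WIDTH`, `op_rows` and `op_ptr` are hypothetical names used only here, and `WIDTH` stands in for the question's `N`.

```cpp
#include <type_traits>

constexpr int WIDTH = 8;  // stand-in for N; the question uses 32768

// An array parameter is adjusted to a pointer to its element type, and the
// element type of float[][WIDTH] is "array of WIDTH floats".
void op_rows(float k_a[][WIDTH]) { (void)k_a; }

// What the call site in the question actually has: a plain pointer to float.
void op_ptr(float* k_a) { (void)k_a; }

// The declared parameter really is a pointer to an array of WIDTH floats...
static_assert(std::is_same<decltype(&op_rows),
                           void (*)(float (*)[WIDTH])>::value,
              "float k_a[][WIDTH] means float (*k_a)[WIDTH]");

// ...and that is a different type from float*, with no implicit conversion.
static_assert(!std::is_convertible<float*, float (*)[WIDTH]>::value,
              "float* does not convert to float (*)[WIDTH]");
```

Calling `op_rows` with a `float*` argument fails to compile for the reason the second assertion states, while `op_ptr` accepts it.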

If the op routine is meant to work on a 2D array of float values, it is better to make its argument a pointer to float. In that case all M "rows" of the original 2D array, accessed with an index y, sit in memory one after another, in a single line, like this:

[row0][row1][row2]...[rowM-1]

Every such row contains N "columns", accessed by index x. To get a value for some pair (x, y) we have to produce a global index that will access the 1D array:

index = y * N + x;

Simple as that.
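
As a rough host-side sketch of that indexing, written in plain C++ so it can be tried without a GPU: `flat_index` and `op_flat` are hypothetical names, the dimensions are passed in explicitly, and the stored value `10 * y + x` is arbitrary. In an actual kernel the two loops would typically be replaced by x and y values derived from the thread and block indices, while the index expression stays the same.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Map a (column x, row y) pair to its position in the flattened array.
// std::size_t is used so that large widths do not overflow the index.
inline std::size_t flat_index(std::size_t x, std::size_t y, std::size_t width)
{
    return y * width + x;
}

// Host-side stand-in for the kernel body: takes a plain float* plus the
// dimensions, and visits every (x, y) element through the flattened index.
void op_flat(float* k_a, std::size_t height, std::size_t width)
{
    assert(k_a != nullptr);
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            k_a[flat_index(x, y, width)] = 10.0f * y + x;
}
```

For example, with a `std::vector<float> a(3 * 4)` and `op_flat(a.data(), 3, 4)`, the element at x = 2, y = 1 lives at position `1 * 4 + 2`. Because the parameter is now a plain `float*`, a pointer obtained from a single flat allocation can be passed directly.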

Dori
  • "*it requires **array of arrays of length N***" not quite right. It requires a *pointer* to an array of length N, which, as you mention, is probably expected to be followed by other arrays of the same type, that is, it is expected to *point* to the 1st element of an "*array of arrays of length N*". This reveals a conceptual issue, namely that the function in question (`op()` here) *does **not** know the second dimension*! – alk Feb 19 '17 at 10:25