CUDA: argument of type float * is incompatible with parameter of type float (*)[32768]

Question

When I pass a pointer to my array to the kernel, I get the following error: argument of type "float *" is incompatible with parameter of type "float (*)[32768]"

The relevant snippets of my code are:

#define N  32768

__global__ void op(float k_a[][N])
{
 //some operation
}
float *ptr_a=(float*)malloc(N*N*sizeof(float));

float *d_ptr_a;cudaMalloc((void**)&d_ptr_a,N*N*sizeof(float));

cudaMemcpy(d_ptr_a,ptr_a,N*N*sizeof(float),cudaMemcpyHostToDevice);

op<<<nblocks,nthreadsperblock>>>(d_ptr_a);

Can someone tell me what's going wrong? I am a beginner to CUDA.

The error message is very explicit -- the type you are passing as an argument is not the same as the type the kernel requires. — talonmies, Feb 19 '17 at 08:00
I should add, this has nothing to do with CUDA. If you just define a regular `void op(float k_a[][N])` and pass a `float*` argument to it, you get the same (or similar) error message. Clang gives me `error: cannot convert ‘float*’ to ‘float (*)[32768]’ for argument ‘1’ to ‘void op(float (*)[32768])’` — CygnusX1, Feb 19 '17 at 08:03
And why does `k_a` have to be this "odd" thing? Will `float *` not work? — Dori, Feb 19 '17 at 08:04
If I use `float*`, then I am a bit confused about how to access the two dimensions of the array. — darkfall94, Feb 19 '17 at 08:11
Have a read http://stackoverflow.com/questions/42094465/correctly-allocating-multi-dimensional-arrays — StoryTeller - Unslander Monica, Feb 19 '17 at 08:13
@CygnusX1: Everyone's life will be much easier if you don't tag this with C (which is why I removed it). Otherwise a certain troll from the C tag will appear from under his bridge and start a flame war in comments about "CUDA is not C", as he has done many times before — talonmies, Feb 19 '17 at 08:20
@talonmies: The actual problem in this question is related to the use of C multi-dimensional arrays. I expect C people may have more experience with that than CUDA people. That's why I thought the 'c' tag might be beneficial. The link by StoryTeller, for example, adds to the discussion but has nothing to do with CUDA. — CygnusX1, Feb 19 '17 at 08:22
I fully understand the rationale, but this is the reality we live in. Self censor, or enjoy a flame war every time C and CUDA are mentioned in the same breath — talonmies, Feb 19 '17 at 09:19
Although the answer suggests flattening/simulated 2D access (which is fine), in this particular case where the array width is known at compile time, I think this (what you are trying to express in your kernel prototype) is a perfect way to handle it with a slight modification to the types in use. [This answer](http://stackoverflow.com/questions/41050300/how-do-i-allocate-memory-and-copy-2d-arrays-between-cpu-gpu-in-cuda-without-fl/41053215#41053215) outlines a worked example of what I have in mind. — Robert Crovella, Feb 19 '17 at 11:16
@RobertCrovella What if the array width was not known at the compile time? How would we handle it in that case? — darkfall94, Feb 19 '17 at 18:17
In that case I would strongly suggest the flattening approach given in the answer. It is simplest to implement and avoids pointer chasing. If you really want to do doubly-subscripted access in the kernel code when the array width is not known at compile time, then the [cuda tag info page](http://stackoverflow.com/tags/cuda/info) has a canonical question ("arrays of pointers") handling that case, and there are literally dozens of similar questions here on the CUDA tag. — Robert Crovella, Feb 19 '17 at 20:24

Dori · Accepted Answer · 2017-02-19T10:40:59.873

What we have here is a mismatch between pointers and arrays. As declared, the routine `op` requires a pointer to an array of N floats, that is, `float (*)[N]`. When it is called, however, a plain pointer to float (`float *`) is passed as the argument, so the compiler reports that the types are incompatible.
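To make the mismatch concrete, here is a minimal sketch that keeps the kernel signature from the question and changes the pointer types instead, in the spirit of the pointer-to-array approach mentioned in the comments. It only applies when the width is known at compile time. The reduced `N` of 1024, the `rows` parameter, the 16x16 launch configuration and the `+= 1.0f` body are assumptions made for illustration, not part of the original code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 1024  // illustrative width, much smaller than the question's 32768

// The parameter float k_a[][N] is adjusted by the compiler to float (*k_a)[N].
__global__ void op(float k_a[][N], int rows)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < N && y < rows) {
        k_a[y][x] += 1.0f;  // placeholder for "some operation"
    }
}

int main()
{
    const int rows = 256;  // hypothetical row count
    const size_t bytes = (size_t)rows * N * sizeof(float);

    // Pointer to an array of N floats: the type the kernel expects.
    float (*ptr_a)[N] = (float (*)[N])malloc(bytes);
    if (!ptr_a) return 1;
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < N; ++x)
            ptr_a[y][x] = 0.0f;

    float (*d_ptr_a)[N] = nullptr;
    if (cudaMalloc((void **)&d_ptr_a, bytes) != cudaSuccess) return 1;
    cudaMemcpy(d_ptr_a, ptr_a, bytes, cudaMemcpyHostToDevice);

    dim3 threads(16, 16);
    dim3 blocks((N + threads.x - 1) / threads.x,
                (rows + threads.y - 1) / threads.y);
    op<<<blocks, threads>>>(d_ptr_a, rows);  // argument and parameter types now agree

    cudaMemcpy(ptr_a, d_ptr_a, bytes, cudaMemcpyDeviceToHost);
    printf("ptr_a[2][3] = %f\n", ptr_a[2][3]);

    cudaFree(d_ptr_a);
    free(ptr_a);
    return 0;
}
```

The memory layout is still one contiguous block; only the declared type changes, which lets the compiler compute `k_a[y][x]` from the known width `N`.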

If the `op` routine is to work on a 2D array of float values, it is better to make its argument a pointer to float. In that case all M "rows" of the original 2D array, accessed with an index y, sit in memory one after another, in a single line, like this:

[row0][row1][row2]...[rowM-1]

Every such row contains N "columns", accessed by index x. To get the value for some pair (x, y), we have to compute a global index into the 1D array:

index = y * N + x;

Simple as that.
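A minimal sketch of this flattened approach might look as follows. The helper `flat_index`, the `width` and `height` parameters, the small array sizes, the 16x16 launch configuration and the doubling operation are all assumptions chosen for illustration; the original kernel body is not shown in the question.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: maps (x, y) to a position in the flat row-major array.
__host__ __device__ inline size_t flat_index(size_t x, size_t y, size_t width)
{
    return y * width + x;
}

// Kernel taking a plain float*; doubling each element stands in for "some operation".
__global__ void op(float *k_a, size_t width, size_t height)
{
    size_t x = blockIdx.x * blockDim.x + threadIdx.x;
    size_t y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {  // guard threads that fall outside the array
        k_a[flat_index(x, y, width)] *= 2.0f;
    }
}

int main()
{
    const size_t width = 1024, height = 512;  // small illustrative sizes
    const size_t bytes = width * height * sizeof(float);

    float *ptr_a = (float *)malloc(bytes);
    if (!ptr_a) return 1;
    for (size_t i = 0; i < width * height; ++i) ptr_a[i] = 1.0f;

    float *d_ptr_a = nullptr;
    if (cudaMalloc((void **)&d_ptr_a, bytes) != cudaSuccess) return 1;
    cudaMemcpy(d_ptr_a, ptr_a, bytes, cudaMemcpyHostToDevice);

    dim3 threads(16, 16);
    dim3 blocks((width + threads.x - 1) / threads.x,
                (height + threads.y - 1) / threads.y);
    op<<<blocks, threads>>>(d_ptr_a, width, height);

    cudaMemcpy(ptr_a, d_ptr_a, bytes, cudaMemcpyDeviceToHost);
    printf("a[3][5] = %f\n", ptr_a[flat_index(5, 3, width)]);

    cudaFree(d_ptr_a);
    free(ptr_a);
    return 0;
}
```

Because the width is passed as a runtime argument, the same kernel also covers the case where the array dimensions are not known at compile time.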

"*it requires **array of arrays of length N***" not quite right. It requires a *pointer* to an array of length N, which, as you mention, is probably expected to be followed by other array of the same type, that is it is expected to *point* to the 1st elements of an "*array of arrays of length N*". This reveals a conceptual issue, namely that the function in question (`op()` here) *does **not** know the second dimension*! — alk, Feb 19 '17 at 10:25
