CUDA index blockDim.y is always 1

Question

I'm trying to solve the 2D Laplace equation with shared memory. But one strange thing is that the blockDim.y value is always 1. Could someone help me?

host code

checkCudaErrors(cudaMalloc((void**)&d_A, h*h * sizeof(float)));
checkCudaErrors(cudaMalloc((void**)&d_out, h*h * sizeof(float)));
checkCudaErrors(cudaMemcpy(d_A, A, h*h * sizeof(float), cudaMemcpyHostToDevice));
dim3 blockSize = (BLOCK_SIZE, BLOCK_SIZE);
dim3 gridSize = ((h+BLOCK_SIZE-1)/BLOCK_SIZE, (h + BLOCK_SIZE - 1) / BLOCK_SIZE);

LaplaceDifference<<<gridSize, blockSize>>>(d_A, h, d_out);
checkCudaErrors(cudaMemcpy(B, d_out, h*h * sizeof(float), cudaMemcpyDeviceToHost));

kernel code

int idx = blockIdx.x*blockDim.x + threadIdx.x;
int idy = blockIdx.y*blockDim.y + threadIdx.y;


__shared__ float A_ds[BLOCK_SIZE + 2][BLOCK_SIZE + 2];

int n = 1;
//Load data in shared memory
int halo_index_left = (blockIdx.x - 1)*blockDim.x + threadIdx.x;
int halo_index_right = (blockIdx.x + 1)*blockDim.x + threadIdx.x;
int halo_index_up = (blockIdx.y - 1)*blockDim.y + threadIdx.y;
int halo_index_down = (blockIdx.y + 1)*blockDim.y + threadIdx.y;

A_ds[n + threadIdx.y][n + threadIdx.x] = A[idy * h +idx];

if (threadIdx.x >= blockDim.x - n) {
    A_ds[threadIdx.y + n][threadIdx.x - (blockDim.x - n)] = (halo_index_left < 0) ? 0 : A[idy*h + halo_index_left];
}
if (threadIdx.x < n) {
    A_ds[threadIdx.y + n][blockDim.x + n + threadIdx.x] = (halo_index_right >= h) ? 0 : A[idy*h + halo_index_right];
}
if (threadIdx.y >= blockDim.y - n) {
    A_ds[threadIdx.y - (blockDim.y - n)][threadIdx.x+n] = (halo_index_up < 0) ? 0 : A[halo_index_up*h + idx];
}
if (threadIdx.y < n) {
    A_ds[blockDim.y + n + threadIdx.y][threadIdx.x + n] = (halo_index_down >= h) ? 0 : A[halo_index_down*h + idx];
}


__syncthreads();

P[idy*h + idx] = 0.25*(A_ds[threadIdx.y + n - 1][threadIdx.x + n] + A_ds[threadIdx.y + n + 1][threadIdx.x + n] + A_ds[threadIdx.y + n][threadIdx.x + n - 1] + A_ds[threadIdx.y + n][threadIdx.x + n + 1]);

You are only initialising one dimension of blockSize. What did you think would happen? — talonmies, Jul 21 '19 at 11:47
Possible duplicate of [How does the Comma Operator work](https://stackoverflow.com/questions/54142/how-does-the-comma-operator-work) — tera, Jul 21 '19 at 12:11

score 2 · Answer 1 · answered Jul 22 '19 at 22:20

(I spent quite some time looking for a dupe, but could not find it.)

A dim3 variable is a particular data type defined in the CUDA header file vector_types.h.

It provides several constructors. Here are a couple of valid uses of constructors for this variable:

dim3 grid(gx, gy, gz);

dim3 grid = dim3(gx, gy, gz);

What you have shown:

dim3 blockSize = (BLOCK_SIZE, BLOCK_SIZE);

won't work the way you expect.

Since there is no dim3 usage on the right-hand side of the equals sign, the compiler treats what is there as an ordinary C++ expression. It is not a syntax error, because both the parentheses and the comma are legal in this form from a C++ language perspective.

Hopefully you understand how parentheses work in C++. I'm not going to try to describe the comma operator in detail; the Stack Overflow question linked in the comments on the question covers it. The net effect is that the compiler will evaluate each of the two expressions (one on the left of the comma, one on the right), and the value of the overall expression is the value of the expression on the right. So this:

(BLOCK_SIZE, BLOCK_SIZE)

becomes this:

BLOCK_SIZE

which is quite obviously a scalar quantity, not multi-dimensional.
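
The collapse to a single value can be checked with a plain C++ compiler, no CUDA needed. This is only a minimal sketch: the value 16 for BLOCK_SIZE and the name comma_result are assumptions made for illustration, since the question does not show how BLOCK_SIZE is defined. The left operand is deliberately different from the right one so that it is visible which of the two survives.

```cpp
// Assumed value for illustration; the question does not show the definition.
#define BLOCK_SIZE 16

// The parenthesized text is one comma expression: the left operand (1) is
// evaluated and discarded, and the result is the right operand (BLOCK_SIZE).
constexpr int comma_result = (1, BLOCK_SIZE);

// Checked at compile time: only the right operand remains.
static_assert(comma_result == BLOCK_SIZE,
              "comma operator yields its right operand");
```

Some compilers report an unused-value warning for the discarded left operand when warnings are enabled, which can help catch this kind of mistake.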

When you assign a scalar to a dim3 variable:

dim3 blockSize = BLOCK_SIZE;

You end up with a dim3 variable that has these dimensions:

(BLOCK_SIZE, 1, 1)

One method to fix what you have is as follows (the gridSize line in the question is written the same way and needs the same change):

dim3 blockSize = dim3(BLOCK_SIZE, BLOCK_SIZE);
                 ^^^^
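
The two initializations can be compared on the host. The sketch below is not CUDA code: Dim3 is a hypothetical stand-in that mirrors the dim3 constructor (three unsigned components, each defaulting to 1, not explicit) so that it compiles without the CUDA toolkit. BLOCK_SIZE = 16 and the helper names blocks_for, buggy_block, fixed_block and fixed_grid are likewise assumptions for illustration.

```cpp
// Hypothetical stand-in for CUDA's dim3, so this builds with a plain C++
// compiler. As with dim3, every component defaults to 1 and a single
// integer converts implicitly.
struct Dim3 {
    unsigned int x, y, z;
    Dim3(unsigned int vx = 1, unsigned int vy = 1, unsigned int vz = 1)
        : x(vx), y(vy), z(vz) {}
};

#define BLOCK_SIZE 16  // assumed value for illustration

// Number of blocks needed to cover n elements (ceiling division).
unsigned int blocks_for(unsigned int n) {
    return (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
}

// As written in the question: the comma expression collapses to one scalar,
// which then converts to (BLOCK_SIZE, 1, 1).
Dim3 buggy_block() {
    Dim3 blockSize = (BLOCK_SIZE, BLOCK_SIZE);
    return blockSize;
}

// Fixed: the constructor receives two separate arguments.
Dim3 fixed_block() {
    return Dim3(BLOCK_SIZE, BLOCK_SIZE);
}

// The grid dimensions need the same treatment as the block dimensions.
Dim3 fixed_grid(unsigned int h) {
    return Dim3(blocks_for(h), blocks_for(h));
}
```

With these assumed values, buggy_block() gives (16, 1, 1) and fixed_block() gives (16, 16, 1), which matches the blockDim.y of 1 reported in the question.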

score -2 · Answer 2 · answered Jul 22 '19 at 04:04

This line:

dim3 blockSize = (BLOCK_SIZE, BLOCK_SIZE);

initializes a 1D block size. What you want is:

dim3 blockSize(BLOCK_SIZE, BLOCK_SIZE);

answered by Michael

I think any half decent answer would need to explain *why* the initialization doesn't work, not just regurgitate some of the comments into an answer – talonmies Jul 22 '19 at 06:04
Emphasis on _some_. The comments on the question already provided a better reply than your "answer" when you copy-pasted it. – tera Jul 22 '19 at 20:03
I did not copy-paste from the comments, I simply answered the question. If you don't like my answer, post your own – Michael Jul 23 '19 at 01:02
