Bank conflict in 2D kernel

Question

Suppose our hardware has 32 banks, each 4 bytes wide, and we have a 1D kernel of size 32 and a local 1D array of ints.

Then, ensuring that each consecutive thread accesses consecutive memory locations in the array should avoid bank conflicts.

But, suppose we have an 8 x 4 2D kernel and the same 1D array. How can I ensure that there are no bank conflicts? How do we define "consecutive thread" for a 2D array?

Thanks, Cicada. Do you mean get_local_id(0) + get_local_id(1) * 4 ? — Jacko, Oct 22 '14 at 18:33
[This post](http://stackoverflow.com/questions/6177202/how-are-threads-divided-into-warps-cuda) and its answer can help to find out how consecutive threads in a 2D work-group are defined. — Farzad, Oct 22 '14 at 19:57

score 1 · Accepted Answer · answered Oct 22 '14 at 19:32

In the 2D case, you can compute the same linear global work-item ID that get_global_id(0) gives you in the 1D case with this expression:

get_global_id(1) * get_global_size(0) + get_global_id(0);

To get the local work-item ID within a work-group instead, replace the global calls with their local counterparts (get_local_id and get_local_size).
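As a rough check of the 8 x 4 case from the question, here is a minimal plain C sketch (not OpenCL code) that simulates the local version of the formula on the host. It assumes that get_local_size(0) is 8 and get_local_size(1) is 4, that work-item i reads the int at array[i], and that the bank of a 4-byte element is its index modulo 32. The names linear_local_id and max_bank_load are made up for this illustration, and the loop variables stand in for get_local_id(0) and get_local_id(1).

```c
/* Assumptions taken from the question: 32 banks, each 4 bytes wide. */
#define NUM_BANKS 32
/* Assumed work-group shape: get_local_size(0) = 8, get_local_size(1) = 4. */
#define LOCAL_SIZE_X 8
#define LOCAL_SIZE_Y 4

/* Host-side stand-in for
   get_local_id(1) * get_local_size(0) + get_local_id(0). */
static int linear_local_id(int lid_x, int lid_y) {
    return lid_y * LOCAL_SIZE_X + lid_x;
}

/* Returns the largest number of work-items that hit the same bank
   when work-item i reads the 4-byte int at array[i].
   A result of 1 means every work-item uses a different bank. */
static int max_bank_load(void) {
    int hits[NUM_BANKS] = {0};
    int worst = 0;
    for (int y = 0; y < LOCAL_SIZE_Y; ++y) {
        for (int x = 0; x < LOCAL_SIZE_X; ++x) {
            /* int elements are 4 bytes, so element index maps directly to bank. */
            int bank = linear_local_id(x, y) % NUM_BANKS;
            if (++hits[bank] > worst) {
                worst = hits[bank];
            }
        }
    }
    return worst;
}
```

With these assumptions the 32 work-items get the linear IDs 0 to 31, so max_bank_load() returns 1: each work-item lands in its own bank. Whether the hardware actually groups work-items in this x-fastest order is up to the implementation, so treat this as a sketch of the indexing rather than a guarantee.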

maZZZu

3,585
2
17
21

Thanks, bl0z0. Is this documented anywhere? – Jacko Oct 22 '14 at 19:56

I figured it out myself, because I have worked with a lot of images stored as 1D arrays. But you can also find it via the link Farzad gave you. – maZZZu Oct 22 '14 at 20:13

IIRC it's somewhere in `§3.2 Execution Model` – user703016 Oct 22 '14 at 20:14
