Do all threads try to access memory at the same time, leading to serial execution, or do they all make their own copies, or something else?
No. If each thread works on a different piece of data there is no serialization and no copying. For instance, to add two arrays in parallel you would do:
// each thread computes its own global index and touches only its own element
int idx = blockIdx.x * blockDim.x + threadIdx.x;
outArr[idx] = a[idx] + b[idx];
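Wrapped in a complete kernel with a bounds check, that might look like the sketch below (addArrays, n, the launch configuration, and the device pointers d_a/d_b/d_out are placeholder names I picked, not anything from your code):

__global__ void addArrays(const float* a, const float* b, float* outArr, int n)
{
    // one thread per element; the check covers the last, partially filled block
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        outArr[idx] = a[idx] + b[idx];
}

// launch with enough blocks to cover all n elements
int threadsPerBlock = 256;
int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
addArrays<<<blocks, threadsPerBlock>>>(d_a, d_b, d_out, n);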
Each thread inside the grid will do two reads (on the right-hand side) from two different locations and one write to a third location, all in global memory. You can also let all threads read from/write to the same location in global memory; however, to prevent race conditions, you need to use atomic functions.
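For example, if every thread needs to add its value into a single accumulator in global memory, a minimal sketch (sumAll, data, result, and n are names I made up) would be:

__global__ void sumAll(const float* data, float* result, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        // atomicAdd makes the concurrent read-modify-write of *result safe
        atomicAdd(result, data[idx]);
}

Without the atomicAdd, a plain *result += data[idx] would be a race condition and the final sum would be wrong.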
Reads/writes from/to global memory can be slow (it's DRAM), especially if threads do not access coalesced memory (e.g., if threads 0, 1, 2, and 3 read from 0x0, 0x4, 0x8, 0xC, the access is coalesced). To understand more about the CUDA memory model, you can read section 2.4 in the CUDA Programming Guide.
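To illustrate the difference (just a sketch, and STRIDE is a made-up constant, not something from your code):

// coalesced: threads 0, 1, 2, 3 of a warp read consecutive 4-byte words
float x = a[blockIdx.x * blockDim.x + threadIdx.x];

// not coalesced: consecutive threads read words STRIDE elements apart,
// so one warp's loads are spread over many memory segments
float y = a[(blockIdx.x * blockDim.x + threadIdx.x) * STRIDE];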
Hope that helps!