
I'm writing a program that launches a kernel inside a huge for loop. The kernel mostly uses input data stored in arrays in global memory. Each thread accesses only its own data within those arrays. The data in those arrays doesn't change during the whole program execution.

Right now, each thread fetches its own data from global memory and stores it in its own registers, so there is basically only one access to global memory per thread per kernel call. I see no gain in using shared memory, since every thread only uses its own data.

Since the data doesn't change during the whole program execution, I was thinking about using constant memory, but I read that it only pays off when all threads access the same data, which is not my case.

I was also reading about texture memory, but it doesn't seem to be what I'm looking for. Texture memory offers cached access, but within a kernel call I already read each value from global memory only once, and all subsequent accesses come from registers.

So, on every kernel call, each thread reads the same data from global memory again and copies it into registers.
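
To make the access pattern concrete, here is a minimal sketch of it. The kernel name `step`, the helper `run_loop`, the array sizes and the arithmetic are all placeholders made up for illustration, not the real code.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread touches only its own element of the
// read-only input array.
__global__ void step(const float* __restrict__ input, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = input[i];   // the one global-memory read per thread per launch
        out[i] += 0.5f * x;   // placeholder work done on the register copy
    }
}

// Hypothetical driver: launches the kernel `iterations` times on the same,
// never-modified input and returns out[0] so the result can be inspected.
float run_loop(int n, int iterations)
{
    std::vector<float> h_in(n, 1.0f);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, n * sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    for (int it = 0; it < iterations; ++it) {
        // Registers do not survive between launches, so every launch
        // reloads input[i] from global memory.
        step<<<blocks, threads>>>(d_in, d_out, n);
    }
    cudaDeviceSynchronize();

    float first = 0.0f;
    cudaMemcpy(&first, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return first;
}

int main()
{
    // 1000 iterations stand in for the "huge for loop".
    printf("out[0] = %f\n", run_loop(1 << 20, 1000));
    return 0;
}
```

In this sketch consecutive threads read consecutive elements, so the loads are coalesced. Whether that holds for the real arrays depends on their layout, which is an assumption here.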

Is there any fast, persistent memory where I can store those arrays for the whole program execution? I'm trying to avoid accessing global memory on every kernel call.
