Questions tagged [bank-conflict]

A latency problem caused by multiple threads accessing a banked shared memory system. At present, this issue is most common on NVIDIA and ATI graphics cards.

NVIDIA and ATI graphics cards with shared memory experience bank conflicts when multiple threads simultaneously attempt to access different addresses in a common bank of memory, so that the accesses are serialized. For more information, please see the Stack Overflow question:

What is a bank conflict? (Doing Cuda/OpenCL programming)
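As a rough illustration of the description above, here is a minimal host-side C++ sketch that counts how many distinct words one warp would request from the same bank for a strided access pattern. It assumes a model the tag description does not state: 32 banks, 32-bit words, 32 threads per warp, and successive words mapped to successive banks. The names `bank_of` and `conflict_degree` are hypothetical helpers for this example, not part of CUDA or OpenCL.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <set>

// Assumed model (an assumption, not taken from the tag description):
// 32 banks, 32-bit words, 32 threads per warp, word i lives in bank i % 32.
constexpr int kBanks = 32;
constexpr int kWarpSize = 32;

// Bank that holds the 32-bit word at the given word index.
int bank_of(int word_index) { return word_index % kBanks; }

// Largest number of distinct words that the threads of one warp request from
// a single bank when thread t reads word t * stride. A result of 1 means the
// access is conflict-free; n means an n-way conflict under this model.
// Threads reading the same word are counted once (treated as a broadcast).
int conflict_degree(int stride) {
    assert(stride >= 0);
    std::array<std::set<int>, kBanks> words_per_bank;
    for (int t = 0; t < kWarpSize; ++t) {
        const int word = t * stride;
        words_per_bank[bank_of(word)].insert(word);
    }
    std::size_t worst = 0;
    for (const auto& words : words_per_bank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<int>(worst);
}
```

Under these assumptions, `conflict_degree(1)` is 1 (conflict-free), `conflict_degree(2)` is 2, `conflict_degree(32)` is 32 (every thread hits bank 0), and `conflict_degree(33)` is 1 again, which is the idea behind padding a shared array by one element per row.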

41 questions

114 votes

5 answers

What is a bank conflict? (Doing Cuda/OpenCL programming)

I have been reading the programming guide for CUDA and OpenCL, and I cannot figure out what a bank conflict is. They just sort of dive into how to solve the problem without elaborating on the subject itself. Can anybody help me understand it? I have…

asked Oct 01 '10 at 18:04

smuggledPancakes

4 answers

Why aren't there bank conflicts in global memory for Cuda/OpenCL?

One thing I haven't figured out, and Google isn't helping me, is why it is possible to have bank conflicts with shared memory but not in global memory. Can there be bank conflicts with registers? UPDATE Wow I really appreciate the two answers from…

cuda opencl nvidia bank-conflict

asked Oct 01 '10 at 21:02

smuggledPancakes

2 answers

GPU Shared Memory Bank Conflict

I am trying to understand how bank conflicts take place. I have an array of size 256 in global memory and I have 256 threads in a single block, and I want to copy the array to shared memory. Therefore every thread copies one…

c++ cuda gpgpu gpu-shared-memory bank-conflict

asked Dec 09 '10 at 08:22

scatman

1 answer

Do bank conflicts occur on non-GPU hardware?

This blog post explains how memory bank conflicts kill the transpose function's performance. Now I can't help but wonder: does the same happen on a "normal" CPU (in a multithreaded context)? Or is this specific to CUDA/OpenCL? Or does it not even appear…

c opencl cpu-cache bank-conflict

asked Jun 19 '14 at 14:09

rubenvb

1 answer

Can using kernel parameters cause bank conflicts?

The kernel parameters are stored in on-chip shared memory. Shared memory can have bank conflicts if threads try to access the same bank. So my question is: does that mean that threads using kernel parameters will cause bank conflicts?

cuda bank-conflict

asked Oct 11 '12 at 12:31

Netuimeni

1 answer

CUDA - determine number of banks in shared memory

Shared memory is "striped" into banks. This leads to the whole issue of bank conflicts, as we all know. Question: But how can you determine how many banks ("stripes") exist in shared memory? (Poking around NVIDIA "devtalk" forums, it seems that…

c++ cuda gpu gpu-shared-memory bank-conflict

asked Jun 10 '13 at 15:14

cmo

3 answers

Expected number of bank conflicts in shared memory at random access

Let A be a properly aligned array of 32-bit integers in shared memory. If a single warp tries to fetch elements of A at random, what is the expected number of bank conflicts? In other words: __shared__ int A[N]; //N is some big constant…

cuda gpu-shared-memory bank-conflict

asked Oct 10 '12 at 15:58

CygnusX1

1 answer

CUDA: bank conflicts between different warps?

I just learned (from Why only one of the warps is executed by a SM in cuda?) that Kepler GPUs can actually execute instructions from several (apparently 4) warps at once. Can a shared memory bank also serve four requests at once? If not, that would…

cuda gpu-shared-memory bank-conflict

asked Feb 15 '14 at 19:22

user3314215

1 answer

How to measure bank conflicts per warp using NVIDIA Visual Profiler?

I am doing a detailed code analysis for which I want to measure the total number of bank conflicts per warp. The nvvp documentation lists this metric, which was the only one I could find related to bank conflicts: shared_replay_overhead: Average…

cuda gpu nvidia bank-conflict nvvp

asked Jun 06 '16 at 05:39

Kajal

3 answers

Coalescence vs Bank conflicts (Cuda)

What is the difference between coalescence and bank conflicts when programming with CUDA? Is it only that coalescence happens in global memory while bank conflicts happen in shared memory? Should I worry about coalescence, if I have a >1.2 supported GPU?…

cuda bank-conflict

asked Aug 19 '10 at 05:20

hero

1 answer

purposely causing bank conflicts for shared memory on CUDA device

It is a mystery to me how shared memory on CUDA devices works. I was curious to count threads having access to the same shared memory. For this I wrote a simple program #include #include #define nblc 13 #define nthr…

cuda gpu gpu-shared-memory bank-conflict

asked May 29 '15 at 17:07

yarchik

1 answer

Bank conflict CUDA shared memory?

I'm running into (what I believe are) shared-memory bank conflicts in a CUDA kernel. The code itself is fairly complex, but I reproduced it in the simple example attached below. In this case it is simplified to a simple copy from global -> shared…

cuda gpu-shared-memory bank-conflict

asked Feb 06 '15 at 21:05

Bart

1 answer

Relevance of shared memory bank conflicts in Fermi and higher

From what I read in the CUDA documentation, shared memory bank conflicts are irrelevant on sm_20 and higher because values are broadcast when they are requested simultaneously, preventing any sort of serialization delays. The documentation: The…

cuda gpu-shared-memory bank-conflict

asked Jul 03 '14 at 15:04

user3800357

1 answer

Memory padding vs coalesced access

I have a little confusion about bank conflicts, avoiding them using memory padding and coalesced memory access. What I've read so far: Coalesced memory access from global memory is optimal. If it isn't achievable shared memory might be used to…

cuda memory-access bank-conflict

asked Jan 19 '22 at 15:35

SimonH

1 answer

CUDA memory bank conflict

I would like to be sure that I correctly understand bank conflicts in shared memory. I have 32 segments of data. These segments consist of 128 integers each. [[0, 1, ..., 126, 127], [128, 129, ..., 255], ..., [3968, 3969, ..., 4095]] Each thread in…

cuda gpu-shared-memory bank-conflict

asked Sep 11 '21 at 11:10

Piotr K.
