Questions tagged [bank-conflict]

A bank conflict is a latency problem caused by multi-threaded access to a shared memory system. At present, this issue is most commonly encountered on NVIDIA and ATI graphics cards.

NVIDIA and ATI graphics cards with shared memory experience bank conflicts when multiple threads simultaneously attempt to access different addresses in the same bank of memory, which forces those accesses to be serialized. For more information, please see the Stack Overflow question:

What is a bank conflict? (Doing Cuda/OpenCL programming)
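As a rough illustration of the description above, the following host-side sketch models how word addresses map to banks and how many times one warp's access would have to be replayed. It is a simplified model, not vendor code: the figures of 32 banks, 4-byte bank width, and 32 threads per warp are assumptions that match many NVIDIA GPUs but not all hardware, and the function names (`bank_of`, `conflict_degree`, `strided_access`) are hypothetical.

```python
from collections import defaultdict

NUM_BANKS = 32   # assumption: 32 shared-memory banks
WORD_BYTES = 4   # assumption: each bank is one 4-byte word wide
WARP_SIZE = 32   # assumption: 32 threads per warp


def bank_of(byte_address):
    """Return the bank holding the 4-byte word at this byte address."""
    return (byte_address // WORD_BYTES) % NUM_BANKS


def conflict_degree(word_indices):
    """Largest number of distinct words requested from any single bank.

    Threads reading the same word are counted once (a simple model of
    broadcast). A result of 1 means the access is conflict-free; a result
    of n means it is serialized into roughly n passes.
    """
    words_per_bank = defaultdict(set)
    for w in word_indices:
        words_per_bank[bank_of(w * WORD_BYTES)].add(w)
    return max(len(words) for words in words_per_bank.values())


def strided_access(stride):
    """Word index read by each thread of one warp: thread t reads t * stride."""
    return [t * stride for t in range(WARP_SIZE)]


if __name__ == "__main__":
    for stride in (1, 2, 32, 33):
        print(f"stride {stride}: {conflict_degree(strided_access(stride))}-way")
```

Under these assumptions, strides of 1 and 33 are conflict-free, a stride of 2 gives a 2-way conflict, and a stride of 32 sends every thread to the same bank (a 32-way conflict). The stride of 33 shows why padding a 32-element row by one element is a commonly suggested workaround.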

41 questions
114
votes
5 answers

What is a bank conflict? (Doing Cuda/OpenCL programming)

I have been reading the programming guide for CUDA and OpenCL, and I cannot figure out what a bank conflict is. They just sort of dive into how to solve the problem without elaborating on the subject itself. Can anybody help me understand it? I have…
smuggledPancakes
  • 9,881
  • 20
  • 74
  • 113
19
votes
4 answers

Why aren't there bank conflicts in global memory for Cuda/OpenCL?

One thing I haven't figured out, and Google isn't helping me, is why it is possible to have bank conflicts with shared memory, but not in global memory. Can there be bank conflicts with registers? UPDATE Wow I really appreciate the two answers from…
smuggledPancakes
  • 9,881
  • 20
  • 74
  • 113
12
votes
2 answers

GPU Shared Memory Bank Conflict

I am trying to understand how bank conflicts take place. I have an array of size 256 in global memory and I have 256 threads in a single block, and I want to copy the array to shared memory. Therefore every thread copies one…
scatman
  • 14,109
  • 22
  • 70
  • 93
10
votes
1 answer

Do bank conflicts occur on non-GPU hardware?

This blog post explains how memory bank conflicts kill the transpose function's performance. Now I can't help but wonder: does the same happen on a "normal" CPU (in a multithreaded context)? Or is this specific to CUDA/OpenCL? Or does it not even appear…
rubenvb
  • 74,642
  • 33
  • 187
  • 332
8
votes
1 answer

Can using kernel parameters cause bank conflicts?

The kernel parameters are stored in on-chip shared memory. Shared memory can have bank conflicts if threads try to access the same bank. So my question is: does that mean that threads using kernel parameters will cause bank conflicts?
Netuimeni
  • 93
  • 4
7
votes
1 answer

CUDA - determine number of banks in shared memory

Shared memory is "striped" into banks. This leads to the whole issue of bank conflicts, as we all know. Question: But how can you determine how many banks ("stripes") exist in shared memory? (Poking around NVIDIA "devtalk" forums, it seems that…
cmo
  • 3,762
  • 4
  • 36
  • 64
7
votes
3 answers

Expected number of bank conflicts in shared memory at random access

Let A be a properly aligned array of 32-bit integers in shared memory. If a single warp tries to fetch elements of A at random, what is the expected number of bank conflicts? In other words: __shared__ int A[N]; //N is some big constant…
CygnusX1
  • 20,968
  • 5
  • 65
  • 109
5
votes
1 answer

CUDA: bank conflicts between different warps?

I just learned (from Why only one of the warps is executed by a SM in cuda?) that Kepler GPUs can actually execute instructions from several (apparently 4) warps at once. Can a shared memory bank also serve four requests at once? If not, that would…
3
votes
1 answer

How to measure bank conflicts per warp using NVIDIA Visual Profiler?

I am doing a detailed code analysis for which I want to measure the total number of bank conflicts per warp. The nvvp documentation lists this metric, which was the only one I could find related to bank conflicts: shared_replay_overhead: Average…
Kajal
  • 581
  • 11
  • 24
3
votes
3 answers

Coalescence vs Bank conflicts (Cuda)

What is the difference between coalescence and bank conflicts when programming with CUDA? Is it only that coalescence happens in global memory while bank conflicts happen in shared memory? Should I worry about coalescence, if I have a >1.2 supported GPU?…
hero
  • 31
  • 1
  • 2
3
votes
1 answer

purposely causing bank conflicts for shared memory on CUDA device

It is a mystery to me how shared memory on CUDA devices works. I was curious to count threads having access to the same shared memory. For this I wrote a simple program #include #include #define nblc 13 #define nthr…
yarchik
  • 336
  • 1
  • 8
3
votes
1 answer

Bank conflict CUDA shared memory?

I'm running into (what I believe are) shared-memory bank conflicts in a CUDA kernel. The code itself is fairly complex, but I reproduced it in the simple example attached below. In this case it is simplified to a simple copy from global -> shared…
Bart
  • 9,825
  • 5
  • 47
  • 73
3
votes
1 answer

Relevance of shared memory bank conflicts in Fermi and higher

From what I read in the CUDA documentation, shared memory bank conflicts are irrelevant on sm_20 and higher because values are broadcast when they are requested simultaneously, preventing any sort of serialization delays. The documentation: The…
2
votes
1 answer

Memory padding vs coalesced access

I have a little confusion about bank conflicts, avoiding them using memory padding and coalesced memory access. What I've read so far: Coalesced memory access from global memory is optimal. If it isn't achievable shared memory might be used to…
SimonH
  • 1,385
  • 15
  • 35
2
votes
1 answer

CUDA memory bank conflict

I would like to be sure that I correctly understand bank conflicts in shared memory. I have 32 segments of data. These segments consist of 128 integers each. [[0, 1, ..., 126, 127], [128, 129, ..., 255], ..., [3968, 3969, ..., 4095]] Each thread in…
Piotr K.
  • 93
  • 1
  • 5