Questions tagged [warp-scheduler]
12 questions
176
votes
2 answers
How do CUDA blocks/warps/threads map onto CUDA cores?
I have been using CUDA for a few weeks, but I have some doubts about the allocation of blocks/warps/threads.
I am studying the architecture from a didactic point of view (university project), so reaching peak performance is not my concern.
First of…

Daedalus
- 1,761
- 3
- 11
- 3
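A minimal sketch of the mapping asked about above, assuming a 1D block: within a block, consecutive groups of 32 threads (by linearized thread index) form the warps that the hardware then schedules onto an SM's cores. The kernel below is illustrative and not taken from the question; device printf needs compute capability 2.0 or later.

#include <cstdio>

__global__ void show_warp_mapping()
{
    int linear_tid = threadIdx.x;             // 1D block assumed for simplicity
    int warp_id = linear_tid / warpSize;      // which warp of the block this thread is in
    int lane_id = linear_tid % warpSize;      // position of the thread inside its warp
    if (lane_id == 0)
        printf("block %d: warp %d starts at thread %d\n", blockIdx.x, warp_id, linear_tid);
}

// Example launch: 2 blocks of 128 threads -> 4 warps per block.
// show_warp_mapping<<<2, 128>>>();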
7
votes
1 answer
Why are there two warp schedulers in an SM of a GPU?
I read the NVIDIA Fermi whitepaper and got confused when I calculated the number of SP cores and schedulers.
According to the whitepaper, in each SM, there are two warp schedulers and two instruction dispatch units, allowing two warps to be issued and…

Dongwei Wang
- 475
- 5
- 14
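A back-of-the-envelope sketch of the Fermi numbers the question works from (32 CUDA cores and two warp schedulers per SM): each scheduler feeds a group of 16 cores, so one warp instruction occupies its group for two clocks, which is why two schedulers are needed to keep all 32 cores busy. The figures are the whitepaper values quoted in the question; the code just restates the arithmetic.

#include <stdio.h>

int main(void)
{
    int cores_per_sm    = 32;                        // Fermi GF100 SM
    int schedulers      = 2;                         // two warp schedulers per SM
    int cores_per_group = cores_per_sm / schedulers; // 16 cores fed by each scheduler
    int warp_size       = 32;
    int clocks_per_warp = warp_size / cores_per_group; // a warp instruction takes 2 clocks
    printf("each warp instruction occupies a %d-core group for %d clocks,\n",
           cores_per_group, clocks_per_warp);
    printf("so %d schedulers are needed to keep all %d cores busy\n",
           schedulers, cores_per_sm);
    return 0;
}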
5
votes
2 answers
blocks, threads, warpSize
There has been much discussion about how to choose the #blocks & blockSize, but I am still missing something. Many of my concerns relate to this question: How do CUDA blocks/warps/threads map onto CUDA cores? (To simplify the discussion, there is enough…

Doug
- 2,783
- 6
- 33
- 37
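One hedged, practical way to choose a block size for the question above is to let the runtime propose one with the occupancy API; the suggestion is typically a multiple of warpSize. my_kernel here is a placeholder, not code from the question.

#include <cstdio>
#include <cuda_runtime.h>

__global__ void my_kernel(float* data) { /* placeholder body */ }

int main()
{
    int min_grid_size = 0, block_size = 0;
    // Ask the runtime which block size maximizes occupancy for this kernel.
    cudaOccupancyMaxPotentialBlockSize(&min_grid_size, &block_size, my_kernel, 0, 0);
    printf("suggested block size: %d (typically a multiple of the 32-thread warp size)\n",
           block_size);
    return 0;
}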
4
votes
1 answer
CUDA Warps and Thread Divergence
I am trying to understand CUDA warps and thread divergence. Suppose I have a naive matrix multiplication kernel to multiply n x n matrices.
__global__ void matrix_multiply(float* a, float* b, float* c, int n)
{
int row = blockIdx.y + blockDim.y…

csnate
- 1,601
- 4
- 19
- 31
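For reference, a common complete form of the naive kernel whose excerpt is truncated above; the index computation and boundary check are my guesses at the usual pattern, so the question's actual code may differ. The final if is where divergence can appear, and only in warps that straddle the matrix edge.

__global__ void matrix_multiply(float* a, float* b, float* c, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {           // out-of-range threads skip the loop, so
        float sum = 0.0f;               // divergence occurs only in boundary warps
        for (int k = 0; k < n; ++k)
            sum += a[row * n + k] * b[k * n + col];
        c[row * n + col] = sum;
    }
}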
3
votes
1 answer
cuda: warp divergence overhead vs extra arithmetic
Of course, warp divergence, via if and switch statements, is to be avoided at all costs on GPUs.
But what is the overhead of warp divergence (scheduling only some of the threads to execute certain lines) vs. additional useless arithmetic?
Consider…

cmo
- 3,762
- 4
- 36
- 64
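A hedged illustration of the trade-off asked about above: the two device functions below compute the same result, one with a branch that can diverge within a warp and one with branch-free (slightly redundant) arithmetic. Neither is from the question, and in practice the compiler often predicates short branches anyway, so measuring both is the only reliable answer.

// Branching version: threads in one warp may take different paths.
__device__ float clamp_branch(float x)
{
    if (x < 0.0f)
        return 0.0f;
    return x;
}

// Branch-free version: every thread does the same extra arithmetic.
__device__ float clamp_arith(float x)
{
    return x * (x >= 0.0f);   // the comparison contributes 0 or 1
}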
1
vote
1 answer
In an NVIDIA GPU, can ld/st and arithmetic instructions (such as INT32, FP32) run simultaneously in the same SM?
Especially on the Turing and Ampere architectures: within the same SM and the same warp scheduler, can warps run ld/st and other arithmetic instructions simultaneously?
I want to know how the warp scheduler works

sorfkc
- 13
- 4
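A sketch of the kind of kernel where this question matters, not code from it: the load and the fused multiply-add below are independent across threads, so a scheduler can keep the load/store unit and the FP32 pipe occupied with instructions from different warps; whether they issue in the same cycle depends on the architecture's dispatch width.

__global__ void axpy_overlap(const float* __restrict__ x, float* y, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float xv = x[i];          // goes to the load/store pipe
        y[i] = a * xv + y[i];     // FFMA goes to the FP32 pipe once the loads complete
    }
}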
1
vote
1 answer
Questions about resident warps in CUDA
I have been using CUDA for a month, and now I'm trying to work out how many warps/blocks are needed to hide the latency of memory accesses. I think it is related to the maximum number of resident warps on a multiprocessor.
According to Table 13 in…

Falofter
- 41
- 3
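A rough, hedged version of the latency-hiding arithmetic behind this question: warps needed ≈ latency × issue rate per scheduler, assuming each warp stalls for the full latency after issuing one instruction. The latency and issue-rate values below are placeholders, not the figures from the table the question cites.

#include <stdio.h>

int main(void)
{
    double latency_cycles   = 400.0;  // placeholder global-memory latency
    double issues_per_cycle = 1.0;    // placeholder issue rate per scheduler
    double warps_needed = latency_cycles * issues_per_cycle;
    printf("roughly %.0f resident warps with independent work per scheduler\n",
           warps_needed);
    return 0;
}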
1
vote
1 answer
cuda shared memory and block execution scheduling
I would like to clear up how block execution is scheduled based on the amount of shared memory used per block.
State
I am targeting an NVIDIA GTX480 card, which has 48KB of shared memory per block and 15 streaming…

zeus2
- 309
- 2
- 11
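A small hedged calculation of the scheduling effect this question is about: with 48KB of shared memory available per SM, the shared memory each block requests caps how many blocks can be resident on an SM at once (register use and the hardware's resident-block limit can cap it further). The 20KB per-block figure is a made-up example.

#include <stdio.h>

int main(void)
{
    int smem_per_sm    = 48 * 1024;   // Fermi-class shared memory per SM
    int smem_per_block = 20 * 1024;   // example request per block
    int blocks_by_smem = smem_per_sm / smem_per_block;   // = 2 resident blocks
    printf("shared memory alone limits this kernel to %d concurrent blocks per SM\n",
           blocks_by_smem);
    return 0;
}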
0
votes
1 answer
How can a warp cause another warp to be in the idle state?
As you can see in the title of the question, I want to know how a warp causes another warp to go to the idle state. I have read a lot of Q&A on SO but I cannot find the answer. At any time, can just one warp in a block run? If so, the idle state…

Saeed Rahmani
- 650
- 1
- 8
- 29
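One concrete way a warp can leave other warps idle, sketched below as an illustrative kernel (not from the question): at a __syncthreads() barrier, every warp that arrives early sits idle until the slowest warp of the block, here the one doing extra work, also reaches it.

__global__ void barrier_idle_demo(float* out)
{
    int warp_id = threadIdx.x / warpSize;
    float v = 1.0f;
    if (warp_id == 0) {
        // Warp 0 does extra work, so the block's other warps wait at the barrier.
        for (int i = 0; i < 10000; ++i)
            v = v * 1.000001f + 0.000001f;
    }
    __syncthreads();              // early warps are idle here until warp 0 arrives
    out[threadIdx.x] = v;
}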
0
votes
1 answer
Is there a way to explicitly map a thread to a specific warp in CUDA?
Say, dynamic analysis was done on a CUDA program such that certain threads were better off being in the same warp.
For example, let's pretend we have 1024 CUDA threads and a warp size of 32. After dynamic analysis we find out that threads 989, 243,…

xfern
- 96
- 1
- 7
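There is no launch-time knob for this: within a block, warp membership is fixed by the linear thread index (threads 0-31 form warp 0, 32-63 form warp 1, and so on). The usual workaround, sketched below, is to leave the threads where they are and remap the work instead; remap is a hypothetical permutation produced by the dynamic analysis, not something from the question.

__global__ void remapped_work(const int* __restrict__ remap, const float* in, float* out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;   // warp membership is fixed: tid / 32
    if (tid < n) {
        int work_item = remap[tid];              // work items that should share a warp are
        out[work_item] = in[work_item] * 2.0f;   // placed in adjacent slots of remap
    }
}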
0
votes
1 answer
CUDA Kepler: not enough ALUs
According to the Kepler whitepaper, the warp size for a Kepler-based GPU is 32 and each multiprocessor contains 4 warp schedulers, each of which selects two independent instructions from a chosen warp. This means that each clock cycle, 32*4*2 = 256…

PieterV
- 816
- 10
- 23
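The arithmetic behind the question, restated as a sketch: 4 schedulers × 2 dispatch units × 32 threads gives 256 instruction slots per clock against 192 FP32 cores per SMX; the totals reconcile because dual-issued instructions can also target the load/store, SFU, and other pipes rather than all landing on the FP32 ALUs. The counts are the whitepaper figures quoted in the question.

#include <stdio.h>

int main(void)
{
    int warp_size          = 32;
    int schedulers         = 4;    // per Kepler SMX
    int dispatch_per_sched = 2;    // two dispatch units per scheduler
    int fp32_cores         = 192;  // per Kepler SMX
    int issue_slots = warp_size * schedulers * dispatch_per_sched;   // 256 per clock
    printf("issue slots per clock: %d, FP32 lanes: %d\n", issue_slots, fp32_cores);
    printf("the remaining %d slots can feed ld/st, SFU and other pipes\n",
           issue_slots - fp32_cores);
    return 0;
}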
0
votes
1 answer
What is the instruction issue time latency of the warp schedulers in CUDA?
I am under the impression that the (single) warp scheduler in compute capability 1.x GPUs issues one instruction per warp every 4 cycles, and since the latency of the arithmetic pipeline is 24 cycles, it can be completely hidden by having 6 active…

charis
- 429
- 6
- 16
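The arithmetic this excerpt refers to, spelled out as a sketch using the compute-capability-1.x figures mentioned in the question (4-cycle issue time per warp, 24-cycle arithmetic latency):

#include <stdio.h>

int main(void)
{
    int issue_cycles_per_warp = 4;   // cc 1.x: one instruction per warp every 4 cycles
    int arithmetic_latency    = 24;  // cycles before the result can be used
    int warps_to_hide = arithmetic_latency / issue_cycles_per_warp;   // = 6
    printf("%d active warps suffice to hide the arithmetic pipeline latency\n",
           warps_to_hide);
    return 0;
}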