17

In theory, a grid can have up to 65535 blocks per dimension, for a total of up to 65535 * 65535 * 65535 blocks.

If you call a kernel like this:

kernel<<< BLOCKS,THREADS >>>()

(that is, without dim3 objects), what is the maximum value allowed for BLOCKS?

In an application of mine, I set it to 192000 and it seemed to work fine. The problem is that the kernel modifies the contents of a huge array, so although the parts of the array I checked looked fine, I can't be sure the kernel behaved correctly everywhere else.

For the record, I have a compute capability 2.1 GPU, a GTX 500 ti.

Super Kai - Kazuya Ito
STE

4 Answers

17

With compute capability 3.0 or higher, you can have up to 2^31 - 1 blocks in the x-dimension, and at most 65535 blocks in the y and z dimensions. See Table H.1, "Feature Support per Compute Capability", in the CUDA C Programming Guide, version 9.1.

As Pavan pointed out, if you do not provide a dim3 for the grid configuration, you only use the x-dimension, so the per-dimension limit for x is the one that applies here.

perreal
  • Probably nice to mention that BLOCKS without dim3 only specifies the number of blocks you want along x dimension and hence the limit is still 65535 – Pavan Yalamanchili Mar 23 '12 at 15:38
  • I know you can use up to that many blocks per dimension; I wrote that in my question as well. The thing is, when you call a kernel the way I showed, what is the maximum number of blocks? Is it 65535 or 65535^3? – STE Mar 23 '12 at 15:41
  • Alright, thanks Pavan! Do we know what happens if you request more blocks than are available? Because it seems to work just fine – STE Mar 23 '12 at 15:42
  • @STE are you catching the errors? Calling cudaGetErrorString(cudaGetLastError()) after launching the kernel will return "Invalid launch configuration" or something of that sort – Pavan Yalamanchili Mar 23 '12 at 16:49
  • IIANM, that's 2^31 - 1, not 2^32 - 1, i.e. an `int` would do. – einpoklum Jan 22 '20 at 21:03
  • @einpoklum, according to https://docs.nvidia.com/cuda/cuda-c-programming-guide/#dim3 these are unsigned. – perreal Jan 22 '20 at 21:12
  • @perreal: Indeed, they are unsigned; what I meant was that an `int` fits the maximum possible value. – einpoklum Jan 22 '20 at 21:31
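The error check suggested in the comments above can be sketched as follows. This is only an illustration, not code from the question: `dummy_kernel` and `launch_and_check` are hypothetical names, the block counts are the two values mentioned in this thread, and the exact error string depends on the CUDA version.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel, used only to exercise the launch configuration.
__global__ void dummy_kernel() {}

// Hypothetical helper: launches dummy_kernel on a 1D grid and returns the status.
// Passing a plain integer as the grid size is equivalent to dim3(blocks, 1, 1).
cudaError_t launch_and_check(unsigned int blocks, unsigned int threads) {
    dummy_kernel<<<blocks, threads>>>();
    // An invalid launch configuration is reported by cudaGetLastError().
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    // Errors raised while the kernel runs surface at the next synchronization.
    return cudaDeviceSynchronize();
}

int main() {
    // 65535 is the per-dimension limit quoted above; 192000 is the value from the question.
    const unsigned int counts[] = {65535u, 192000u};
    for (unsigned int blocks : counts) {
        cudaError_t err = launch_and_check(blocks, 1u);
        std::printf("%u blocks: %s\n", blocks, cudaGetErrorString(err));
    }
    return 0;
}
```

If the limits quoted in this thread hold, a compute capability 2.x device should report an error for the second launch, while a 3.0 or newer device should accept both. Without such a check, a rejected launch fails silently and the array is simply left unmodified, which can look like the kernel "worked" when only a few elements are inspected.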
14

In case anybody lands here based on a Google search (as I just did):

Nvidia has changed the specification since this question was asked. With compute capability 3.0 and newer, the x-dimension of a grid of thread blocks is allowed to be up to 2,147,483,647 (2^31 - 1).

See the current: Technical Specification
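Rather than relying on a table, the limits of the installed GPU can also be read at run time. A minimal sketch, assuming device 0 is the GPU of interest; `fits_in_grid_x` is a hypothetical helper introduced only for this example:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if a 1D launch with `blocks` blocks fits the
// x-dimension grid limit reported for the device.
bool fits_in_grid_x(long long blocks, const cudaDeviceProp& prop) {
    return blocks >= 1 && blocks <= static_cast<long long>(prop.maxGridSize[0]);
}

int main() {
    const int device = 0;  // assumption: query the first CUDA device
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("%s (compute capability %d.%d)\n", prop.name, prop.major, prop.minor);
    std::printf("max grid size: x=%d y=%d z=%d\n",
                prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    const long long blocks = 192000;  // the value from the question
    std::printf("%lld blocks in x: %s\n", blocks,
                fits_in_grid_x(blocks, prop) ? "within limit" : "exceeds limit");
    return 0;
}
```

The `maxGridSize` array holds the x, y and z limits in that order, so only the first entry matters for a launch that passes a plain integer as the grid size.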

Marco
2

65535 in a single dimension. Here's the complete table

jwdmsd
-2

I checked manually on my laptop (MX130): the program crashes when the number of blocks exceeds 678*1024+651. Each block had 1 thread, and adding even a single extra block gives a segfault. The kernel code used no multi-dimensional grid, only a linear structure.

Varun