4

How can I know the amount of shared memory available on my GPU?

I'm interested in how large an array I can store in shared memory. My GPU is an Nvidia GeForce 650 Ti. I am using VS2013 with the CUDA toolkit for coding.

I would really appreciate it if someone could explain how I can figure this out by myself, rather than just giving a raw number.

paleonix
Mikhail Genkin

2 Answers

13

Two ways:

  1. Read the documentation (programming guide). Your GeForce 650 Ti is a cc3.0 GPU. (If you want to learn how to discover that yourself, there is documentation for it, or see item 2.)

    For a cc3.0 GPU, the maximum is 48KB of shared memory per threadblock.

  2. Programmatically, by calling cudaGetDeviceProperties (documentation). The CUDA sample app deviceQuery demonstrates this; a minimal sketch follows this list.
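
For example, here is a minimal sketch of the programmatic approach (not the full deviceQuery sample; it just prints the shared-memory-related fields of cudaDeviceProp — note the sharedMemPerMultiprocessor field requires a reasonably recent toolkit):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s (cc%d.%d)\n", dev, prop.name, prop.major, prop.minor);
        // Max shared memory usable by a single threadblock (the 48KB figure above)
        printf("  sharedMemPerBlock:          %zu bytes\n", prop.sharedMemPerBlock);
        // Total shared memory available on each SM
        printf("  sharedMemPerMultiprocessor: %zu bytes\n", prop.sharedMemPerMultiprocessor);
        printf("  multiProcessorCount:        %d\n", prop.multiProcessorCount);
    }
    return 0;
}
```

Compile with e.g. `nvcc -o query query.cu` and run; on a cc3.0 device both shared memory fields should report 49152 bytes (48KB).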

EDIT: responding to the question in the comments below.

The 48KB limit per threadblock is a logical limit as seen from the perspective of kernel code. There are at least two other numbers:

  1. Total amount of shared memory per SM. This is also listed in the documentation (same as above) and available via cudaGetDeviceProperties (same as above). For a cc3.0 GPU this is again 48KB. This is one limit to occupancy; this particular limit is the total available per SM divided by the amount used by each threadblock. If your threadblock uses 40KB of shared memory, you can have at most 1 threadblock resident per SM at a time on a cc3.0 GPU. If your threadblock uses 20KB of shared memory, you could possibly have 2 threadblocks resident per SM, ignoring other limits to occupancy. (A sketch after this list shows how to query this limit programmatically.)

  2. Total amount per device/GPU. I consider this to be a less relevant/useful number. It is equal to the total number of SMs on your GPU multiplied by the total amount per SM. This number is not particularly meaningful, i.e. it does not communicate new information beyond the knowledge of the number of SMs on your GPU. I can't really think of a use for this number, at the moment.
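
As an aside on the occupancy arithmetic in item 1, the runtime can do this calculation for you via cudaOccupancyMaxActiveBlocksPerMultiprocessor (available in CUDA 6.5 and later). A minimal sketch, using a made-up kernel that statically allocates 20KB of shared memory:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

__global__ void myKernel(float *out) {
    __shared__ float buf[5120];   // 5120 floats = 20KB of static shared memory
    buf[threadIdx.x] = (float)threadIdx.x;
    out[threadIdx.x] = buf[threadIdx.x];
}

int main() {
    int maxBlocksPerSM = 0;
    // How many 256-thread blocks (with 0 bytes of dynamic shared memory)
    // can be resident on one SM at a time, considering all occupancy limits
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&maxBlocksPerSM, myKernel, 256, 0);
    printf("Resident blocks per SM: %d\n", maxBlocksPerSM);
    return 0;
}
```

On a cc3.0 device this should report 2, matching the 48KB/20KB arithmetic above, assuming no other limit kicks in first.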

SM as used above means "streaming multiprocessor", which is identified here. It is also referred to simply as "multiprocessor", for example in the table 12 I linked above.

Various newer GPUs have the ability to exceed the 48KB-per-threadblock limit via a per-kernel opt-in. See here for example; a sketch of the opt-in follows.
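
On those newer GPUs (cc7.0 and later), the opt-in is done with cudaFuncSetAttribute, and the extra space must be requested as dynamic shared memory. A minimal sketch, assuming 64KB as an example size (check your device's sharedMemPerBlockOptin property for the actual maximum):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

__global__ void bigSmemKernel(float *out) {
    extern __shared__ float buf[];    // dynamic shared memory, sized at launch
    buf[threadIdx.x] = (float)threadIdx.x;
    out[threadIdx.x] = buf[threadIdx.x];
}

int main() {
    size_t smemBytes = 64 * 1024;     // 64KB, beyond the default 48KB limit
    // Opt in: without this call, a launch requesting more than 48KB of
    // dynamic shared memory fails even on GPUs that physically have more
    cudaFuncSetAttribute(bigSmemKernel,
                         cudaFuncAttributeMaxDynamicSharedMemorySize,
                         (int)smemBytes);
    float *out;
    cudaMalloc(&out, 256 * sizeof(float));
    bigSmemKernel<<<1, 256, smemBytes>>>(out);
    cudaError_t err = cudaDeviceSynchronize();
    printf("Kernel status: %s\n", cudaGetErrorString(err));
    cudaFree(out);
    return 0;
}
```

On an older GPU such as the cc3.0 device in the question, this launch would simply fail with an error; the opt-in only works on devices that support it.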

Robert Crovella
  • 48KB per threadblock. But what is the total memory available? This is important, because this value determines how many blocks can run in parallel. If there is not enough total shared memory, some blocks' execution will have to wait until other blocks complete their job and free memory – Mikhail Genkin Dec 17 '14 at 19:23
  • 1
    The answer to this is contained in the same documentation link and same programmatic method that I had already pointed out. But I have edited my answer. – Robert Crovella Dec 17 '14 at 19:44
  • 2
    What is SM? Please expand this abbreviation – Mikhail Genkin Dec 17 '14 at 19:50
  • 2
    Streaming Multiprocessor. This concept is explained in the CUDA Programming Guide. – user703016 Dec 17 '14 at 20:08
  • Linking CuPy API reference, [`cudaGetDeviceProperties`](https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.runtime.getDeviceProperties.html), for those who want to check their device's memory via python shell/script. – Greg Kramida Jun 16 '23 at 18:00
0

If you have the PGI compiler installed, just run pgaccelinfo; then you don't have to read the documentation.

paleonix
JimBamFeng