
My GPU, a GeForce GTX 1050 Ti, has compute capability 6.1. According to the CUDA docs, it has 96 KB of shared memory per streaming multiprocessor.

How can I query this limit from within a program?

I tried to call

cudaDeviceGetAttribute(&value, cudaDevAttrMaxSharedMemoryPerBlock, device);

and it returned 49152 (48 KB), which is half of what I expected. Why is that, and how do I get the actual maximum?

  • You are not comparing the same thing. Shared memory per block and shared memory per multiprocessor are different – talonmies Dec 15 '20 at 09:54
  • @talonmies I didn't know that. How do I get the shared memory per MP then? –  Dec 15 '20 at 09:55
  • Use cudaDevAttrMaxSharedMemoryPerMultiprocessor, not cudaDevAttrMaxSharedMemoryPerBlock – talonmies Dec 15 '20 at 09:57
  • @talonmies Thanks, this helps! But now I am wondering why such a difference exists at all. Why can't I use the whole shared memory of an MP inside a single block? –  Dec 15 '20 at 10:02
