29

I am looking for a function that counts the number of cores on my CUDA device. I know each multiprocessor has a specific number of cores, and my CUDA device has 2 multiprocessors.

I searched a lot for a property that gives the number of cores per multiprocessor, but I couldn't find one. I use the code below, but I still need the number of cores.

  • CUDA 7.0
  • Programming language: C
  • Visual Studio 2013

Code:

void printDevProp(cudaDeviceProp devProp)
{
    // size_t fields are cast to unsigned long long and printed with %llu;
    // %u is the wrong specifier for size_t (undefined behavior, and
    // typically truncated output on 64-bit builds).
    printf("%s\n", devProp.name);
    printf("Major revision number:         %d\n", devProp.major);
    printf("Minor revision number:         %d\n", devProp.minor);
    printf("Total global memory:           %llu bytes\n", (unsigned long long)devProp.totalGlobalMem);
    printf("Number of multiprocessors:     %d\n", devProp.multiProcessorCount);
    printf("Total amount of shared memory per block: %llu\n", (unsigned long long)devProp.sharedMemPerBlock);
    printf("Total registers per block:     %d\n", devProp.regsPerBlock);
    printf("Warp size:                     %d\n", devProp.warpSize);
    printf("Maximum memory pitch:          %llu\n", (unsigned long long)devProp.memPitch);
    printf("Total amount of constant memory:         %llu\n", (unsigned long long)devProp.totalConstMem);
}
Alsphere
  • I found a link for CUDA 5.0 + Visual Studio 2012 with sample projects [cuda example](http://code.msdn.microsoft.com/windowsdesktop/CUDA-50-and-Visual-Studio-20e71aa1), and a link for CUDA 7.0 + Visual Studio [cuda 7.0 getting started](http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-microsoft-windows/#axzz3lScBd2Bm). – rcgldr Sep 11 '15 at 19:33

4 Answers

28

The number of cores per multiprocessor is the only "missing" piece of data. It is not provided directly in the cudaDeviceProp structure, but it can be inferred from published data, using the devProp.major and devProp.minor entries, which together make up the CUDA compute capability of the device.

Something like this should work:

#include <stdio.h>             // printf
#include "cuda_runtime_api.h"
// You must first call cudaGetDeviceProperties(), then pass
// the devProp structure it fills in to this function:
int getSPcores(cudaDeviceProp devProp)
{  
    int cores = 0;
    int mp = devProp.multiProcessorCount;
    switch (devProp.major){
     case 2: // Fermi
      if (devProp.minor == 1) cores = mp * 48;
      else cores = mp * 32;
      break;
     case 3: // Kepler
      cores = mp * 192;
      break;
     case 5: // Maxwell
      cores = mp * 128;
      break;
     case 6: // Pascal
      if ((devProp.minor == 1) || (devProp.minor == 2)) cores = mp * 128;
      else if (devProp.minor == 0) cores = mp * 64;
      else printf("Unknown device type\n");
      break;
     case 7: // Volta and Turing
      if ((devProp.minor == 0) || (devProp.minor == 5)) cores = mp * 64;
      else printf("Unknown device type\n");
      break;
     case 8: // Ampere (8.0, 8.6) and Ada Lovelace (8.9)
      if (devProp.minor == 0) cores = mp * 64;
      else if (devProp.minor == 6) cores = mp * 128;
      else if (devProp.minor == 9) cores = mp * 128; // ada lovelace
      else printf("Unknown device type\n");
      break;
     case 9: // Hopper
      if (devProp.minor == 0) cores = mp * 128;
      else printf("Unknown device type\n");
      break;
     default:
      printf("Unknown device type\n"); 
      break;
    }
    return cores;
}

(coded in browser)

"Cores" is a bit of a marketing term. The most common connotation, in my opinion, is to equate it with the SP units in the SM, and that is the meaning I have used here. I've also omitted cc 1.x devices, as those device types are no longer supported in CUDA 7.0 and CUDA 7.5.

A pythonic version is here

Robert Crovella
  • My device is a GeForce GT 740M and it has 384 CUDA cores, but what is the right query function to print the CUDA core count along with the other properties above? – Alsphere Sep 12 '15 at 01:45
  • There is no CUDA cores property. You have to use the method I described to compute it. – Robert Crovella Sep 12 '15 at 01:50
  • Why do you multiply mp by 192, 32, or 128? What does 192 mean? – Alsphere Sep 12 '15 at 17:31
  • 4
    The documentation I linked by the words "published data" in my answer (click on it!) explains these numbers. It's not all in a single sentence, so you'll need to read several sections/paragraphs. The 32 and 48 numbers are described [here](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capability-2-x), the 192 number is indicated [here](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capability-3-0) and the 128 number is indicated [here](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capability-5-x). – Robert Crovella Sep 13 '15 at 21:15
  • Why is the device prop structure passed by value? – MikeB Nov 04 '22 at 23:27
  • You could pass it by reference or some other method as well. I don't really understand the question or concern. – Robert Crovella Nov 04 '22 at 23:36
20

On Linux you can run the following command to get the number of CUDA cores:

nvidia-settings -q CUDACores -t

To get the output of this command in C, use the popen function.
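
A minimal sketch of the popen approach, assuming a POSIX system where nvidia-settings is installed and can reach the display. The helper name `read_int_from_command` is hypothetical (it is not part of any NVIDIA API); it runs a shell command, parses the first line of output as an integer, and returns -1 on any failure.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose popen/pclose under strict -std=c99/c11 */
#endif
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: run `cmd` through the shell and parse the first line
 * of its output as a non-negative integer. Returns -1 on any failure. */
int read_int_from_command(const char *cmd)
{
    FILE *pipe = popen(cmd, "r");
    if (pipe == NULL)
        return -1;

    char line[64];
    int value = -1;
    if (fgets(line, sizeof line, pipe) != NULL) {
        char *end = NULL;
        long parsed = strtol(line, &end, 10);
        /* end == line means no digits were consumed. */
        if (end != line && parsed >= 0 && parsed <= INT_MAX)
            value = (int)parsed;
    }

    /* Drain any remaining output so the child is not killed by a closed pipe. */
    while (fgets(line, sizeof line, pipe) != NULL) {
    }

    /* A non-zero status means the command itself failed. */
    if (pclose(pipe) != 0)
        value = -1;
    return value;
}
```

Under these assumptions, `int cores = read_int_from_command("nvidia-settings -q CUDACores -t");` would hold the core count, or -1 if the command is missing or fails.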

Bas Van Assche
4

As Vraj Pandya already said, there is a function (_ConvertSMVer2Cores) in the Common/helper_cuda.h file in NVIDIA's cuda-samples GitHub repository that provides this functionality. It returns the number of cores per multiprocessor, so you just need to multiply its result by the GPU's multiprocessor count.

Just wanted to provide a current link.

#include <cuda.h>
#include <cuda_runtime.h>
#include <helper_cuda.h> // This header must be on the compiler's include path
                         // (it is a header, so the linker is not involved).
                         // It also appears to require `helper_string.h`,
                         // found in the same folder as
                         // `helper_cuda.h`.

int deviceID;
cudaDeviceProp props;

cudaGetDevice(&deviceID);
cudaGetDeviceProperties(&props, deviceID);
    
int CUDACores = _ConvertSMVer2Cores(props.major, props.minor) * props.multiProcessorCount;
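
For readers who cannot pull in helper_cuda.h, the per-multiprocessor lookup can be approximated with a small table in plain C. This is only a sketch: the names `cores_per_sm`, `total_cores`, and `ANY_MINOR` are hypothetical, and the values are copied from the switch statement in the answer above, not from NVIDIA's header, so the table needs a new row for each new compute capability.

```c
#include <stddef.h>

#define ANY_MINOR (-1)  /* hypothetical wildcard: row matches every minor version */

/* One row per compute capability. Values mirror the switch statement in the
 * answer above; they are NOT taken from NVIDIA's helper_cuda.h. */
struct sm_cores { int major; int minor; int cores; };

static const struct sm_cores SM_CORE_TABLE[] = {
    {2, 1, 48},  {2, ANY_MINOR, 32},   /* Fermi */
    {3, ANY_MINOR, 192},               /* Kepler */
    {5, ANY_MINOR, 128},               /* Maxwell */
    {6, 0, 64},  {6, 1, 128}, {6, 2, 128},  /* Pascal */
    {7, 0, 64},  {7, 5, 64},           /* Volta, Turing */
    {8, 0, 64},  {8, 6, 128}, {8, 9, 128},  /* Ampere, Ada Lovelace */
    {9, 0, 128},                       /* Hopper */
};

/* Returns the cores per multiprocessor, or 0 for an unknown device type.
 * Exact rows are listed before wildcard rows, so the first match wins. */
int cores_per_sm(int major, int minor)
{
    size_t n = sizeof SM_CORE_TABLE / sizeof SM_CORE_TABLE[0];
    for (size_t i = 0; i < n; ++i) {
        const struct sm_cores *row = &SM_CORE_TABLE[i];
        if (row->major == major &&
            (row->minor == minor || row->minor == ANY_MINOR))
            return row->cores;
    }
    return 0;
}

/* Total cores = cores per multiprocessor * number of multiprocessors. */
int total_cores(int major, int minor, int multiprocessor_count)
{
    return cores_per_sm(major, minor) * multiprocessor_count;
}
```

For a Kepler (cc 3.x) device with 2 multiprocessors, like the one in the question, `total_cores(3, 0, 2)` gives 2 × 192 = 384, which matches the 384 CUDA cores mentioned in the comments.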
caarrrl
0

Maybe this might help a bit more.

https://devtalk.nvidia.com/default/topic/470848/cuda-programming-and-performance/what-39-s-the-proper-way-to-detect-sp-cuda-cores-count-per-sm-/post/4414371/#4414371

"there is a library helper_cuda.h which contains a routine _ConvertSMVer2Cores(int major, int minor) which takes the compute capability level of the GPU and returns the number of cores (stream processors) in each SM or SMX" -from the post.

Vraj Pandya