
Is it possible to query the number of processing elements (per compute unit) in OpenCL? If yes, how? I did not find a corresponding parameter on the clGetDeviceInfo doc page.

I am not sure if processing element is standard terminology. I got the term from this video.

I'd like to query this information because I am curious, not for a practical purpose.

Szabolcs
  • Benchmark with different total thread counts and find the one that gives the best occupancy/issue rate. For example, a 1280-core GPU will be happy with 1280 (or 256 x 5) threads and multiples of that, but not with 1024, 2048, or 4096 threads, because some cores are always idle or waiting in those settings. Also, don't forget to set the thread group size to 256 (a smaller size will just increase occupancy and make the benchmark meaningless). Benchmark multiples of 256. If 2048 and 2604 give the same result, decrease the thread group size to 128 and test the narrower 2048-2604 range. – huseyin tugrul buyukisik Jan 16 '14 at 19:16
  • @huseyintugrulbuyukisik I'm a complete beginner with OpenCL. I take it your comment means that there's no direct and simple way to query this through standard OpenCL, right? – Szabolcs Jan 16 '14 at 19:30
  • You need a "closer to the hardware" API, and may even need to go through the GPU driver, as GPU-Z does (I hope that is not just a hard-coded database). I searched for the same thing when I first started with OpenCL but couldn't find it. Even compubench.com does not list core counts. For example, an HD 7750 has the same number of cores per compute unit as an HD 7970 (64 for all GCN parts). Nvidia cards usually have 192 for the 600 series. Intel HD uses 4-wide and 8-wide versions. Either dig into the drivers to query the core count, use a database, or get the number of compute units and multiply it by 192 or 64. – huseyin tugrul buyukisik Jan 16 '14 at 20:59
  • @huseyintugrulbuyukisik "Not possible" is an acceptable answer too. You should consider posting it. – Szabolcs Jan 16 '14 at 21:02

1 Answer


Processing element (PE) is the standard terminology, and no, you cannot query the number.

I see several reasons why it's not possible:

  1. The definition itself:

    PE: A virtual scalar processor. A work-item may execute on one or more processing elements.

    So, depending on the architecture, the number returned would be more or less meaningless. Consider, for instance, the previous generation of AMD GPUs, which used VLIW processors.

  2. PE is an abstraction that the standard mostly uses to illustrate and define other concepts; see, for instance, the definitions of SIMD, SPMD and, of course, the Platform Model. The concept is not used directly in practice (though knowing it helps a developer achieve good performance). What you will care about instead is the maximum number of work-items in a work-group.

  3. Even within a given architecture, the processing elements are of different types. For example, in the GK110 Kepler architecture an SMX (the equivalent of a compute unit) has 192 single-precision CUDA cores, 64 double-precision units, and 32 special function units (SFUs). So which number should a query for the number of PEs return?
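
Since there is no query for this, the workaround suggested in the comments on the question (take the compute-unit count and multiply it by a known per-architecture figure) could be sketched roughly as below. This is only an illustration: the table, the family labels and the function name `estimate_processing_elements` are hypothetical and not part of OpenCL, and the per-compute-unit figures (64 for GCN, 192 single-precision cores for Kepler) are simply the ones quoted in this thread. In a real program the compute-unit count would come from `clGetDeviceInfo` with `CL_DEVICE_MAX_COMPUTE_UNITS`; here it is passed in as a plain number so the sketch compiles without the OpenCL headers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup table: OpenCL exposes nothing like this.
 * The figures are the per-compute-unit counts quoted in this thread. */
typedef struct {
    const char *family;    /* label invented for this sketch */
    unsigned pes_per_cu;   /* single-precision units per compute unit */
} pe_table_entry;

static const pe_table_entry PE_TABLE[] = {
    { "amd-gcn",       64  },
    { "nvidia-kepler", 192 },
};

/* Estimate the PE count as (units per compute unit) * (compute units).
 * Returns 0 when the family is not in the table. */
unsigned estimate_processing_elements(const char *family,
                                      unsigned compute_units)
{
    size_t n = sizeof PE_TABLE / sizeof PE_TABLE[0];
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(PE_TABLE[i].family, family) == 0)
            return PE_TABLE[i].pes_per_cu * compute_units;
    }
    return 0;
}
```

For a GCN device reporting 32 compute units, `estimate_processing_elements("amd-gcn", 32)` gives 2048. Note that this counts only one kind of unit, which is exactly the ambiguity described in point 3.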

CaptainObvious