I am working with Cloo, a C# OpenCL library, and I'm wondering how best to determine at runtime which device to use for my kernels. What I really want to know is how many cores each GPU has (compute units * cores per compute unit). How do I do this properly? I can currently determine the number of compute units and the clock frequency.
EDIT: I have considered profiling (running a speed test on) every device and saving/comparing the results. But as I understand it, this poses a problem as well, because you can't write a single program that uses every device optimally and fairly for comparison.
Knowing the core count would also help me choose an optimal number of worker threads (work-items) to specify for each kernel call. Any help is greatly appreciated.