Which of these "+" calculations is faster? 1) uint2 a, b, c; c = a + b; 2) ulong a, b, c; c = a + b;
-
It's not possible to answer this question for OpenCL itself. OpenCL is just an API for the underlying hardware behavior, and the underlying hardware could be anything, from a graphics card, to a compute card, to a CPU, and any one of those devices could have a different performance profile from any other. If you could name a specific device that you plan to use with your OpenCL application, there's a better chance of getting a concrete answer, but you'd still be at the mercy of the availability of public knowledge of the integer performance of said device. – Xirema Aug 21 '18 at 21:16
-
@Xirema Thanks. Let me modify it about AMD GCN cards – user1200759 Aug 21 '18 at 21:52
-
The answer is to profile it on the GPU that you care about. – Dithermaster Aug 22 '18 at 23:11
1 Answer
AMD GCN has no native 64-bit integer vector support, so the second statement would be translated into two 32-bit adds, one V_ADD_U32 followed by a V_ADDC_U32 which takes the carry flag from the first V_ADD_U32 into account.
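To make the add/add-with-carry pairing concrete, here is a minimal sketch in plain C (not OpenCL C, so it can be compiled on the host). The `u64_pair` type and the helper names are hypothetical, introduced only for illustration; the code mimics the two-step sequence described above and is not actual compiler output.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit value held in two 32-bit registers. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* Add two 64-bit values using only 32-bit operations. */
static u64_pair add64_via_32(u64_pair a, u64_pair b)
{
    u64_pair c;
    c.lo = a.lo + b.lo;                 /* first add: low halves, wraps modulo 2^32 */
    uint32_t carry = (c.lo < a.lo);     /* 1 if the low add overflowed, else 0 */
    c.hi = a.hi + b.hi + carry;         /* second add: high halves plus the carry */
    return c;
}

/* Helpers to convert between the pair form and a native 64-bit integer. */
static u64_pair from_u64(uint64_t v)
{
    u64_pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

static uint64_t to_u64(u64_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}
```

The second add cannot start until the carry from the first is known, which is the dependency the next paragraph refers to.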
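So, to answer your question: both take the same number of instructions. However, the two adds in the first statement are independent of each other and can be computed in parallel (instruction-level parallelism), whereas the V_ADDC_U32 in the second has to wait for the carry from the V_ADD_U32. The first could therefore be faster IF your kernel is occupancy bound (i.e. using lots of registers).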
If your statements can be executed by the scalar unit (i.e. they do not depend on the thread index), then the game changes: the second one will be just one instruction (vs. two), since the scalar unit has native 64-bit integer support.
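However, keep in mind that your first statement is not equivalent to the second: a uint2 add treats its two components independently, so a carry out of the low component is not propagated into the high one.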
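A small host-side C sketch of that difference, assuming the `.x` component is read as the low 32 bits and `.y` as the high 32 bits. The `uint2_emul` type and function names are hypothetical stand-ins for OpenCL's `uint2`, not part of any API.

```c
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's uint2: two independent 32-bit lanes. */
typedef struct {
    uint32_t x;
    uint32_t y;
} uint2_emul;

/* Component-wise add, as uint2 + uint2 behaves: each lane wraps on its own. */
static uint2_emul add_uint2(uint2_emul a, uint2_emul b)
{
    uint2_emul c = { a.x + b.x, a.y + b.y };
    return c;
}

/* Reinterpret the two lanes as one 64-bit value (x = low half, y = high half). */
static uint64_t as_u64(uint2_emul v)
{
    return ((uint64_t)v.y << 32) | v.x;
}
```

For example, with `a = {0xFFFFFFFF, 0}` and `b = {1, 0}`, the component-wise sum is `{0, 0}`, whereas adding the same bits as 64-bit integers gives `0x100000000`. The two statements only agree when the low components do not overflow.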
