While processing a vector of 1,000,000 elements, I tried printing the global ID of every 10,000th work-item to monitor progress during development. I added these lines to the kernel:

"#pragma OPENCL EXTENSION cl_amd_printf : enable                                \n" \

and

"    if(id % 10000 == 0){                                                       \n" \
"        printf(\"%d\\r\\n\", id);                                              \n" \
"    }                                                                          \n" \

That bloated the normal execution time of 3.0-3.3 seconds to 38-40 seconds. I could not find any mention of performance in section A.8.10 of the AMD OpenCL 3.0 SDK, so it is not clear to me whether this behavior is normal.

Is this performance hit normal and expected, or am I doing something wrong?
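
For context, here is a minimal sketch of how the two snippets might sit inside a kernel source string. Only the pragma and the `if`/`printf` lines come from the kernel above; the kernel name `vecAdd`, its arguments, and the rest of the body are assumptions for illustration, as is the helper `progress_lines`, which just counts how many lines the condition would let through.

```c
#include <stddef.h>

/* Hypothetical kernel source in the same string-literal style as above.
   The kernel name, arguments, and vector-add body are assumed; only the
   pragma and the if/printf lines are taken from the question. */
static const char *kernelSource =
"#pragma OPENCL EXTENSION cl_amd_printf : enable                                \n"
"__kernel void vecAdd(__global const float *a,                                  \n"
"                     __global const float *b,                                  \n"
"                     __global float *c,                                        \n"
"                     const int n)                                              \n"
"{                                                                              \n"
"    int id = get_global_id(0);                                                 \n"
"    if(id % 10000 == 0){                                                       \n"
"        printf(\"%d\\r\\n\", id);                                              \n"
"    }                                                                          \n"
"    if(id < n){                                                                \n"
"        c[id] = a[id] + b[id];                                                 \n"
"    }                                                                          \n"
"}                                                                              \n";

/* Hypothetical helper: number of global IDs in [0, n) that are multiples
   of stride, i.e. how many lines the printf above would emit. */
static size_t progress_lines(size_t n, size_t stride)
{
    return stride == 0 ? 0 : (n + stride - 1) / stride;
}
```

With `n = 1000000` and `stride = 10000`, `progress_lines` returns 100, so under these assumptions the slowdown comes from only about 100 printed lines per kernel run.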

Faqit
  • 1
    By adding `printf()` you are forcing fully parallel CL code to be serialized so that it can write, in order, to the `std::cout` stream. It is obviously going to be much slower, and its sole purpose is debugging. So yes, it is expected and normal. – DarkZeros May 23 '16 at 15:47
  • 2
    Adding `printf` never gives you better performance in any case; it only makes it worse :) – Elalfer May 29 '16 at 16:32
