Measuring time inside a kernel on an Intel iGPU

Question

I am very new to OpenCL, but I have a fair amount of experience with GPU programming in CUDA. In CUDA I used the clock function inside a kernel (as mentioned here) to measure the ticks taken by certain operations. I wrote a simple OpenCL vector addition kernel and ran it on the Intel integrated GPU. The program ran fine and gave the correct output. But when I then tried to use the clock function inside the kernel, clBuildProgram failed with a JIT compilation error. The vector addition kernel that I wanted to execute is provided below:

__kernel void testVecAdd(__global const int *a,__global const int *b,__global int *c,
                 __global float *t){

clock_t start = clock();

 int gid = get_global_id(0);
 c[gid] = a[gid] + b[gid];

 t[gid] = (float)(clock()-start)/CLOCKS_PER_SEC;

}

The errors are as follows:

/home/duttasankha/Desktop/SANKHA_ALL/IGPU_RESEARCH_RELATED/OCL_PRAC_DIR/test_OCL_1.cl:6:2: error: use of undeclared identifier 'clock_t'
 clock_t start = clock();
 ^
/home/duttasankha/Desktop/SANKHA_ALL/IGPU_RESEARCH_RELATED/OCL_PRAC_DIR/test_OCL_1.cl:11:19: error: implicit declaration of function 'clock' is invalid in OpenCL
 t[gid] = (float)(clock()-start)/CLOCKS_PER_SEC;
                  ^
/home/duttasankha/Desktop/SANKHA_ALL/IGPU_RESEARCH_RELATED/OCL_PRAC_DIR/test_OCL_1.cl:11:27: error: use of undeclared identifier 'start'; did you mean 'sqrt'?
 t[gid] = (float)(clock()-start)/CLOCKS_PER_SEC;
                          ^~~~~
                          sqrt
CTHeader.h:5277:40: note: 'sqrt' declared here
double16 __attribute__((overloadable)) sqrt(double16);
                                       ^
/home/duttasankha/Desktop/SANKHA_ALL/IGPU_RESEARCH_RELATED/OCL_PRAC_DIR/test_OCL_1.cl:11:27: error: taking address of function is not allowed
 t[gid] = (float)(clock()-start)/CLOCKS_PER_SEC;
                          ^
/home/duttasankha/Desktop/SANKHA_ALL/IGPU_RESEARCH_RELATED/OCL_PRAC_DIR/test_OCL_1.cl:11:34: error: use of undeclared identifier 'CLOCKS_PER_SEC'
 t[gid] = (float)(clock()-start)/CLOCKS_PER_SEC;
                                 ^

Failed to build program...: -11 (CL_BUILD_PROGRAM_FAILURE)
Build failed!

I was able to do this in CUDA because it supports the clock function, but I could not achieve the same thing on the Intel iGPU. I also tried other functions to measure the ticks, but none of them worked either. I also tried offline compilation using ioc64 and got the same errors. Am I doing something wrong here, or is getting the ticks with clock (or similar functions) simply not possible on the Intel integrated GPU? It is absolutely necessary for me to get these execution traces. So if the clock function is not a viable option, what would be an alternative way to achieve the same goal, and how can I use it? Thank you.

What led you to believe that `clock()` was supported in OpenCL? 10 seconds searching the OpenCL 2.0 specification shows that it isn't. — talonmies, Nov 05 '18 at 06:46
Possible duplicate of [clock() in opencl](https://stackoverflow.com/questions/8849486/clock-in-opencl) — talonmies, Nov 05 '18 at 06:47
@talonmies thank you for your comments, but they still don't answer the question, as I am looking for something similar to clock. As I already said, it is fairly evident that clock or similar functions are not supported, so I am after something that serves a similar purpose. This paper (http://comparch.gatech.edu/hparch/papers/gera_ispass18.pdf) mentions the GTPin tool, which does it. — duttasankha, Nov 05 '18 at 18:44
It wasn't intended as an answer. I am trying to understand the purpose of posting a bunch of error messages and code which you apparently already know won't and can't work. — talonmies, Nov 05 '18 at 18:51
I think you are too fixated on your opinion when I am saying repeatedly that I am looking for an ALTERNATE WAY of achieving similar goals. I don't know how much clearer I could be about this. If you have any tips for that, they would be highly appreciated, rather than just defying me. Thanks. — duttasankha, Nov 05 '18 at 19:44
As I also posted, the paper uses GTPin to achieve this, but at this point its usage is not very clear to me. I am working on that. If you have any similar tips that can be used, then please post them; otherwise your comments are not leading to any useful answer. Thanks. — duttasankha, Nov 05 '18 at 19:48
Why benchmark inside the kernel when you can use OpenCL event objects and measure the entire kernel execution? Microbenchmarking inside is going to be inconsistent and doesn't make sense at the individual work item level anyway. — Dithermaster, Nov 06 '18 at 22:24
It is for microarchitectural reverse engineering and because of this measuring the whole kernel execution won't serve my purpose. If someone could provide me some information regarding getting the performance metrics then that would be helpful as well. — duttasankha, Nov 07 '18 at 04:50
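
For whole-kernel timing, the event-based approach suggested in the comments relies on a command queue created with CL_QUEUE_PROFILING_ENABLE and on clGetEventProfilingInfo, which reports device timestamps in nanoseconds. The sketch below covers only the arithmetic on those timestamps, written in plain C so that it builds without an OpenCL SDK. The struct and function names are made up for illustration, and uint64_t stands in for cl_ulong. It does not provide per-operation ticks inside the kernel, which is what the question asks for.

```c
#include <assert.h>
#include <stdint.h>

/* Timestamps in nanoseconds for one enqueued kernel.
 * Assumption: in a real host program each field is filled by
 * clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_<X>, ...) after
 * clWaitForEvents(), on a queue created with CL_QUEUE_PROFILING_ENABLE.
 * KernelTimestamps and the helpers below are hypothetical names. */
typedef struct {
    uint64_t queued;  /* CL_PROFILING_COMMAND_QUEUED */
    uint64_t submit;  /* CL_PROFILING_COMMAND_SUBMIT */
    uint64_t start;   /* CL_PROFILING_COMMAND_START  */
    uint64_t end;     /* CL_PROFILING_COMMAND_END    */
} KernelTimestamps;

/* Device execution time of the kernel, in nanoseconds. */
static uint64_t exec_ns(const KernelTimestamps *t) {
    assert(t->end >= t->start);
    return t->end - t->start;
}

/* Time between enqueueing the command and the start of execution. */
static uint64_t wait_ns(const KernelTimestamps *t) {
    assert(t->start >= t->queued);
    return t->start - t->queued;
}

/* Convert nanoseconds to milliseconds for reporting. */
static double ns_to_ms(uint64_t ns) {
    return (double)ns / 1e6;
}
```

A host program would fill one KernelTimestamps per launch and report ns_to_ms(exec_ns(&t)). Separating wait_ns from exec_ns keeps queueing delay out of the execution figure.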

score 0 · Accepted Answer · answered Jan 30 '19 at 22:57

I posted this question on the Intel OpenCL forum, and a solution was provided there. Please follow this forum post link to find the answer. If you have any follow-up questions, you can post them either here or on the Intel forum. Thanks.

duttasankha

717
2
10
32
