
I want to measure the performance of different devices, viz. CPU and GPUs. This is my kernel code:

__kernel void dataParallel(__global int* A)
{  
    sleep(10);
    A[0]=2;
    A[1]=3;
    A[2]=5;
    int pnp;//pnp=probable next prime
    int pprime;//previous prime
    int i,j;
    for(i=3;i<10;i++)
    {
        j=0;
        pprime=A[i-1];
        pnp=pprime+2;
        while((j<i) && A[j]<=sqrt((float)pnp))
        {
            if(pnp%A[j]==0)
                {
                    pnp+=2;
                    j=0;
                }
            j++;

        }
        A[i]=pnp;

    }
}

However, the sleep() function doesn't work. I am getting the following error in the build log:

<kernel>:4:2: warning: implicit declaration of function 'sleep' is invalid in C99
    sleep(10);
builtins: link error: Linking globals named '__gpu_suld_1d_i8_trap': symbol multiply defined!

Is there any other way to implement the function? Also, is there a way to record the time taken to execute this code snippet?

P.S. I have included #include <unistd.h> in my host code.

1 Answer

You don't need to use sleep in your kernel to measure the execution time.

There are two ways to measure the time:

  1. Use OpenCL's built-in profiling (see the cl api, in particular clGetEventProfilingInfo).
  2. Get timestamps in your host code and compare them before and after execution. Example:

        double start = getTimeInMS();
        // the kernel starts here
        clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, &tasksize, &local_size_in, 0, NULL, NULL);
        // wait for kernel execution to finish
        clFinish(command_queue);
        cout << "kernel execution time " << (getTimeInMS() - start) << endl;


Where getTimeInMS() is a function that returns a double value of milliseconds (Windows-specific; replace it with another implementation if you don't use Windows):

    static inline double getTimeInMS() {
        SYSTEMTIME st;
        GetLocalTime(&st);
        return (double)st.wSecond * 1000.0 + (double)st.wMilliseconds;
    }

Also you want to:

    #include <windows.h> // SYSTEMTIME and GetLocalTime are declared here, not in <time.h>

(The Mac/Linux version below needs <sys/time.h> for gettimeofday instead.)

For Mac it would be (could work on Linux as well, not sure):

    // same name as above so it works as a drop-in replacement
    static inline double getTimeInMS() {
        struct timeval starttime;
        gettimeofday(&starttime, 0x0);
        return (double)starttime.tv_sec * 1000.0 + (double)starttime.tv_usec / 1000.0;
    }
Käptn Freiversuch
  • I'll definitely try that, thanks! Also, as I said, I wish to compare the performance of my CPU and the GPUs. One way to do that is by computing the run time of the kernel code; but if there were a way to get the code to start executing on all the devices at the same time, then I could just record each device's end time of execution, and that would serve the purpose as well. Is that possible? – ikk Jul 04 '15 at 09:11
  • To do so, you need to create an OpenCL context for each device. You could also just measure the time, change the picked device three times, and compare the times afterwards. – Käptn Freiversuch Jul 04 '15 at 11:18
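Regarding option 1 from the answer: OpenCL's event profiling returns per-kernel device timestamps, which is also handy when comparing devices as discussed in the comments, since each device reports its own execution time. A minimal sketch (not a complete program: it assumes context, device, kernel, tasksize, and local_size_in are set up as elsewhere in this thread, and error checking is omitted):

```c
#include <CL/cl.h>
#include <stdio.h>

// The queue must be created with profiling enabled,
// otherwise clGetEventProfilingInfo returns an error.
cl_int err;
cl_command_queue queue =
    clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, &err);

cl_event evt;
clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                       &tasksize, &local_size_in, 0, NULL, &evt);
clWaitForEvents(1, &evt);

// Device-side timestamps in nanoseconds.
cl_ulong t_start, t_end;
clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START,
                        sizeof(t_start), &t_start, NULL);
clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,
                        sizeof(t_end), &t_end, NULL);
printf("kernel execution time %f ms\n", (t_end - t_start) / 1e6);
clReleaseEvent(evt);
```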