I want to measure the performance (i.e. the run time) of my kernel code on various devices, namely a CPU and GPUs. The kernel code that I wrote is:

__kernel void dataParallel(__global int* A)
{
    sleep(10);
    A[0]=2;
    A[1]=3;
    A[2]=5;
    int pnp;    // pnp = probable next prime
    int pprime; // previous prime
    int i,j;
    for(i=3;i<500;i++)
    {
        j=0;
        pprime=A[i-1];
        pnp=pprime+2;
        while((j<i) && A[j]<=sqrt((float)pnp))
        {
            if(pnp%A[j]==0)
            {
                pnp+=2;
                j=0;
            }
            j++;
        }
        A[i]=pnp;
    }
}

However, I have been told that it is not possible to use sleep() in kernel code. If that is true, can someone explain why? If it isn't, please tell me how to implement it.

Also, as I said, I wish to compare the performance of my CPU and the GPUs. One way to do that is to measure the run time of the kernel code on each device. Alternatively, if I could get the code to start executing on all the devices at the same time, I would just have to record each device's end time, and that would serve the purpose as well. Is that possible?

Hardware Details:

GPU: AMD FirePro W7000, NVIDIA TESLA C2075

CPU: Intel(R) XEON(R) CPU X5660 @ 2.80GHz

talonmies
ikk
  • What does `sleep()` have to do with measuring performance? – talonmies Jul 04 '15 at 09:20
  • I could have the thread sleep for, say, 30 s and start execution of the kernel code on all three devices. If the run time is > 90 s (for 3 devices), that would imply the code is executing serially and not in parallel as it is supposed to, whereas if the run time is only slightly higher than 30 s, my code is executing in parallel on all three devices. – ikk Jul 04 '15 at 09:24
  • @talonmies could you please suggest a better way to achieve that? Thanks – ikk Jul 04 '15 at 09:31
  • To do what? To calculate runtime you need to use a timer of some sort. The internet is overflowing with examples of basic host timers or OpenCL event-based timers, for example [here](http://stackoverflow.com/q/23550912/681865). Google is your friend – talonmies Jul 04 '15 at 10:41
  • Double Post http://stackoverflow.com/questions/31203240/implement-sleep-in-opencl-c/31207695?noredirect=1#comment50438115_31207695 – Käptn Freiversuch Jul 04 '15 at 13:56

1 Answer

However I have been told that it is not possible to use sleep() in the kernel code.

As far as I know, it isn't possible as written: sleep() is a POSIX function rather than part of C itself, and OpenCL C does not make the standard C library headers available inside kernel code. Even if some device offered an equivalent, it's simply not a good idea to block execution of a kernel until a period of time has elapsed. Even in general-purpose programming, that is rarely a good idea. Your kernel should finish processing as soon as possible so that the device is free for other work instead of sitting idle.

Also, as I said, I wish to compare the performance of my CPU and the GPUs. One way to do that is to measure the run time of the kernel code on each device. Alternatively, if I could get the code to start executing on all the devices at the same time, I would just have to record each device's end time, and that would serve the purpose as well. Is that possible?

Sure, something like that... but... I'm not even sure why you think injecting sleep(10) into each task will help you; you haven't explained that here. It doesn't seem like a requirement for profiling your code (e.g. checking its speed). Have you ever heard of the XY problem? I think sleep is your Y variable, in this case.
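If the underlying goal is to tell whether the three devices ran at the same time or one after another, timestamps can answer that without any sleep. Here is a minimal sketch in plain C, assuming you record a start and an end timestamp for each device's run on one common host clock (the profiling counters of different OpenCL devices are not guaranteed to share a time base, so I would not compare those across devices). The names run_interval and concurrency_factor are made up for this example and are not part of OpenCL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One device's run, stamped on a common host clock (hypothetical type). */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
} run_interval;

/*
 * Ratio of summed per-device run time to the total wall time covered
 * by all runs. A value near 1.0 suggests the runs happened back to
 * back (serially); a value near n suggests all n runs overlapped.
 * Returns 0.0 in the degenerate case where no time elapsed at all.
 */
double concurrency_factor(const run_interval *runs, size_t n)
{
    assert(runs != NULL && n > 0);

    uint64_t first_start = runs[0].start_ns;
    uint64_t last_end = runs[0].end_ns;
    uint64_t busy = 0; /* sum of the individual run times */

    for (size_t i = 0; i < n; i++) {
        assert(runs[i].end_ns >= runs[i].start_ns);
        if (runs[i].start_ns < first_start)
            first_start = runs[i].start_ns;
        if (runs[i].end_ns > last_end)
            last_end = runs[i].end_ns;
        busy += runs[i].end_ns - runs[i].start_ns;
    }

    uint64_t wall = last_end - first_start;
    return wall == 0 ? 0.0 : (double)busy / (double)wall;
}
```

With three runs of 30 time units each, back-to-back execution gives a factor of 1.0, while three runs that start within a couple of units of each other give a factor close to 3. That is the same serial-versus-parallel distinction as the 90 s versus 30 s test, but it works with the kernel's real run time.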

I mentioned profiling just now. Have you learnt about profilers? They do exactly what it is you're aiming to do, except that they do it without you having to write any code. Here's a tutorial on using perf to profile the Linux kernel...
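If you do end up timing by hand, a basic host timer is only a few lines. The sketch below is an assumption-laden illustration, not your OpenCL host code: fill_primes is a CPU adaptation of the loop in your kernel (without the sleep), and time_fill_primes wraps it with POSIX clock_gettime on CLOCK_MONOTONIC. In a real OpenCL program the timed region would instead be the enqueue of the kernel followed by a blocking wait for it to finish. I also replaced the sqrt comparison with an integer one so the result does not depend on floating-point rounding; that is my change, not something from the original kernel.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#include <assert.h>
#include <time.h>

/* CPU adaptation of the kernel loop: writes the first n primes to A. */
void fill_primes(int *A, int n)
{
    assert(A != NULL && n >= 3);
    A[0] = 2;
    A[1] = 3;
    A[2] = 5;
    for (int i = 3; i < n; i++) {
        int pnp = A[i - 1] + 2; /* probable next prime */
        int j = 0;
        /* Trial division by known primes up to sqrt(pnp), using
           A[j]*A[j] <= pnp in place of the kernel's sqrt() test. */
        while (j < i && A[j] * A[j] <= pnp) {
            if (pnp % A[j] == 0) {
                pnp += 2; /* composite: try the next odd number */
                j = 0;    /* and restart the trial division */
            } else {
                j++;
            }
        }
        A[i] = pnp;
    }
}

/* Milliseconds between two monotonic clock readings. */
double elapsed_ms(struct timespec t0, struct timespec t1)
{
    return (double)(t1.tv_sec - t0.tv_sec) * 1e3
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}

/* Runs fill_primes once and returns its wall-clock time in ms. */
double time_fill_primes(int *A, int n)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fill_primes(A, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_ms(t0, t1);
}
```

A single run of 500 primes is very short, so for a meaningful comparison I would time many repetitions and look at the average or the minimum rather than one reading.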

Community
autistic
  • I could have the thread sleep for, say, 30 s and start execution of the kernel code on all three devices. If the run time is > 90 s (for 3 devices), that would imply the code is executing serially and not in parallel as it is supposed to, whereas if the run time is only slightly higher than 30 s, my code is executing in parallel on all three devices. – ikk Jul 04 '15 at 16:55
  • Sleeping for any number of seconds in the kernel is a horrible idea because 1. sleep typically makes a system call which tells the kernel to move on to some other task (might be yours? then you don't get the sleep you wanted... heh), or 2. when it doesn't, it'll block the thread (which is *the kernel's thread*, not *your thread*) until that period of time has elapsed. It's horrible either way. Do what you want to do, but I've told you how you can solve your problem *without writing code*. – autistic Jul 05 '15 at 01:17