
I recently compared two ways of measuring kernel runtime and got some confusing results.

I use an AMD Bobcat CPU (E-350) with an integrated GPU on Ubuntu Linux (CL_PLATFORM_VERSION is OpenCL 1.2 AMD-APP (923.1)).

The gettimeofday-based measurement looks like this:

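    clFinish(...);                    // make sure all previously enqueued commands have finished
    gettimeofday(&starttime, NULL);   // host timestamp before enqueueing the kernel
    clEnqueueNDRangeKernel(...);      // enqueue the kernel and obtain its event
    clFlush(...);                     // issue the queued command to the device
    clWaitForEvents(...);             // block until the kernel's event has completed
    gettimeofday(&endtime, NULL);     // host timestamp after completion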

This reports a kernel runtime of about 5466 ms.
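To turn the two gettimeofday samples into milliseconds, a small helper along these lines is enough. It is only a sketch: the name `elapsed_ms` is hypothetical, and it assumes `starttime` and `endtime` are `struct timeval` values filled as in the snippet above.

```c
#include <sys/time.h>

/* Hypothetical helper: wall-clock milliseconds elapsed between two
 * gettimeofday() samples (e.g. starttime and endtime). */
static double elapsed_ms(const struct timeval *start, const struct timeval *end)
{
    /* Doing the arithmetic in double also covers end->tv_usec < start->tv_usec,
     * where the microsecond difference is negative. */
    return (double)(end->tv_sec - start->tv_sec) * 1000.0 +
           (double)(end->tv_usec - start->tv_usec) / 1000.0;
}
```

Called as `elapsed_ms(&starttime, &endtime)` after the second gettimeofday. Because this is host wall-clock time, it includes the enqueue, flush and wait overhead as well as the kernel itself.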

For the second measurement I used clGetEventProfilingInfo to read the QUEUED / SUBMIT / START / END timestamps of the kernel's event.

From these four timestamps I can calculate the time spent in each state:

  • time spent queued (QUEUED → SUBMIT): 0.06 ms,
  • time spent submitted (SUBMIT → START): 2733 ms,
  • time spent executing (START → END): 2731 ms (the actual execution time).
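Each of these durations is the difference between two consecutive timestamps. A minimal sketch of that arithmetic, assuming the four values were read with clGetEventProfilingInfo (CL_PROFILING_COMMAND_QUEUED, CL_PROFILING_COMMAND_SUBMIT, CL_PROFILING_COMMAND_START, CL_PROFILING_COMMAND_END) on a queue created with CL_QUEUE_PROFILING_ENABLE. OpenCL returns them as cl_ulong device-time counters in nanoseconds; to keep the sketch self-contained it uses uint64_t instead, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the four profiling timestamps of one event.
 * OpenCL reports them as cl_ulong nanoseconds; uint64_t stands in here. */
struct event_times {
    uint64_t queued, submit, start, end;
};

/* Hypothetical container for the time spent in each state, in milliseconds. */
struct event_phases {
    double queued_ms;    /* QUEUED -> SUBMIT: enqueued, not yet submitted to the device */
    double submitted_ms; /* SUBMIT -> START: submitted, not yet running */
    double exec_ms;      /* START  -> END:   actual kernel execution */
};

static struct event_phases event_phases_ms(const struct event_times *t)
{
    struct event_phases p;
    /* Nanoseconds to milliseconds. */
    p.queued_ms    = (double)(t->submit - t->queued) / 1e6;
    p.submitted_ms = (double)(t->start  - t->submit) / 1e6;
    p.exec_ms      = (double)(t->end    - t->start)  / 1e6;
    return p;
}
```

Comparing `exec_ms` with the gettimeofday result is the cross-check made here: on the AMD GPU the two differ by about a factor of two, on the CPU they match.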

These add up to roughly the 5466 ms, but why does the command stay in the submitted state for half of the total time?
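Written out with the values from the list (the symbols t_queued, t_submitted, t_exec are just labels for the three states), the sum and the share of the submitted state are:

$$
t_{\text{queued}} + t_{\text{submitted}} + t_{\text{exec}} = 0.06 + 2733 + 2731 = 5464.06~\text{ms} \approx 5466~\text{ms},
\qquad
\frac{t_{\text{submitted}}}{5466~\text{ms}} = \frac{2733}{5466} = 0.5
$$

So the submitted state takes exactly as large a share of the wall time as the execution itself.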

The strange things are:

  • the time in the submitted state is always about equal to the actual execution time (i.e. half of the total), even for different kernels or different workloads (so it can't be a constant setup time),

  • on the CPU device, the time spent in the submitted state is 0 and the execution time equals the gettimeofday result,

  • I tested my kernels on an Intel Ivy Bridge system running Windows, using both the CPU and the GPU, and did not see the effect there.

Does anyone have a clue?

I suspect that either the GPU runs the kernel twice (which would make the gettimeofday result double the actual execution time) or that clGetEventProfilingInfo does not work correctly for the AMD GPU.

the swine
Tomas
  • Do you send any data to the device? You may have measured some I/O time too. Try to reduce the amount of data transferred, or just disable the clEnqueueWriteBuffer to see how it changes your measurements. – Eric Bainville Sep 26 '12 at 16:16
  • I send some data to the device, but I do it before running the kernel, and the clFinish(...) before the measurement should finish all tasks on the queue. Also, I do the data copying with CL_TRUE so that it blocks until the data is transferred. – Tomas Sep 26 '12 at 17:08
  • Edit note: You need to read about how we format posts here. Your formatting was horrible! You mixed HTML with Markdown. We use **Markdown** here. I've just cleaned up your question, but it was surely a hard task. Lots of mess... – trejder Nov 20 '14 at 11:30

1 Answer


I posted the problem on an AMD forum. They say it is a bug in the AMD profiler.

http://devgurus.amd.com/thread/159809

Tomas