
The Linux perf tools (formerly known as perf_events) have several built-in generic software events. The two most basic are cpu-clock and task-clock (internally called PERF_COUNT_SW_CPU_CLOCK and PERF_COUNT_SW_TASK_CLOCK). The problem is that they are barely documented.

User ysdx reports that man perf_event_open gives only a short description:

    PERF_COUNT_SW_CPU_CLOCK
          This reports the CPU clock, a high-resolution per-
          CPU timer.

    PERF_COUNT_SW_TASK_CLOCK
          This reports a clock count specific to the task
          that is running.

But the description is hard to understand.

Can somebody give an authoritative answer about how and when the task-clock and cpu-clock events are accounted? How do they relate to the Linux kernel scheduler?

When will task-clock and cpu-clock give different values? Which one should I use?

Community
osgx
  • Frederic Weisbecker [suggested](https://lkml.org/lkml/2010/11/3/373) on LKML in 2010 that: "*cpu-clock is based on the total time spent on the cpu. task-clock is based only on the time spent on the profiled task, so that doesn't count time spent on other tasks, it has a per thread granularity*", and noted that he "*might be somehow wrong in .. explanation*". And Peter Zijlstra [says](https://lkml.org/lkml/2010/11/30/476) that "*it actually makes sense to count both cpu and task clock on a task (cpu clock basically being wall-time).*" – osgx May 31 '14 at 00:59
  • Source file for both events: `kernel/events/core.c`, ["*Software event: cpu wall time clock*" (line 6092, `cpu_clock_event_*`)](http://lxr.free-electrons.com/source/kernel/events/core.c?v=3.13#L6092) for cpu-clock (`pmu perf_cpu_clock`), and ["*Software event: task time clock*" (line 6168, `task_clock_event_*`)](http://lxr.free-electrons.com/source/kernel/events/core.c?v=3.13#L6168) for task-clock (`pmu perf_task_clock`). Both are based on hrtimers, but the update functions differ: `cpu_clock_event_update` uses `local_clock` and `task_clock_event_update` uses `event->ctx->time`.... – osgx May 31 '14 at 01:51
  • [Robert Haas, "perf: the good, the bad, the ugly", 2012](http://rhaas.blogspot.com/2012/06/perf-good-bad-ugly.html): "*But perf also includes software events like cpu-clock and task-clock that have no meaning apart from the Linux kernel, and there's no documentation about what those mean. I get approximately equivalent results from profiling with the default event, cycles, with task-clock, and with cpu-clock,... no explanation anywhere what the difference is. For perf to be broadly useful,.. it needs documentation explaining what all of these events are, and how to make effective use of them.*" – osgx May 31 '14 at 01:54
  • To me the description looks as if the cpu clock is maybe the number of jiffies since OS start time, and the task clock is maybe the jiffies since task start. Please check the man page proc(5) and search for starttime in it. Maybe https://stackoverflow.com/a/44524937/1950345 can give you more information. – reichhart Jun 24 '17 at 02:16
  • @reichhart, perf uses the difference between values, not the absolute values. Have you ever used perf record or perf stat? Your comment is probably not related to the question: in the perf code, cpu clock is `local_clock()` and task clock is `perf_clock()` plus some hrtimer. – osgx Jun 24 '17 at 13:10
  • OK, then it is not "start time" but a different point in time. Actually everything is relative. ;-) And within this period of time cpu-clock is total cpu time and task-clock only of the task. Actually it was your first comment that made me think of this. :-) – reichhart Jun 24 '17 at 17:03
  • @reichhart, when we perf-record (or perf-stat) some task, all perf counters are counted only when task is running on CPU. So it may be reasonable to expect that task-clock and cpu-clock will tick for the (almost) same time in process profiling. Probably there is some difference in perf-record (or perf-stat) in system-wide mode (-a option), but at any time there is some task on the CPU (PID 0 when it is idle, something strange when CPU core/chip is offline). – osgx Jun 25 '17 at 00:13
  • did you ever get to the bottom of this? – jberryman Feb 06 '19 at 03:48

3 Answers


1) By default, perf stat shows task-clock, and does not show cpu-clock. Therefore we can tell task-clock was expected to be much more useful.

2) cpu-clock was simply broken, and has not been fixed for many years. It is best to ignore it.

It was intended that cpu-clock of sleep 1 would show about 1 second. In contrast, task-clock would show close to zero. It would have made sense to use cpu-clock to read wall clock time. You could then look at the ratio between cpu-clock and task-clock.

But in the current implementation, cpu-clock is equivalent to task-clock. It is even possible that "fixing" the existing counter might break some userspace program. If there is such a program, Linux might not be able to "fix" this counter. Linux might need to define a new counter instead.
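The gap between the intended and the observed behaviour can be made concrete with a toy model. This is only a sketch in Python, not kernel code: the names `Interval`, `on_cpu_time`, `intended_counts` and `observed_counts` are hypothetical, and the 1 ms of start-up work assumed for `sleep 1` is an invented figure used purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A span [start, end) in seconds during which the task is on a CPU."""
    start: float
    end: float


def on_cpu_time(runs, window_start, window_end):
    """Sum the part of each run interval that overlaps the measurement window."""
    total = 0.0
    for run in runs:
        overlap = min(run.end, window_end) - max(run.start, window_start)
        if overlap > 0:
            total += overlap
    return total


def intended_counts(runs, window_start, window_end):
    """Intended semantics: cpu-clock ~ wall time, task-clock ~ on-CPU time."""
    return {
        "cpu-clock": window_end - window_start,
        "task-clock": on_cpu_time(runs, window_start, window_end),
    }


def observed_counts(runs, window_start, window_end):
    """Behaviour described above for per-task profiling: both track on-CPU time."""
    t = on_cpu_time(runs, window_start, window_end)
    return {"cpu-clock": t, "task-clock": t}


# Hypothetical `sleep 1`: assume about 1 ms of work at start-up, then asleep.
sleep_runs = [Interval(0.0, 0.001)]
print("intended:", intended_counts(sleep_runs, 0.0, 1.0))
print("observed:", observed_counts(sleep_runs, 0.0, 1.0))
```

In this model the intended ratio task-clock / cpu-clock would be a CPU-utilisation figure for the task, while the observed ratio is always 1.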

Exception: starting with v4.7-rc1, when profiling a CPU or CPUs rather than a specific task (e.g. perf stat -a), perf stat shows cpu-clock instead of task-clock. In this specific case, the two counters were intended to be equivalent, and the original intention for cpu-clock makes more sense. So for perf stat -a, you can ignore the difference and interpret the value as task-clock.

If you write your own code which profiles a CPU or CPUs - as opposed to a specific task - perhaps it would be clearest to follow the implementation of perf stat -a. But you might link to this question, to explain what your code is doing :-).
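As a minimal sketch of "following perf stat -a", the helper below picks the clock event by profiling mode and builds a perf stat command line. The function names `default_clock_event` and `perf_stat_argv` are hypothetical; the only perf options assumed are `-e` (event selection) and `-a` (all CPUs), and the choice of event mirrors the v4.7-rc1 behaviour described above rather than anything read from the perf source.

```python
def default_clock_event(system_wide: bool) -> str:
    """Mirror the default described above: cpu-clock per CPU, task-clock per task."""
    return "cpu-clock" if system_wide else "task-clock"


def perf_stat_argv(command, system_wide=False):
    """Build an argv list for `perf stat` that counts the matching clock event."""
    argv = ["perf", "stat", "-e", default_clock_event(system_wide)]
    if system_wide:
        argv.append("-a")  # count on all CPUs instead of following one task
    argv.append("--")      # everything after this is the workload command
    argv.extend(command)
    return argv


print(" ".join(perf_stat_argv(["sleep", "1"])))
print(" ".join(perf_stat_argv(["sleep", "1"], system_wide=True)))
```

The list can be passed to `subprocess.run` as is; whether the counters may be opened depends on the system's perf permissions.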

Subject: Re: perf: some questions about perf software events
From: Peter Zijlstra

On Sat, 2010-11-27 at 14:28 +0100, Franck Bui-Huu wrote:

> Peter Zijlstra writes:
>
> > On Wed, 2010-11-24 at 12:35 +0100, Franck Bui-Huu wrote:
> >
> > > [...]
> > >
> > > Also I'm currently not seeing any real differences between cpu-clock and task-clock events. They both seem to count the time elapsed when the task is running on a CPU. Am I wrong ?
> >
> > No, Francis already noticed that, I probably wrecked it when I added the multi-pmu stuff, its on my todo list to look at (Francis also handed me a little patchlet), but I keep getting distracted with other stuff :/
>
> OK.
>
> Does it make sense to adjust the period for both of them ?
>
> Also, when creating a task clock event, passing 'pid=-1' to sys_perf_event_open() doesn't really make sense, does it ?
>
> Same with cpu clock and 'pid=n': whatever value, the event measure the cpu wall time clock.
>
> Perhaps proposing only one clock in the API and internally bind this clock to the cpu or task clock depending on pid or cpu parameters would have been better ?

No, it actually makes sense to count both cpu and task clock on a task (cpu clock basically being wall-time).

On a more superficial level, the perf stat output for cpu-clock can differ slightly from that for task-clock in perf versions earlier than v4.7-rc1. For example, it may print "CPUs utilized" for task-clock but not for cpu-clock.

Hadi Brais
sourcejedi
  • Can you link the mailing list archive you're quoting here, just for reference and so we know *who* you're quoting? – Peter Cordes Jul 10 '19 at 09:53
  • @PeterCordes +1, edited. I was kinda responding to the question comments. Hopefully this is the authoritative answer, and all the question comments can be wiped :-). – sourcejedi Jul 10 '19 at 10:18
  • `In this specific case, the two counters were intended to be equivalent. The original intention for cpu-clock makes more sense in this case.` But in per-cpu mode, only cpu-clock makes sense. That's why in v4.7-rc1 cpu-clock is used instead of task-clock by default in per-cpu mode. (See my edit to the answer.) – Hadi Brais Sep 15 '19 at 21:36

Generally speaking: The cpu-clock event measures the passage of time. It uses the Linux CPU clock as the timing source.

Here is an in-depth article on finding execution hot spots with perf: http://sandsoftwaresound.net/perf/perf-tutorial-hot-spots/

The task-clock event tells you how parallel your job has been, i.e. how many CPUs were used. This compendium contains detailed information about the output generated by perf: https://doc.zih.tu-dresden.de/hpc-wiki/bin/view/Compendium/PerfTools

There is also a whole lot of information here: https://stackoverflow.com/a/20378648/8223204

  • Patrick, I don't need basic information about perf usage; I need the real difference between two specific software events, cpu-clock and task-clock (with references to documentation, kernel source, or books). Both will measure the time of parallel tasks, as cpu-clock ticks when the thread is active (running on this core). In other words: **when will there be a large difference between these events** in the command `perf stat -e cpu-clock,task-clock ./program`? For example, `echo 2^234567%2 | perf stat -e cpu-clock,task-clock /usr/bin/bc` (single thread) and `perf ... pixz -1 ./huge_file` (multithreaded) show almost equal counts. – osgx Jun 28 '17 at 23:18

According to this message, they measure the same thing.

They just differ in when they sample.

cpu-clock is wall-clock based -- so samples are taken at regular intervals relative to walltime. I believe that task-clock is relative to the task run time. So, samples are taken at regular intervals relative to the process' runtime.

When I run it on my multi-threaded app, it indeed shows nearly identical values.

Bram
  • Your reference is old and is from David Ahern, who is probably not a kernel/perf developer. What is wall time for the program (the case when only a single program is profiled, so profiling is turned on only when the program, or the kernel working on its behalf, runs - that is, when the **task is running**)? What will task clock be for the case of system-wide profiling (`-a`), when there is some thread (user or kernel) running on every CPU, and even for idle there is a synthetic task running (http://elixir.free-electrons.com/linux/v4.13.10/source/kernel/sched/idle.c#L142)? – osgx Oct 30 '17 at 23:33