Building your own profiler: how to catch events?

Question

I couldn't really get an answer to this question, so I'll attempt to write a custom, albeit simple, profiler. To get started: suppose I need to find out, without recompiling, how much CPU time my code uses and which core it is running on. Suppose also that I'd like to catch when a given function is executed. Finally, any thoughts on dealing with threads? Any other tips on how to start? C is my language of choice, and I'm running Linux. Thanks.

Edit: OProfile, Callgrind, Helgrind, gprof, PAPI, TAU, and the others I've analyzed don't seem to match my needs.

And what are your needs? Please explain them in detail, maybe even with an example of what you want to get. I ask because you listed all the popular profilers as "not matching your needs". — osgx, Jul 25 '11 at 15:12
Right, sorry. So, per thread I'd like, for example: time, TID, the CPU it's running on, and thread state. — Dervin Thunk, Jul 25 '11 at 15:32
Do you want samples, e.g. checking every 1 millisecond whether the thread is on a CPU and which function is active? — osgx, Jul 25 '11 at 15:43

score 2 · Answer 1 · edited May 23 '17 at 12:19


I'm sure you've seen this before.

I find it helpful to distinguish two different objectives:

- Measuring how long various things take, so you can make a presentation. As part of this presentation you might say something like "It looks like the frob routine is taking too much time, or being called too many times, suggesting we try to speed that up or call it less."
- Pinpointing precise lines of code or instructions that are 1) not necessary, and 2) worth fixing, in the sense that fixing them will save a good fraction of execution time.

I suspect the overall goal is the latter. But to do that, measuring is a very indirect approach. Instead, you could take advantage of the fact that, if something's wasting enough time to be worth looking at, you can simply catch it by taking snapshots of the program's state.

So you're not measuring in order to find what's taking time. The very fact that it takes time is what exposes it, unambiguously, rather than merely suggesting it.

Zoom is a profiler that works this way. So is LTProf. I built one once, but frankly I think the manual method, while more work, is more effective, because it makes me think harder about why the program's doing what it's doing.

answered Jul 25 '11 at 15:51

Mike Dunlavey

Thanks, Mike. Yes, I'd seen the first answer you linked to. It's not really that I want to profile in the usual sense of finding bottlenecks. It's **threads** that I want to profile: where they are executed, their state (R, S, ...), how much CPU they consume, etc. I want a general picture of the threads. I get what you're saying about the "manual" labor, but that's what I've been doing so far (I work on algorithm research), and I wanted a more general solution: I code a 30 LOC algorithm and the profiling adds another 100+ LOC! Thanks. – Dervin Thunk Jul 25 '11 at 15:59
@Dervin: Does `lsstack` or `pstack` give the info you want? It could be run in a loop in a separate process. – Mike Dunlavey Jul 25 '11 at 16:50
`pstack` must be attached to a running process, as far as I can tell. I need a time-sensitive trace. I couldn't find `lsstack`. – Dervin Thunk Jul 25 '11 at 17:12
@Dervin: It looks like it can handle a [bunch of processes/threads](http://linuxcommand.org/man_pages/pstack1.html). – Mike Dunlavey Jul 25 '11 at 18:09

osgx · Accepted Answer · 2011-07-25T16:15:51.537

You should try Linux's perf (https://perf.wiki.kernel.org/index.php/Tutorial). This tool has direct support from the kernel and knows about page faults, CPU migrations, and context switches (look at the `perf stat` output, for example). These stats can be aggregated per process or per CPU. `perf record` can be used like OProfile.

To add your own simple profiling, you can use setitimer (the sampling signal is process-wide) or timer_create (the timer signal can be directed at a specific thread). There is no simple call that gives you the physical CPU number of another thread (glibc's sched_getcpu() only reports it for the calling thread), but at every sample you can get per-thread run times with getrusage and RUSAGE_THREAD.
