
I'm currently doing some profiling tasks in Python using cProfile (among others). In the documentation it says:

cProfile and profile provide deterministic profiling of Python programs

During profiling, I keep getting slightly different results with cProfile, both for the whole program and for individual functions. That's fine, and I think I understand at least some of the reasons: for example, OS tasks running in the background can slow my program down. cProfile measures real (wall-clock) time, not CPU time, after all.
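To make the real-time vs CPU-time distinction concrete, here is a minimal sketch using the standard library's `time.perf_counter` (wall clock) and `time.process_time` (CPU time of the current process). The helpers `measure` and `waits` are hypothetical, and `time.sleep` is only a stand-in for anything that keeps the process off the CPU.

```python
import time

def measure(func):
    """Return (wall-clock seconds, CPU seconds) spent running func()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start

def waits():
    # Hypothetical workload: sleeping stands in for time the process is
    # not on the CPU, e.g. blocked on I/O or preempted by other OS work.
    time.sleep(0.2)

wall, cpu = measure(waits)
print(f"wall: {wall:.3f}s  cpu: {cpu:.3f}s")
```

The wall-clock figure includes the whole wait, while the CPU figure stays close to zero, which is why a profiler that reads the wall clock is sensitive to whatever else the machine is doing.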

But still, it raises the question: why is it called deterministic profiling if it doesn't produce the same results every time?

  • [Scroll down in the documentation.](https://docs.python.org/2/library/profile.html#what-is-deterministic-profiling) – Pointy Jun 28 '16 at 14:27
  • It's a 10-dollar word for a 10-cent concept. It's based on the empty assumption that measuring helps you *find* possible speedups, which it doesn't. It's only one silly step from there to forgetting your objective and instead concentrating on *accuracy* of the measurement. Plenty of hard-headed programmers on this site know the difference. [*Here are some of the nutty ideas that float around, and what actually works.*](http://stackoverflow.com/a/1779343/23771) – Mike Dunlavey Jun 28 '16 at 16:48
  • 10-dollar vs 10-cent :) I'll read your post; it seems intriguing at first glance. – Icarus Wing Jun 30 '16 at 08:09



The question is answered in the documentation:

27.4.5. What Is Deterministic Profiling?

Deterministic profiling is meant to reflect the fact that all function call, function return, and exception events are monitored, and precise timings are made for the intervals between these events (during which time the user’s code is executing). In contrast, statistical profiling (which is not done by this module) randomly samples the effective instruction pointer, and deduces where time is being spent. The latter technique traditionally involves less overhead (as the code does not need to be instrumented), but provides only relative indications of where time is being spent.

In Python, since there is an interpreter active during execution, the presence of instrumented code is not required to do deterministic profiling. Python automatically provides a hook (optional callback) for each event. In addition, the interpreted nature of Python tends to add so much overhead to execution, that deterministic profiling tends to only add small processing overhead in typical applications. The result is that deterministic profiling is not that expensive, yet provides extensive run time statistics about the execution of a Python program.

Call count statistics can be used to identify bugs in code (surprising counts), and to identify possible inline-expansion points (high call counts). Internal time statistics can be used to identify “hot loops” that should be carefully optimized. Cumulative time statistics should be used to identify high level errors in the selection of algorithms. Note that the unusual handling of cumulative times in this profiler allows statistics for recursive implementations of algorithms to be directly compared to iterative implementations.
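The statistics described above map onto the columns that `pstats` prints (`ncalls`, `tottime`, `cumtime`). A minimal sketch, assuming Python 3.7+ for `pstats.SortKey`; `fib_recursive`, `fib_iterative` and `main` are hypothetical example functions:

```python
import cProfile
import io
import pstats

def fib_recursive(n):
    return n if n < 2 else fib_recursive(n - 1) + fib_recursive(n - 2)

def fib_iterative(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def main():
    # Hypothetical workload: compute the same value two different ways.
    return fib_recursive(10), fib_iterative(10)

profiler = cProfile.Profile()
result = profiler.runcall(main)

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
# Highest call counts first: surprising counts can point at bugs,
# high counts at candidates for inlining.
stats.sort_stats(pstats.SortKey.CALLS).print_stats(5)
report = out.getvalue()
print(report)
```

In the report, `fib_recursive` shows `177/1` in the `ncalls` column: 177 calls in total, 1 of them primitive (not reached through recursion). Sorting by `pstats.SortKey.TIME` (internal time) or `pstats.SortKey.CUMULATIVE` instead gives the "hot loop" and algorithm-level views the documentation mentions.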

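That means it does not rely on sampling: every call and return is recorded, so under identical conditions it should reproduce the same data. The call counts are exact; only the timings depend on what else the machine is doing.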
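A toy illustration of why the call counts are reproducible: `sys.setprofile` installs a Python-level hook that the interpreter invokes on every function call and return, so nothing is sampled. This is only a sketch of the idea (`count_calls` and `fib` are hypothetical names); cProfile itself is written in C and installs its hook at a lower level.

```python
import sys
from collections import Counter

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def count_calls(func, *args):
    """Run func(*args) with a profile hook that counts every Python-level call event."""
    counts = Counter()

    def hook(frame, event, arg):
        # Invoked by the interpreter on every call/return event; nothing is sampled.
        if event == "call":
            counts[frame.f_code.co_name] += 1

    sys.setprofile(hook)
    try:
        func(*args)
    finally:
        sys.setprofile(None)  # always uninstall the hook
    return counts

first = count_calls(fib, 10)
second = count_calls(fib, 10)
print(first["fib"], second["fib"])  # 177 177
```

Running it any number of times gives the same counts; adding timestamps to the hook would give timings that, unlike the counts, vary from run to run.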

Answered by syntonym
  • Ah, so the term 'deterministic' is just used to distinguish it from statistical profilers? I guess it's just a semantic issue, but the fact that a 'deterministic' profiler can return different results for the same inputs just rubs me the wrong way. It would make more sense if cProfile actually measured CPU time instead of real time. Oh well... but thanks for answering anyway! – Icarus Wing Jun 28 '16 at 15:19
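For the CPU-time case, `cProfile.Profile` accepts a custom timer function as its first argument, so passing `time.process_time` should make it record CPU time instead of wall-clock time. A minimal sketch; `napper` and `cumtime_of` are hypothetical helpers, and the 0.2 s sleep just simulates waiting without using the CPU:

```python
import cProfile
import pstats
import time

def napper():
    time.sleep(0.2)  # blocks without using the CPU

def cumtime_of(name, timer=None):
    """Profile napper() and return the cumulative time recorded for `name`."""
    profiler = cProfile.Profile(timer) if timer is not None else cProfile.Profile()
    profiler.runcall(napper)
    # Stats keys are (filename, lineno, funcname); values are (cc, nc, tt, ct, callers).
    for (_, _, func_name), (_, _, _, ct, _) in pstats.Stats(profiler).stats.items():
        if func_name == name:
            return ct
    raise KeyError(name)

wall = cumtime_of("napper")                          # default timer: wall clock
cpu = cumtime_of("napper", timer=time.process_time)  # CPU time of this process
print(f"wall-clock cumtime: {wall:.3f}s, CPU cumtime: {cpu:.3f}s")
```

With the default timer the sleep shows up in `napper`'s cumulative time; with `time.process_time` it mostly disappears. The call counts are the same either way.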