1

I tried to use Python profiler statprof for profiling some of my code. The documentation says that if a procedure is encountered multiple times, it is only counted once. However in the output of the profile i see multiple occurrences of the same procedure with different line numbers. Anybody has any idea what this might be showing?

eg output

  0.00      0.01      0.00  box.py:1305:**do_forever**

  0.00      0.01      0.00  httplib.py:981:_send_request

  0.00      0.01      0.00  box.py:1295:**do_forever**

  0.00      0.01      0.00  box.py:1295:**do_forever**

  0.00      0.01      0.00  sre_compile.py:486:_code

  0.00      0.01      0.00  box.py:923:get_user_file_info

  0.00      0.00      0.00  box.py:776:update_all_internal_user_stats_batch
ifixthat
  • 6,137
  • 5
  • 25
  • 42
auny
  • 1,920
  • 4
  • 20
  • 37

1 Answers1

1

I just looked at the doc and what it says is if a procedure is encountered more than once in a single stack sample, it is only counted once. (I assume here the term "procedure" should really be "line of code in a procedure".)

What that means is - if there is recursion going on, you don't want the time cost of a function to be artificially amplified by the depth of the recursion.

Also, I would point out some other things (explained in more detail here):

The first column is "% time" not counting children, i.e. self percent. That's a useless statistic, because there is almost nothing that is not a procedure call, and looking at the line you can tell if it is a procedure call.

The second column is "cumulative time" including children. That's fine, but it should be percent, so that you don't have to divide by total time to see what percent it is. The reason that number matters is it represents what the line is responsible for - the fraction of total time that could be saved if that line were not there.

The third column is "self time" which, as I explained, is a useless statistic. The fact that it's extremely small or zero reflects its uselessness. Since it is included in the cumulative time, if it were not extremely small, the cumulative time would also show it, so it tells you nothing that the cumulative time doesn't.

Also, as the author points out, samples are suspended during I/O, so if it is doing some I/O that you didn't want or ask for, deep in some library, and if that is making the program take 100 times as long as it otherwise would, the profiler (and you) will be totally unaware of it.

Community
  • 1
  • 1
Mike Dunlavey
  • 40,059
  • 14
  • 91
  • 135
  • Hey thanks Mike for the detailed explanation. I was just confused about the multiple entries. meanings of other terms were clear. But thanks anyway – auny Aug 06 '12 at 18:24
  • @auny: You're welcome and good luck. BTW, when I'm working in Python, [*this is how I find out what needs to be fixed.*](http://stackoverflow.com/a/4299378/23771) – Mike Dunlavey Aug 06 '12 at 20:52