
I am a beginner just starting to profile my code, and I was confused about why the elapsed time reported by cProfile is so different from the time measured with time.time().

# Python 2.7.2

import cProfile

def f(n):
    G = (i for i in xrange(n))
    sum = 0
    for i in G:
        sum += i

num = 10**6

cProfile.run('f(num)')

This gives 1000004 function calls in 2.648 seconds

Yet with time.time(), I get 0.218000173569 seconds

import time

x = time.time()
f(num)
print time.time() - x

From what I have read, I guess this may be because of the overhead of cProfile. Are there any general tips for when cProfile timing is likely to be very off, or ways to get more accurate timing?
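For comparison, the timeit module runs a snippet repeatedly and reports per-run timings, which smooths out one-off noise. A minimal sketch of timing the same function, written in Python 3 syntax (range here stands in for xrange):

```python
import timeit

def f(n):
    # Same generator-sum as above, with the built-in name `sum` left unshadowed.
    g = (i for i in range(n))
    total = 0
    for i in g:
        total += i
    return total

# Run f(10**6) once per repeat and take the best of three repeats,
# with no profiler instrumentation inflating the numbers.
best = min(timeit.repeat(lambda: f(10**6), number=1, repeat=3))
print(best)
```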

rallen

2 Answers


The point of profiling is to find out what parts of your program are taking the most time, and thus need the most attention. If 90% of the time is being used by one function, you should be looking there to see how you can make that function more efficient. It doesn't matter whether the entire run takes 10 seconds or 1000.

Perhaps the most important piece of information the profiler gives you is how many times something is called. This is useful because it helps you find places where you are calling things unnecessarily often, especially if you have nested loops, or many functions that call other functions. The profiler helps you track this stuff down.

The profiling overhead is unavoidable, and large. But it is much easier to let the profiler do what it does than to insert your own timings and print statements all over the place.
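To illustrate reading the output for relative times and call counts rather than absolute times, here is a sketch (Python 3 syntax) that profiles a toy pair of functions and sorts the report by cumulative time; the function names `inner` and `outer` are just made up for the example:

```python
import cProfile
import pstats
import io

def inner(x):
    return x * x

def outer(n):
    # Calls inner() n times, so the profiler's call counts expose the hot path.
    return sum(inner(i) for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
outer(10000)
profiler.disable()

# Sort by cumulative time: the absolute numbers are inflated by profiler
# overhead, but the ranking and the call counts are what matter.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats(10)
report = stream.getvalue()
print(report)
```

In the report, inner shows up with 10000 calls; that call count, not the wall-clock figure, is what tells you where the work is happening.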

John Y
  • "If 90% of the time is being used by one function..." -- my main worry is not about exact time spent, but about the relative times. I can imagine that something that looks like it takes up 90% of the time actually takes up 1%. I understand that there are intricacies involved, but I am wondering if there are useful heuristics like "[some] type of operation will generally have inflated times in the profiler because of [some reason]". – rallen Jul 10 '11 at 01:29
  • @rallen: Getting the percentage of time spent (rather than absolute time) is precisely the point I was trying to make. When you use the profiler, you shouldn't rely on the absolute times so much as the relationship between the listed times of the various components being profiled. Note that normally, people use the profiler to examine much more complicated functions than your example here. – John Y Jul 12 '11 at 21:38
  • @rallen: As for intricacies messing up the profiler figures so that a genuine bottleneck is hidden, or that a fast section appears like a bottleneck: I don't think you will encounter this in anything you are likely to write. I can't say it will never happen, but it isn't something I have ever had reason to worry about. If you do happen to encounter something that seems to have profiled in an unbelievable or incredibly fishy way, then bring *that* up as a Stack Overflow or comp.lang.python question. ;) – John Y Jul 12 '11 at 21:45

Note that cProfile gives you CPU time, but using time.time() gives you elapsed wall-clock time (which isn't what you want).

Maybe you can try using the Unix time program.

➜  sandbox  /usr/bin/time -p python profiler.py

real         0.17
user         0.14
sys          0.01

The CPU time should be user + sys.
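In Python 3 this distinction is exposed directly in the time module: time.perf_counter() measures elapsed wall-clock time, while time.process_time() measures CPU time of the process (roughly user + sys). A sketch of the difference (the Python 2 analogues were time.time() and, on Unix, time.clock()):

```python
import time

start_wall = time.perf_counter()
start_cpu = time.process_time()

# Busy work consumes both wall time and CPU time...
total = sum(i * i for i in range(10**5))

# ...but sleeping advances wall time without consuming CPU time.
time.sleep(0.2)

wall = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu
print(wall, cpu)  # wall includes the sleep; cpu largely does not
```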

minghan