
I have a Python script in a file which takes just over 30 seconds to run. I am trying to profile it as I would like to cut down this time dramatically.

I am trying to profile the script using cProfile, but essentially all it seems to be telling me is that yes, the main script took a long time to run, but doesn't give the kind of breakdown I was expecting. At the terminal, I type something like:

cat my_script_input.txt | python -m cProfile -s time my_script.py

The results I get are:

             683121 function calls (682169 primitive calls) in 32.133 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   31.980   31.980   32.133   32.133 my_script.py:18(<module>)
   121089    0.050    0.000    0.050    0.000 {method 'split' of 'str' objects}
   121090    0.038    0.000    0.049    0.000 fileinput.py:243(next)
        2    0.027    0.014    0.036    0.018 {method 'sort' of 'list' objects}
   121089    0.009    0.000    0.009    0.000 {method 'strip' of 'str' objects}
   201534    0.009    0.000    0.009    0.000 {method 'append' of 'list' objects}
   100858    0.009    0.000    0.009    0.000 my_script.py:51(<lambda>)
      952    0.008    0.000    0.008    0.000 {method 'readlines' of 'file' objects}
 1904/952    0.003    0.000    0.011    0.000 fileinput.py:292(readline)
    14412    0.001    0.000    0.001    0.000 {method 'add' of 'set' objects}
      182    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
        1    0.000    0.000    0.000    0.000 fileinput.py:80(<module>)
        1    0.000    0.000    0.000    0.000 fileinput.py:197(__init__)
        1    0.000    0.000    0.000    0.000 fileinput.py:266(nextfile)
        1    0.000    0.000    0.000    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 fileinput.py:91(input)
        1    0.000    0.000    0.000    0.000 fileinput.py:184(FileInput)
        1    0.000    0.000    0.000    0.000 fileinput.py:240(__iter__)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

This doesn't seem to be telling me anything useful. The vast majority of the time is simply listed as:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   31.980   31.980   32.133   32.133 my_script.py:18(<module>)

In my_script.py, Line 18 is nothing more than the closing """ of the file's header block comment, so it's not that there is a whole load of work concentrated in Line 18. The script as a whole is mostly made up of line-based processing with mostly some string splitting, sorting and set work, so I was expecting to find the majority of time going to one or more of these activities. As it stands, seeing all the time grouped in cProfile's results as occurring on a comment line doesn't make any sense or at least does not shed any light on what is actually consuming all the time.

EDIT: I've constructed a minimal working example, similar to my case above, that demonstrates the same behavior:

mwe.py

import fileinput

for line in fileinput.input():
    for i in range(10):
        y = int(line.strip()) + int(line.strip())

And call it with:

perl -e 'for(1..1000000){print "$_\n"}' | python -m cProfile -s time mwe.py

To get the result:

         22002536 function calls (22001694 primitive calls) in 9.433 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    8.004    8.004    9.433    9.433 mwe.py:1(<module>)
 20000000    1.021    0.000    1.021    0.000 {method 'strip' of 'str' objects}
  1000001    0.270    0.000    0.301    0.000 fileinput.py:243(next)
  1000000    0.107    0.000    0.107    0.000 {range}
      842    0.024    0.000    0.024    0.000 {method 'readlines' of 'file' objects}
 1684/842    0.007    0.000    0.032    0.000 fileinput.py:292(readline)
        1    0.000    0.000    0.000    0.000 fileinput.py:80(<module>)
        1    0.000    0.000    0.000    0.000 fileinput.py:91(input)
        1    0.000    0.000    0.000    0.000 fileinput.py:197(__init__)
        1    0.000    0.000    0.000    0.000 fileinput.py:184(FileInput)
        1    0.000    0.000    0.000    0.000 fileinput.py:266(nextfile)
        1    0.000    0.000    0.000    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 fileinput.py:240(__iter__)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Am I using cProfile incorrectly somehow?

Bryce Thomas
  • Can you show us your script? Or, better, a [stripped-down version that still exhibits the same odd profile behavior](http://stackoverflow.com/help/mcve)? – abarnert Jan 22 '14 at 05:28
  • Also, is it possible that you have a `my_script.pyc` that's out of date but somehow has a later timestamp than `my_script.py`? If that happens, you can end up actually running (and profiling) the code out of the .pyc, but all the line numbers are out of date. – abarnert Jan 22 '14 at 05:29
  • One last thing, as a wild guess: Are you doing a `f.readlines()` somewhere and then looping over the results? – abarnert Jan 22 '14 at 05:29
  • One more one last thing: Occasionally, you can't get cProfile to work externally, but you can use it internally, e.g., as shown in [this example](http://docs.python.org/2/library/profile.html#profile.Profile). – abarnert Jan 22 '14 at 05:33
  • @abarnert I've added a minimum example to the question. – Bryce Thomas Jan 22 '14 at 05:46
  • Out of curiosity, what platform are you on? I'm on OS X 10.9.1, using Apple's 64-bit CPython 2.7.5 and a custom-built 64-bit 3.4. If you're on the same platform, we may have just found a Mac bug; if you're on a different platform, it's probably something more general (and quite possibly not a bug at all, just something we're both missing). – abarnert Jan 22 '14 at 07:07
  • Ubuntu Linux, Python 2.7.3. – Bryce Thomas Jan 22 '14 at 07:11
  • See [issue 12953: Function calls missing from profiler output](http://bugs.python.org/issue12953), especially Terry J. Reedy's last comment (msg144144). It appears that in 2.7 and 3.x, in some cases, C functions that return iterator types get skipped. I know `fileinput.input()` is a Python function, but… could it be spending most of its time waiting on a C iterator? – abarnert Jan 22 '14 at 07:14
  • On that last one… no, it's not. It's spending most of its time on `file.readlines(bufsize)`, which is a C function, but it of course returns a plain old list of strings, not an iterator. – abarnert Jan 22 '14 at 07:22
  • Another thing to look at: Use a third-party profiler like [`line_profiler`](http://pythonhosted.org/line_profiler/) instead of `cProfile`. Or, alternatively, use the tried-and-true stack-sampling method: ^C your program a dozen or so times and look at the tracebacks; there will usually be obvious patterns that tell you 10/12ths of your time is spent inside function X, which you didn't expect to be so slow. You can then break it into smaller functions, do internal profiling of that function, extract it to unit-profile it, etc. – abarnert Jan 22 '14 at 07:36

1 Answer


As I mentioned in a comment, when you can't get cProfile to work externally, you can often use it internally instead. It's not that hard.

For example, when I run with -m cProfile in my Python 2.7, I get effectively the same results you did. But when I manually instrument your example program:

import fileinput
import cProfile

pr = cProfile.Profile()
pr.enable()
for line in fileinput.input():
    for i in range(10):
        y = int(line.strip()) + int(line.strip())
pr.disable()
pr.print_stats(sort='time')

… here's what I get:

         22002533 function calls (22001691 primitive calls) in 3.352 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 20000000    2.326    0.000    2.326    0.000 {method 'strip' of 'str' objects}
  1000001    0.646    0.000    0.700    0.000 fileinput.py:243(next)
  1000000    0.325    0.000    0.325    0.000 {range}
      842    0.042    0.000    0.042    0.000 {method 'readlines' of 'file' objects}
 1684/842    0.013    0.000    0.055    0.000 fileinput.py:292(readline)
        1    0.000    0.000    0.000    0.000 fileinput.py:197(__init__)
        1    0.000    0.000    0.000    0.000 fileinput.py:91(input)
        1    0.000    0.000    0.000    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 fileinput.py:266(nextfile)
        1    0.000    0.000    0.000    0.000 fileinput.py:240(__iter__)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

That's a lot more useful: It tells you what you probably already expected, that more than half your time is spent calling str.strip().
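(Not part of the original answer, but once the profile points at `str.strip()`, the obvious follow-up for the asker's goal of cutting runtime is to hoist the repeated strip/parse out of the inner loop. A hypothetical sketch, using a plain list in place of the `fileinput` stream:)

```python
def process(lines):
    """Parse each line once, instead of 20 times as in the MWE."""
    total = 0
    for line in lines:
        v = int(line.strip())  # strip and convert once per line
        for i in range(10):
            total += v + v
    return total

print(process(["1\n", "2\n", "3\n"]))  # → 120
```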


Also, note that if you can't edit the file containing code you wish to profile (mwe.py), you can always do this:

import cProfile
pr = cProfile.Profile()
pr.enable()
import mwe
pr.disable()
pr.print_stats(sort='time')

Even that doesn't always work. If your program calls exit(), for example, you'll have to use a try:/finally: wrapper and/or an atexit handler. And if it calls os._exit(), or segfaults, you're probably completely hosed. But that isn't very common.
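A minimal sketch of that try:/finally: wrapper, with a stand-in `work()` function (hypothetical, just to make the example self-contained) that calls exit() partway through, the way a real script might:

```python
import cProfile
import io
import pstats
import sys

def work():
    # Stand-in for the real script under test; it exits before returning.
    total = sum(i * i for i in range(100000))
    sys.exit(0)

pr = cProfile.Profile()
pr.enable()
try:
    work()
except SystemExit:
    pass  # swallow the exit so the process survives long enough to report
finally:
    pr.disable()  # stats are intact even though work() never returned

buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats('tottime').print_stats()
print(buf.getvalue())
```

For a script that might call os._exit() instead, registering the reporting step with atexit won't help either, which is why that case is effectively hopeless.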


However, something I discovered later: If you move all code out of the global scope, -m cProfile seems to work, at least for this case. For example:

import fileinput

def f():
    for line in fileinput.input():
        for i in range(10):
            y = int(line.strip()) + int(line.strip())
f()

Now the output from -m cProfile includes, among other things:

  2000000    4.819    0.000    4.819    0.000 :0(strip)
   100001    0.288    0.000    0.295    0.000 fileinput.py:243(next)

I have no idea why this also made it twice as slow… or maybe that's just a cache effect; it's been a few minutes since I last ran it, and I've done lots of web browsing in between. But that's not important, what's important is that most of the time is getting charged to reasonable places.

But if I change this to move the outer loop to the global level, and only its body into a function, most of the time disappears again.
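That counter-case can be sketched like this (a temp file stands in for the piped stdin of the original example; under `-m cProfile` most of the time lands back on the module line):

```python
import fileinput
import os
import tempfile

def body(line):
    # Only the inner loop lives in a function; the outer loop stays global.
    for i in range(10):
        y = int(line.strip()) + int(line.strip())
    return y

# Demo input file standing in for the piped stdin.
tmp = tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False)
tmp.write('1\n2\n3\n')
tmp.close()

results = []
for line in fileinput.input([tmp.name]):
    results.append(body(line))
fileinput.close()
os.unlink(tmp.name)
print(results)  # → [2, 4, 6]
```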


Another alternative, which I wouldn't suggest except as a last resort…

I notice that if I use profile instead of cProfile, it works both internally and externally, charging time to the right calls. However, those calls are also about 5x slower. And there seems to be an additional 10 seconds of constant overhead (which gets charged to import profile if used internally, or to whatever's on line 1 if used externally). So, to find out that split is using 70% of my time, instead of waiting 4 seconds and doing 2.326 / 3.352, I have to wait 27 seconds and do 10.93 / (26.34 - 10.01). Not much fun…


One last thing: I get the same results with a CPython 3.4 dev build—correct results when used internally, everything charged to the first line of code when used externally. But PyPy 2.2/2.7.3 and PyPy3 2.1b1/3.2.3 both seem to give me correct results with -m cProfile. This may just mean that PyPy's cProfile is faked on top of profile because the pure-Python code is fast enough.


Anyway, if someone can figure out/explain why -m cProfile isn't working, that would be great… but otherwise, this is usually a perfectly good workaround.

Cimbali
abarnert
  • It is not that `-m cProfile` is not working. I think it is taking into account the time it takes to load the interpreter and import all the needed libraries. This might be accounted for as Line 1 of the script. – arocks Jan 22 '14 at 06:48
  • 2
    @arocks With no profiling whatsoever, my script takes ~30 seconds to execute. If the ~30 seconds reported by the profiler in Line 18 of my script were really for setup, the remaining fractions of a second wouldn't account for the rest of the work. Having watched the output of my real script in real-time, I can confirm that the imports etc. take very little time before the script reaches the real work section and starts producing results. – Bryce Thomas Jan 22 '14 at 06:59
  • @arocks: No, you can easily verify that all of that is done in a small fraction of a second. For example, add a `break` at the end of the inner `for` loop and then another one dedented. All the same work being done, exactly once. Then `-m cProfile` it, and it tells you there's 0.002 seconds. Also, if you run in 3.4 instead of 2.7, most of the startup and import machinery is in Python, and gets charged correctly, and it's about 0.001 seconds. – abarnert Jan 22 '14 at 06:59
  • 1
    @abarnert Thanks - this answer provides a lot of useful information. Sadly on my real script, if I add the profiling in the script itself, the profiler reports that the entire script took just 0.151 seconds to run, even though it clearly takes much longer in reality (~30 secs). I guess this is another question altogether though. – Bryce Thomas Jan 22 '14 at 07:04
  • @BryceThomas: One more thing I discovered; let me edit one more time… although I suspect it won't help you, it might. – abarnert Jan 22 '14 at 07:16
  • Can you sum the `tottime` column? – Chuck Nov 03 '18 at 21:14