
Often during my work I write code to read lines from a file and I process those lines one at a time.

Sometimes the line processing is complicated and the file is long. For example, today it takes roughly a minute to process 200 lines, and the file has 175k lines in total.

I want to figure out which part of my code is taking a long time, so I decided to use Python's cProfile.

The problem is that I can't actually run the whole code because that would take too long. If I interrupt the process midway with an exit signal, cProfile also dies without producing a report. Modifying the code to stop after reading only the top K lines is annoying, because I tend to do this kind of thing a lot for different types of data in my job. I want to avoid adding options only for the sake of profiling if possible.

What would be the cleanest way to tell cProfile to run for 3 minutes, profile what happens, stop, and then report its findings?

Peter O.
Pushpendre
  • Can't you just take a couple of lines from the file and create a second one so that you can profile it and find the bottleneck in the processing code? – dursk Dec 06 '15 at 04:22
  • Yeah that's totally possible, but usually it's a lot of hassle if the file is not actually a CSV but something more complicated like an XML or json. Also I am just writing some research code so file names are hardcoded in a few places and I would have to make that more modular only for the purpose of profiling. Basically I want to avoid changing things as much as possible. – Pushpendre Dec 06 '15 at 04:27
  • Have you tried [*this method*](http://stackoverflow.com/a/4299378/23771)? – Mike Dunlavey Dec 06 '15 at 16:29

1 Answer


Step 1: run your script myscript.py under the profiler for 3 minutes, outputting the profiling information to the file prof. On Linux and similar, you can do this with

timeout -s INT 3m python -m cProfile -o prof myscript.py

(Note: if you omit -s INT, SIGTERM is used instead of SIGINT, which seems to work on Python 2 but not on Python 3.) Alternatively, on any system, you should be able to do

python -m cProfile -o prof myscript.py

then press Ctrl-C at the end of 3 minutes.
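
If the timeout command is not available and you don't want to watch the clock, step 1 could also be scripted. The following is only a sketch under a few assumptions: it targets POSIX systems (sending SIGINT this way is not supported on Windows), and the function name profile_with_time_limit and its grace parameter are made up for this illustration. It starts the script under cProfile in a child process and sends SIGINT once the time limit expires, which has the same effect as pressing Ctrl-C.

```python
import signal
import subprocess
import sys


def profile_with_time_limit(script, seconds, prof_path="prof", grace=10):
    """Run `script` under cProfile and interrupt it after `seconds`.

    Hypothetical helper: a Python stand-in for
    `timeout -s INT <seconds> python -m cProfile -o prof script`.
    Returns the child's exit code.
    """
    cmd = [sys.executable, "-m", "cProfile", "-o", prof_path, script]
    proc = subprocess.Popen(cmd)
    try:
        # The script finished on its own before the limit.
        return proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        # Same effect as Ctrl-C: the child gets a KeyboardInterrupt and
        # cProfile writes the stats file while shutting down.
        proc.send_signal(signal.SIGINT)
        try:
            return proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            # The child ignored SIGINT; give up (no stats file is written).
            proc.kill()
            return proc.wait()
```

For the case in the question this would be called as profile_with_time_limit("myscript.py", 180). The child prints a KeyboardInterrupt traceback when it is interrupted, which is expected.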

Step 2: get some statistics from the prof file with something like

python -c "import pstats; pstats.Stats('prof').sort_stats('time').print_stats(20)"
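
If you look at profiles often, the one-liner can be kept as a small function instead. This is a minimal sketch, not part of the original command: the name top_functions and its parameters are made up here, and strip_dirs() is added only to shorten the file paths in the report. Sorting by 'cumulative' as well as by 'tottime' (which 'time' is an older name for) can help when the slow part is spread over many small calls.

```python
import io
import pstats


def top_functions(prof_path="prof", sort_key="tottime", limit=20):
    """Return the top `limit` entries of a cProfile stats file as text."""
    buf = io.StringIO()
    # Send the report to a string buffer instead of stdout.
    stats = pstats.Stats(prof_path, stream=buf)
    stats.strip_dirs().sort_stats(sort_key).print_stats(limit)
    return buf.getvalue()


if __name__ == "__main__":
    import os

    # Assumes step 1 has already produced a file named "prof".
    if os.path.exists("prof"):
        print(top_functions("prof", "tottime"))
        print(top_functions("prof", "cumulative"))
```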
JeremyR