13

I have a long-running twisted server.

In a large system test, at one particular point several minutes into the test, when some clients enter a particular state and a particular outside event happens, then this server takes several minutes of 100% CPU and does its work very slowly. I'd like to know what it is doing.

How do you get a profile for a particular span of time in a long-running server?

I could easily send the server start and stop messages via HTTP, if there were a way to enable or inject the profiler at runtime.

Given the choice, I'd like stack-based/call-graph profiling but even leaf sampling might give insight.

Will

4 Answers

13

yappi profiler can be started and stopped at runtime.
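A minimal sketch of what starting and stopping yappi around the slow period might look like. The helper names and the output path are hypothetical, and this assumes the third-party `yappi` package is installed in the server's environment; the two functions could be called from whatever handler receives the start and stop messages.

```python
def start_profiling(clock="cpu"):
    """Begin collecting stats; intended to be called from a request handler."""
    import yappi  # third-party (pip install yappi); imported lazily on purpose

    yappi.clear_stats()          # drop stats left over from an earlier run
    yappi.set_clock_type(clock)  # "cpu" or "wall"
    yappi.start()


def stop_profiling(path="server.pstat"):
    """Stop collecting and save the stats in pstats format.

    `path` is a hypothetical output location.
    """
    import yappi

    yappi.stop()
    yappi.get_func_stats().save(path, type="pstat")
    return path
```

The saved file can then be inspected offline with the standard library, for example `pstats.Stats("server.pstat").sort_stats("cumulative").print_stats(20)`.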

Mikhail Korobov
  • Now it is [here](https://bitbucket.org/sumerc/yappi/) and a brief usage example is [here](https://github.com/mottosso/yappi). It can also be used from the command line with something like `python3 -m yappi -o myprofile.cprof script.py` – Jacopofar Nov 07 '18 at 17:13
9

There are two interesting tools that came up that try to solve that specific problem, where you might not necessarily have instrumented profiling in your code in advance but want to profile production code in a pinch.

  • pyflame will attach to an existing process using the ptrace(2) syscall and create "flame graphs" of the process. It's written in C++.

  • py-spy works by reading the process memory instead and figuring out the Python call stack. It provides a flame graph as well as a "top-like" interface that shows which function is taking the most time. It's written in Rust.
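As a rough illustration of the py-spy route, the sketch below builds and runs a `py-spy record` command against a running process. The helper names, the default output file, and the default duration are hypothetical choices; it assumes `py-spy` is installed and on `PATH`, and attaching to another process may need elevated privileges depending on the platform.

```python
import subprocess


def py_spy_record_cmd(pid, output="profile.svg", duration=60):
    """Build a `py-spy record` command that samples `pid` for `duration` seconds."""
    return [
        "py-spy", "record",
        "--pid", str(pid),
        "--duration", str(duration),
        "--output", output,
    ]


def record_flame_graph(pid, output="profile.svg", duration=60):
    """Run py-spy against a live process and return the flame graph path."""
    subprocess.run(py_spy_record_cmd(pid, output, duration), check=True)
    return output
```

For the interactive view, `py-spy top --pid <pid>` can be run directly from a shell while the server is busy.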

anarcat
3

Pyliveupdate is a tool designed for exactly this purpose: profiling long-running programs without restarting them. It lets you dynamically select specific functions to profile, or stop profiling them, without instrumenting your code ahead of time; it instruments the code dynamically to do the profiling.

Pyliveupdate has three key features:

  • Profile the call time of specific Python functions (selected by function name or module name).
  • Add or remove profiling without restarting the program.
  • Show profiling results as a call summary and flame graphs.

Check out a demo here: https://asciinema.org/a/304465.

0xCC
3

Not a very Pythonic answer, but maybe running `strace` on the process gives some insight (assuming you are on Linux or similar).

Using strictly Python, for such things I trace all calls, store their results in a ring buffer, and use a signal (maybe you could do that via your HTTP message) to dump that ring buffer. Of course, tracing slows everything down, but in your scenario you could switch tracing on via an HTTP message as well, so it is only enabled while the problem is occurring.
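The ring-buffer idea could be sketched roughly as follows. This is not the answerer's actual code: the class name, buffer size, and choice of `SIGUSR1` are assumptions for illustration. It records only `call` events, and `sys.settrace` affects only the thread that calls it, so `start()` should run on the thread doing the work.

```python
import collections
import signal
import sys
import threading


class CallRing:
    """Keep the most recent function calls in a fixed-size ring buffer."""

    def __init__(self, size=10000):
        self.calls = collections.deque(maxlen=size)

    def _trace(self, frame, event, arg):
        # The interpreter invokes this for every new Python-level call.
        if event == "call":
            code = frame.f_code
            self.calls.append((code.co_filename, frame.f_lineno, code.co_name))
        # Returning None skips per-line tracing inside the new frame.
        return None

    def start(self):
        threading.settrace(self._trace)  # threads started from now on
        sys.settrace(self._trace)        # the current thread

    def stop(self):
        sys.settrace(None)
        threading.settrace(None)

    def dump(self, out=None):
        out = sys.stderr if out is None else out
        for filename, lineno, name in list(self.calls):
            print(f"{filename}:{lineno} {name}", file=out)


ring = CallRing()


def install_dump_signal(signum=None):
    """Dump the buffer on a signal (SIGUSR1 by default; not available on Windows)."""
    signum = signal.SIGUSR1 if signum is None else signum
    signal.signal(signum, lambda _signum, _frame: ring.dump())
```

With this in place, `ring.start()` and `ring.stop()` can be wired to the start and stop messages, and `kill -USR1 <pid>` prints the most recent calls to stderr.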

Alfe
  • Can you do these things to a python process that's running, without stopping it? – Li-aung Yip Mar 22 '12 at 09:30
  • Sure. It has to be prepared for it, though. In Python just call `sys.settrace(bla)` and the function `bla()` will be called for practically anything happening (calling functions, executing a line etc.). Debuggers and profilers typically rely on that mechanism. But it is rather simple to build something on that and then prepare to switch that on when receiving a special HTTP message. – Alfe Mar 22 '12 at 09:41
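A rough sketch of the "switch it on when a message arrives" idea. Note that it swaps in the standard library's `cProfile` rather than a hand-written `sys.settrace` callback, and that the request paths and the function name are hypothetical; a real server would call this from its own HTTP handler, on the thread that does the slow work.

```python
import cProfile
import io
import pstats

_profiler = None  # None while profiling is off


def handle_profile_command(path):
    """Toggle profiling; `path` mimics a hypothetical HTTP request path."""
    global _profiler
    if path == "/profile/start" and _profiler is None:
        _profiler = cProfile.Profile()
        _profiler.enable()
        return "profiling started"
    if path == "/profile/stop" and _profiler is not None:
        _profiler.disable()
        report = io.StringIO()
        stats = pstats.Stats(_profiler, stream=report)
        # Show the 30 entries with the highest cumulative time.
        stats.sort_stats("cumulative").print_stats(30)
        _profiler = None
        return report.getvalue()
    return "no change"
```

The stop command returns the text report, so it can be sent back as the HTTP response body.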