
I have a system that I need to profile.

It consists of tens of processes, mostly C++, some of them multi-threaded, that communicate over the network and with one another through various system calls.

I know there are performance bottlenecks at times, but no one has put in the time or effort to check where they are: they may be in userspace code, in inefficient use of syscalls, or somewhere else.

What would be the best way to approach profiling a system like this? I have thought of the following strategy:

Manually logging the roundtrip times of various code sequences (for example processing an incoming packet or a CLI command) and seeing which process takes the longest. After that, profiling that process, fixing the problem and repeating.
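
Roughly, I mean something like the sketch below: log a wall-clock timestamp at the boundaries of each stage and correlate the timestamps across the processes' logs afterwards. The handler name and the log format are just placeholders, not code from the actual system.

```cpp
#include <chrono>
#include <cstdio>

// Log a wall-clock timestamp (microseconds since the epoch) with a label,
// so the same event can be matched up across different processes' logs.
void log_stamp(const char* label)
{
    const auto now = std::chrono::system_clock::now().time_since_epoch();
    const auto us  = std::chrono::duration_cast<std::chrono::microseconds>(now).count();
    std::fprintf(stderr, "[%lld us] %s\n", static_cast<long long>(us), label);
}

void handle_incoming_packet()   // placeholder for a real handler
{
    log_stamp("packet: start");
    // ... parse, process, hand off to the next process ...
    log_stamp("packet: done");
}
```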

This method seems rather hacky and guesswork-driven. I don't like it.

How would you suggest approaching this problem? Are there tools that would help me out (a multi-process profiler, perhaps)?

What I'm looking for is more of a strategy than just specific tools.

Should I profile every process separately and look for problems? If so, how do I approach this?

Do I try to isolate the problematic processes and go from there? If so, how do I isolate them?

Are there other options?

deller
  • On Linux, `perf` (aka `perftool`) can show you what time is spent where in a system which runs many processes. – Mats Petersson Mar 07 '16 at 22:17
  • As I understand it, perf only tells me how much time is spent in system calls, no? – deller Mar 07 '16 at 22:20
  • No, it will tell you how much CPU time was used by your code. It won't tell you if you are wasting time in `sleep` or waiting for network packets - I don't think there is a good tool for that when there are multiple processes involved [for a single process `strace` can help a bit] – Mats Petersson Mar 07 '16 at 22:25
  • You can get `GCC` to output profiling information through compiler flags. – Galik Mar 07 '16 at 22:26
  • I've read about all sorts of profiling tools on Google. What I'm looking for is more of a strategy than just specific tools. I want to know: should I profile every process separately and look for problems? Try to isolate the problematic processes? If so, how do I isolate them? Are there other options? – deller Mar 07 '16 at 22:31
  • [*This method*](http://stackoverflow.com/a/378024/23771), using GDB, works with multiple threads, and it works the same whether CPU- or syscall-bound. It's better if the code is built with optimization turned off. You can find the problems, fix them, and then turn optimization back on. GDB will give you a stack trace of all the threads, so look at them. Most of them should be at an idle state. One or two will be waiting for something significant, and the stacks will tell you why. – Mike Dunlavey Mar 08 '16 at 01:08
  • Works with processes too? – deller Mar 08 '16 at 06:02
  • @deller: Not sure about that, but if you have the process IDs it should be possible. You'd probably have to do each one separately. On the other hand there are unix utilities *pstack* and *lsstack* that should do more or less the same job. – Mike Dunlavey Mar 08 '16 at 15:16

1 Answer


I don't think there is a single answer to this sort of question, and every type of issue has its own problems and solutions.

Generally, the first step is to figure out WHERE in the big system the time is spent. Is it CPU-bound or I/O-bound?
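
One rough way to answer that question for a single process, if you can touch its code, is to compare CPU time against wall-clock time over a suspect interval: a ratio near the number of busy threads means CPU-bound, a ratio near zero means the process is mostly waiting. A minimal sketch using `getrusage` (the log format is arbitrary, not something any tool mandates):

```cpp
#include <sys/time.h>
#include <sys/resource.h>
#include <chrono>
#include <cstdio>

// User + system CPU seconds consumed by this process so far.
static double cpu_seconds()
{
    rusage ru{};
    getrusage(RUSAGE_SELF, &ru);
    auto secs = [](const timeval& t) { return t.tv_sec + t.tv_usec / 1e6; };
    return secs(ru.ru_utime) + secs(ru.ru_stime);
}

// Call with values captured at the start of the interval you suspect is slow.
void report_cpu_vs_wall(double cpu_before,
                        std::chrono::steady_clock::time_point wall_before)
{
    const double cpu  = cpu_seconds() - cpu_before;
    const double wall = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - wall_before).count();
    std::fprintf(stderr, "cpu=%.3fs wall=%.3fs ratio=%.2f\n",
                 cpu, wall, wall > 0 ? cpu / wall : 0.0);
}
```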

If the problem is CPU-bound, a system-wide profiling tool can be useful to determine where in the system the time is spent. The next question, of course, is whether that time is actually necessary: no automated tool can tell the difference between a badly written piece of code that does a million completely useless processing steps and one that multiplies a million-element matrix very efficiently - both take the same amount of CPU time, but only one of them achieves anything. However, knowing which program takes most of the time in a multi-program system is a good starting point for figuring out whether that code is well written or can be improved.

If the system is I/O-bound (network or disk I/O, for example), then there are tools for analysing disk and network traffic that can help. But again, expecting the tool to tell you what packet response or disk access time you *should* expect is a different matter - whether you contact Google to search for "kerflerp" or contact your local webserver a metre away has a dramatic impact on what a reasonable response time is.

There are lots of other issues - running two pieces of code in parallel that each use LOTS of memory can cause both to run slower than if they run in sequence, because the high memory usage causes swapping, or because the OS isn't able to use spare memory for caching file I/O, for example.

On the other hand, two or more simple processes that use very little memory will benefit quite a lot from running in parallel on a multiprocessor system.

Adding logging to your applications so that you can see WHERE they are spending time is another method that works reasonably well, particularly if you KNOW which use-case is the slow one.
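
In C++ that can be as small as a scope timer that logs how long a named section took; the class and the example use-case below are only a sketch, and the real applications would log to whatever they already use instead of `stderr`:

```cpp
#include <chrono>
#include <cstdio>

// Logs the lifetime of the enclosing scope in microseconds.
class ScopeTimer {
public:
    explicit ScopeTimer(const char* name)
        : name_(name), start_(std::chrono::steady_clock::now()) {}

    ~ScopeTimer() {
        const auto us = std::chrono::duration_cast<std::chrono::microseconds>(
            std::chrono::steady_clock::now() - start_).count();
        std::fprintf(stderr, "%s: %lld us\n", name_, static_cast<long long>(us));
    }

private:
    const char* name_;
    std::chrono::steady_clock::time_point start_;
};

void process_cli_command()          // hypothetical use-case
{
    ScopeTimer t("process_cli_command");
    // ... the actual work ...
}
```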

If you have a use-case where you know "this should take no more than X seconds", running regular pre- or post-commit tests to check that the code is behaving as expected, and that no-one has added a lot of code that slows it down, would also be useful.
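
Such a check can be as simple as a small driver that times the known use-case and fails when it goes over budget; this is only a sketch, and the use-case and the X-second budget are made up:

```cpp
#include <chrono>
#include <cstdio>
#include <cstdlib>

static void run_known_use_case()
{
    // Placeholder: drive the real scenario here, e.g. send a packet or a
    // CLI command through the system and wait for the response.
}

int main()
{
    const auto start = std::chrono::steady_clock::now();
    run_known_use_case();
    const double seconds = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - start).count();

    const double budget = 2.0;      // "this should take no more than X seconds"
    std::printf("use case took %.3f s (budget %.1f s)\n", seconds, budget);
    return seconds <= budget ? EXIT_SUCCESS : EXIT_FAILURE;
}
```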

Mats Petersson