
Possible Duplicate:
Saving gmon.out before killing a process

I'm trying to profile a server (the C source code is available to me) on a Linux environment. This server runs continuously, like a web server. I'm trying to use gprof to profile it. If the server exits by itself, a gmon.out file is generated, and I can use gprof with gmon.out to examine the profiling data. The problem I have is that this server runs continuously, waiting for incoming socket connections, requests, etc. If I kill the server, gmon.out is not generated. At this point I see the following options.

  1. Change the source code to profile itself and log this information after receiving a SIGKILL signal. This is by far the ugliest solution and may introduce noise into the measurements.
  2. Maybe there is a way to profile this server using gprof while it is still running.
  3. Other tools to try?

EDIT: The server is a multi-process server, running on FreeBSD 7.2.

I'm sure people have solved this kind of problem before, but I failed to find useful information on SO or elsewhere.

I appreciate any thoughts/solutions people have.

Thanks a bunch.

UPDATE 1:

  1. gprof doesn't seem to work with a multi-process server. When I manage to get gmon.out after running my server, only the parent process is instrumented, and it doesn't actually do the real work!
  2. OProfile doesn't support FreeBSD, which is what my server runs on. For various reasons I can't (am not allowed to) change the OS.
  3. The Valgrind website doesn't have a FreeBSD port, though there are some references to one. I failed to find the FreeBSD port source.

Somehow I managed to get a ports entry for Valgrind. When I run make, I get the following errors:

=> valgrind-stable-352.tar.gz doesn't seem to exist in /usr/obj/ports/distfiles/.
=> Attempting to fetch from ftp://ftp.FreeBSD.org/pub/FreeBSD/ports/distfiles/.
fetch: ftp://ftp.FreeBSD.org/pub/FreeBSD/ports/distfiles/valgrind-stable-352.tar.gz: File unavailable (e.g., file not found, no access)
=> Attempting to fetch from http://www.rabson.org/.
fetch: http://www.rabson.org/valgrind-stable-352.tar.gz: No address record
=> Couldn't fetch it - please try to retrieve this
=> port manually into /usr/obj/ports/distfiles/ and try again.
*** Error code 1

I tried to find valgrind-stable-352.tar.gz on the web; all of the links I found are dead.

  1. I got pstack installed on my FreeBSD box and then realised pstack gives only a stack trace. Reference: http://sourceforge.net/projects/bsd-pstack/

  2. My understanding is that SystemTap is only for kernel-space events and instrumentation.

I could be wrong or have insufficient information; please correct me and share your thoughts. I really appreciate your assistance.

UPDATE 2: I think it will be helpful to give some details about the server that I'm trying to profile.

  1. It is a multi-process server program, I/O bound; specifically, it is bound on a MySQL database.
  2. No threads are involved. Each child server process handles only one request at a time; a configurable number of processes is created when the server starts.
  3. I would like to find the time spent in each function and how frequently it is called. The functions are a mix of CPU-bound and I/O-bound code (I believe more I/O).
  4. It is running on FreeBSD 7.2.
  5. It is written in C. The number of reads is much greater than the number of writes to the database through this server.
Srikanth
  • I managed to profile using gprof. But the generated gmon.out doesn't have per-process information. Does gprof work with multi-process servers? – Srikanth Mar 07 '11 at 21:27
  • I'm glad you got pstack working, though lsstack is supposed to show you symbols. Maybe you didn't understand how to use it. Basically you manually take several samples, like 10 or 20. Then any bottleneck function, or function call, costing X% of time will appear on X% of samples (roughly), so look for those. It will show them to you whether threads are CPU or IO bound. – Mike Dunlavey Mar 10 '11 at 18:02
  • I think that gives the rough estimate of frequency of function calls. I added UPDATE 2 to give more information about my server. Please have a look at it. – Srikanth Mar 10 '11 at 19:13
  • It does not tell you how many times a function is called, and "time spent in each function" is very ambiguous. What it does tell you is where the bottlenecks are. Example: if 30% of time it is in state: main calls A calls B calls C calls D calls socket-wait. Any one of those call instructions (not just functions), *if you could remove it*, would make the whole program faster by a ratio of 10/7. So if you're looking for bottlenecks, that will find them. If you just want measurements, it either won't give them to you or they will be very approximate. – Mike Dunlavey Mar 10 '11 at 19:51
  • What is the purpose of this exercise? Load testing? Performance tuning? Fixing a bug? For example, if you want to do performance tuning, a good start is to see how long your SQL queries run. – Derick Schoonbee Mar 14 '11 at 21:09
  • @Derick: Understanding bottlenecks, improving throughput. I figured that SQL queries are taking most of the time, but when I fix the SQL queries, something else will surface as the bottleneck. At this point the server is IO bound. I'm trying to transform it from IO bound to CPU bound to reduce latency. – Srikanth Mar 15 '11 at 00:17
  • Ok. PS: In my experience with large CRM implementations if you fix slow queries (db tuninig or sql changes) most of your problems goes away. Afterwards only do you tackle the "complex" stuff (specific bottlenecks) and try to change code. I was just concerned that you go "too technical" from the beginning. – Derick Schoonbee Mar 15 '11 at 08:57
  • @derick: I see your concern. My goal is to increase the throughput by >100 times. Currently each request takes 100ms; I want to reduce that to <10ms. It seems possible because requests are spending >90% of the time in the db, executing several queries, not just one slow query. I'm going to do some intelligent application-specific caching and reduce load on the db. I'm thinking about how to squeeze more out after reducing time spent in the db. I have to present a model and the possible approaches I'm going to undertake to achieve the aforementioned goal. – Srikanth Mar 16 '11 at 18:32
  • Sometimes people think bottlenecks will "move around". Rather there are multiple of them over a range of sizes, and finding/removing a big one just exposes the next biggest. I suspect @Derick is right about the biggest being DB, but there's no need to guess, as in [this example](http://stackoverflow.com/questions/926266/performance-optimization-strategies-of-last-resort/927773#927773). If I needed to present a model, it would be to do that. If anyone asked me to guess where to look for problems, I would respectfully tell them that's not how to do it. Just sample & find them. – Mike Dunlavey Mar 17 '11 at 16:25
  • The question was posted two weeks ago. With stack-sampling you could have found and fixed a series of bottlenecks, over a span of a few days, and be done with it. I would encourage you to do that now. You will either reach your goal of 100x or you will know why it's not possible. – Mike Dunlavey Mar 17 '11 at 16:59
  • @Mike: Thanks. I agree completely. I'll try stack-sampling. – Srikanth Mar 17 '11 at 19:44

9 Answers


While you should certainly take precautions when profiling critical production systems, try oprofile and/or systemtap; they're likely included with your distro already.

nos
  • The systemtap documentation says "SystemTap provides the infrastructure to monitor the running Linux kernel for detailed analysis". I'm looking at one specific server, not the whole environment. Reading more about oprofile... – Srikanth Mar 03 '11 at 21:56
  • oprofile can do exactly what you need. It profiles the whole system but you're free to analyze only the data about one particular process. – R.. GitHub STOP HELPING ICE Mar 17 '11 at 02:39
  • I agree. I tried to install OProfile only to realize that it doesn't support FreeBSD :-( – Srikanth Mar 17 '11 at 19:39

Even if you get gprof to work for you, there are problems.

  • It is blind to time spent in system calls and I/O. It assumes you never block unnecessarily and looks only at CPU-bound issues.

  • If there is any recursion, it just can't handle it.

  • The times it gives you rest on shaky assumptions, such as that every call to a routine takes about the same amount of time. It gives you no line-level information.

Measuring is one thing, but if you want to find "bottlenecks" that are doing unnecessary things, whether CPU or I/O, a very rough but effective tool is lsstack (which I think is on SourceForge).

Also, take a look at Zoom. It is a wall-time stack-sampler for Linux. It gives line-level percents, and I believe it can be attached and detached from a running process.
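
The manual sampling workflow described above can be sketched as a small script (pstack or lsstack is assumed to be installed; the STACK_CMD override is only there so the sketch can be exercised without a real stack tool):

```shell
# Take N stack samples of a running process, one per second.
# Any function (or call site) that appears in a large fraction of the
# samples is a bottleneck worth examining.
sample_stacks() {
    pid=$1
    n=${2:-10}
    i=0
    while [ "$i" -lt "$n" ]; do
        ${STACK_CMD:-pstack} "$pid"     # lsstack also works here
        echo "=== sample $i ==="
        i=$((i + 1))
        sleep 1
    done
}
# usage: sample_stacks 1234 20 > samples.txt
```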

Mike Dunlavey
  • Thanks for the information. At this point I can live with gprof's problems. I just want the time spent in each function call, even if it's not 100% accurate (it needs to be close, though). There are some I/O calls that I believe can be eliminated from the code, so I wanted to know how much I can save by eliminating them. Unfortunately, Zoom is not open source. I'll take a look at lsstack. Thanks again. – Srikanth Mar 03 '11 at 23:14
  • @Srikanth: My bet is you will not be happy with gprof, even if you get it to work. For a whole lot of reasons [this technique](http://stackoverflow.com/questions/375913/what-can-i-use-to-profile-c-code-in-linux/378024#378024) works, but if you're not able to use it, as I guess you're not, `lsstack` is a way to do basically the same thing, but less intrusively. You could also use `pstack`, but `lsstack` is more friendly. – Mike Dunlavey Mar 04 '11 at 00:17

You could just use PmcTools, FreeBSD's OProfile-like alternative.
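
Since the goal is gprof-style data, it's worth noting that pmcstat(8) can convert its sample logs into gprof-compatible gmon.out files. A hedged sketch of the workflow (the event name is CPU-dependent, so check pmccontrol -L; the paths and the 60-second window are illustrative):

```shell
# Sketch of system-wide sampling with hwpmc(4), then converting the
# log into gprof-compatible profiles.  Requires the hwpmc kernel
# module ("kldload hwpmc").  Event name and paths are illustrative.
profile_with_pmc() {
    pmcstat -S instructions -O /tmp/server.pmc &   # start sampling
    sampler=$!
    sleep 60                              # let the server take load
    kill "$sampler"
    pmcstat -R /tmp/server.pmc -g   # emits a gmon.out per executable sampled
}
```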

Erik

You can override the SIGTERM handler to call exit(0), which will cause gprof to generate the usual gmon.out.

nmichaels
  • You can't override SIGKILL. SIGKILL is not catchable. – derobert Mar 03 '11 at 22:44
  • I think what @nmichaels meant is to catch SIGTERM, which is generated when kill command is used by default. – Srikanth Mar 03 '11 at 23:40
  • For some reason, gmon.out is not getting generated when I kill all the child server processes. I'm handling SIGTERM and calling exit(0). All of the processes are identical and all have signal handlers set up appropriately. I thought it would work. Any ideas? – Srikanth Mar 07 '11 at 03:24
  • I had to change each child process's working directory to avoid multiple processes overwriting gmon.out. It actually generated different gmon.out files. The following link explains how to handle profiling when fork is involved: http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.cmds/doc/aixcmds2/gprof.htm

Extend your server with a way (a command sent through a socket, perhaps) to quit it smoothly, and there you have your gmon.out. Or am I missing something, and it's simply not possible to let it exit without killing it?

Ronny Brendel
  • I tried a similar approach: handling SIGTERM and exiting smoothly. This captured only the parent process's activity, and the parent doesn't actually do the real work. – Srikanth Mar 10 '11 at 21:55

If you're able to try a Fedora/RHEL Linux box for development testing, systemtap should give you good visibility into your server processes. For example, if you wish to sample the active functions in user-space programs, something relatively simple like this may help:

# stap -e 'global fns; probe timer.profile {if (user_mode()) fns[usymdata(uaddr())] <<< 1 }' -d /bin/yourserver -d /lib/yourlibrary.so -d /lib/yourotherlibrary.so

Press ^C when you're done. A report may look like:

fns["memset /lib64/libc-2.12.so+0xa7d/0xb20"] @count=0x56 @min=0x1 @max=0x1 @sum=0x56 @avg=0x1
fns["memset /lib64/libc-2.12.so+0x560/0xb20"] @count=0x12 @min=0x1 @max=0x1 @sum=0x12 @avg=0x1
fns["__GI_strlen /lib64/libc-2.12.so+0x0/0x50"] @count=0x4 @min=0x1 @max=0x1 @sum=0x4 @avg=0x1
fns["gobble_file /bin/ls+0x729/0xc70"] @count=0x1 @min=0x1 @max=0x1 @sum=0x1 @avg=0x1
fns["getuser /bin/ls+0x1c/0xa0"] @count=0x1 @min=0x1 @max=0x1 @sum=0x1 @avg=0x1
fns["getuser /bin/ls+0x23/0xa0"] @count=0x1 @min=0x1 @max=0x1 @sum=0x1 @avg=0x1

fche

You might want to look at Dyninst : http://www.dyninst.org/

It is a ptrace()-based API for dynamically adding and removing instrumentation in running code. You can use it for debugging, profiling, etc.

Good luck.

capveg

I'm not too deep into the matter, but couldn't DTrace be used to do this?

FreeBSD has just improved support for that: http://wiki.freebsd.org/DTrace/userland
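
Once userland DTrace works on the system, the profile provider makes stack sampling of a single process nearly a one-liner. A hedged sketch (the 199 Hz rate is illustrative):

```shell
# Sample the user-space stacks of one process at ~199 Hz; on Ctrl-C,
# dtrace prints each distinct stack with its sample count.
sample_with_dtrace() {
    pid=$1
    dtrace -p "$pid" -n '
        profile-199 /pid == $target/ { @stacks[ustack()] = count(); }'
}
```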

sleeplessnerd

This might be a case for PMP, the "poor man's profiler".

ggiroux