
We have a process that takes about 20 hours to run on our Linux box. We would like to make it faster, and as a first step need to identify bottlenecks. What is our best option to do so?

I am thinking of sampling the process's CPU, RAM, and disk usage every N seconds. So unless you have other suggestions, my specific questions would be:

  1. How much should N be?
  2. Which tool can provide accurate readings of these stats, with minimal interference or disruption from the fact that the tool itself is running?
  3. Any other tips, nuggets of wisdom, or references to other helpful documents would be appreciated, since this seems to be one of those tasks where you can make a lot of time-consuming mistakes and false starts as a newbie.
Dun Peal
  • Give [*this*](http://stackoverflow.com/a/378024/23771) a shot. It's not about stats, it's about finding the time hogs. – Mike Dunlavey Feb 11 '15 at 21:40
  • If you don't know whether the process is CPU-bound or I/O-bound, you can start with the simplest system monitoring tools, as recommended by Gregg in Linux Performance Analysis and Tools (SCaLE11x, 2013), http://www.slideshare.net/brendangregg/linux-performance-analysis-and-tools, such as `top` (CPU usage should be close to thread_count * 100% for a CPU-bound task) and `iostat` to check disk activity. You can also look at `%sy` in the "Cpu(s)" line of `top`, which shows the Linux kernel's CPU load. More tools are listed there; `sar`, `vmstat`, `mpstat`, and `iostat` will show stats every N seconds. Mike, 401k views is more than 242k. – osgx Mar 03 '15 at 07:29

2 Answers


First of all, what you want and what you are asking for are two different things.

Monitoring is needed when you run the process for the first time, i.e. when you don't yet know its resource utilization (CPU, memory, disk, etc.). You can follow the procedure below to drill down to the bottleneck:

  1. Monitor system resources (an interval of 10-20 seconds is generally fine with Munin, Ganglia, or a similar tool). This should tell you whether your hardware is the bottleneck, i.e. whether you are running out of resources, e.g. 100% CPU utilization, very little free memory, high I/O, etc.

     If this is your case, consider upgrading the hardware or tuning the existing setup.

  2. Then tune your application/utility. Use profilers/loggers to find out which method or process is taking the time, and try to tune it. If you have single-threaded code, consider parallelism. If a DB etc. is involved, try to tune your queries and DB parameters.
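
To illustrate the profiling step, here is a minimal sketch that assumes the job (or part of it) is written in Python, which the question does not say. `slow_step`, `fast_step`, and `job` are hypothetical stand-ins for the real workload; only `cProfile` and `pstats` come from the standard library. For other languages a comparable profiler would take their place.

```python
import cProfile
import io
import pstats


def slow_step():
    # Hypothetical hot spot: deliberately does far more work than fast_step.
    return sum(i * i for i in range(200_000))


def fast_step():
    # Hypothetical cheap step.
    return sum(range(1_000))


def job():
    # Stand-in for the real long-running process.
    slow_step()
    fast_step()


def profile_report(func, top=5):
    """Run func under cProfile and return the `top` entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()
```

Calling `print(profile_report(job))` lists the functions with the highest cumulative time first, which is usually where tuning should start.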

Then run the test again with monitoring to drill down further :)
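
If you would rather not install a monitoring stack for the first pass, the sampling idea can be sketched directly against Linux's `/proc` files. This is only a rough sketch: the function names, the 10-second default interval, and the output format are my own assumptions, and reading `/proc/<pid>/io` normally requires running as the process owner or root.

```python
import os
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second


def parse_cpu_seconds(stat_text, clk_tck=CLK_TCK):
    """User + system CPU seconds from the contents of /proc/<pid>/stat."""
    # The command name is in parentheses and may contain spaces,
    # so split on the last ')' before splitting into fields.
    fields = stat_text.rsplit(")", 1)[1].split()
    utime_ticks, stime_ticks = int(fields[11]), int(fields[12])
    return (utime_ticks + stime_ticks) / clk_tck


def parse_rss_kib(status_text):
    """Resident set size in KiB from /proc/<pid>/status, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def parse_io_bytes(io_text):
    """Cumulative (read_bytes, write_bytes) from /proc/<pid>/io."""
    values = dict(
        line.split(": ", 1) for line in io_text.splitlines() if ": " in line
    )
    return int(values["read_bytes"]), int(values["write_bytes"])


def cpu_percent(prev_cpu_s, cur_cpu_s, elapsed_s):
    """CPU usage over the interval; 100% means one fully busy core."""
    return 100.0 * (cur_cpu_s - prev_cpu_s) / elapsed_s


def read_proc(pid, name):
    with open(f"/proc/{pid}/{name}") as f:
        return f.read()


def monitor(pid, interval=10.0, samples=6):
    """Print one line of CPU, memory, and disk figures per interval."""
    prev_cpu = parse_cpu_seconds(read_proc(pid, "stat"))
    prev_time = time.monotonic()
    for _ in range(samples):
        time.sleep(interval)
        now = time.monotonic()
        cpu = parse_cpu_seconds(read_proc(pid, "stat"))
        rss = parse_rss_kib(read_proc(pid, "status"))
        read_b, write_b = parse_io_bytes(read_proc(pid, "io"))
        pct = cpu_percent(prev_cpu, cpu, now - prev_time)
        print(f"cpu={pct:.1f}% rss={rss} KiB read={read_b} B written={write_b} B")
        prev_cpu, prev_time = cpu, now
```

For example, `monitor(12345, interval=10.0, samples=6)` would watch a (hypothetical) PID 12345 for one minute. A CPU figure that stays near 100% times the thread count suggests a CPU-bound job, while low CPU with steadily growing read/write byte counts points towards I/O.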

Nachiket Kate

I think a graphical representation would help you solve your problem, and I recommend Munin.

It's a resource monitoring tool with a web interface. By default it monitors disk I/O, memory, CPU, load average, network usage, and more. It's lightweight and easy to install. It's also easy to develop your own plugins and set alert thresholds.

http://munin-monitoring.org/

Here is an example of what you can get from Munin: http://demo.munin-monitoring.org/munin-monitoring.org/demo.munin-monitoring.org/

berthni