11

I've got an app running on a grid of uniform Java processes (potentially on different physical machines). I'd like to collect CPU usage statistics from a single run of this app. I've gone over profiling tools looking for an option to collect the data automatically, but failed to find one in NetBeans, TPTP, jvisualvm, YourKit, etc.

Maybe I'm looking at this the wrong way?

What I was thinking is:

  • run the processes on the grid with some special setup that allows them to dump profiling info
  • run my app as usual - it will push tasks to the grid, the processes will execute the tasks and publish profiling info
  • use some tool to collect and analyze the profiling results

but I can't find anything even remotely similar to this.
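The worker-side half of what I'm describing could be fairly small. This is only a sketch of the idea (class name, file name, and CSV format are all made up): each grid process wraps the tasks it executes, measures the CPU time they consume via `ThreadMXBean`, and appends the numbers to a per-node file that can be gathered and merged after the run.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical worker-side hook: measures the CPU time a task consumes and
 *  appends it to a per-node file that a collector can gather after the run. */
public class TaskCpuRecorder {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** Runs a task and appends a "taskName,cpuNanos" line to the given log file. */
    public static long runAndRecord(String taskName, Runnable task, String logFile)
            throws IOException {
        // CPU time of the current thread, not wall-clock time (-1 if unsupported)
        long start = THREADS.getCurrentThreadCpuTime();
        task.run();
        long cpuNanos = THREADS.getCurrentThreadCpuTime() - start;
        try (PrintWriter out = new PrintWriter(new FileWriter(logFile, true))) {
            out.println(taskName + "," + cpuNanos);
        }
        return cpuNanos;
    }

    public static void main(String[] args) throws IOException {
        long cpu = runAndRecord("demo-task", () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i; // stand-in workload
        }, "cpu-demo-node1.csv");
        System.out.println("demo-task used " + cpu + " ns of CPU");
    }
}
```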

Any thoughts, experience, suggestions?

Thank you!

atamur
  • 1,567
  • 1
  • 14
  • 25
  • "Collect" is a bit vague. For each method A run on multiple nodes, you want the CPU of method A accumulated across all the nodes? Or you want to see times for A(N1), A(N2), A(Nk) for each node 1..k? You want some kind of call graph for each node? Some union call graph for all nodes (defined how)? – Ira Baxter Apr 28 '11 at 23:28
  • Fact is I'm not sure about the methodology. Actually my grid is sort of heterogeneous which adds to the problem. But I would expect that people who work with stuff like map-reduce face the same problems? My goal is to understand how the say 100 computational hours that take 5 minutes to complete on the grid are distributed between calls that actually happen on the individual grid nodes. – atamur Apr 29 '11 at 05:12
  • Are these nodes all running the same code, and you are doing something data-parallel? Or are the computations heterogeneous? – Ira Baxter Apr 29 '11 at 07:19
  • All the nodes are running the same code, so computationally speaking the grid is homogeneous. – atamur Apr 29 '11 at 08:32

8 Answers

3

If you have allowed remote JMX access and you are using the Sun JDK 1.6, then try jvisualvm. It has an option for a remote JMX connection, though I haven't used it for profiling CPU in a distributed environment.

Note: for CPU profiling, your application should be running on Sun JDK 1.6 or above.


Have a look at these links:

  1. JVisualVM
  2. JVisualVM - Working with Remote Applications
  3. Get heap dump from a remote application in Java using JVisualVM
  4. Unable to profile JBoss 5 using jvisualvm
  5. http://www.taranfx.com/java-visualvm
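A minimal sketch of what jvisualvm does under the hood when you add a remote host: connect to a node's JMX agent and read platform MXBeans over the connection. The service URL (host/port) is a placeholder; with no argument the sketch falls back to this JVM's own platform MBean server so it runs standalone.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Sketch: attach to a grid node's JMX agent (the same endpoint jvisualvm
 *  uses) and read an OS-level metric from it. */
public class RemoteJmxProbe {
    public static double systemLoad(MBeanServerConnection conn) throws IOException {
        OperatingSystemMXBean os = ManagementFactory.newPlatformMXBeanProxy(
                conn, ManagementFactory.OPERATING_SYSTEM_MXBEAN_NAME,
                OperatingSystemMXBean.class);
        return os.getSystemLoadAverage(); // negative if the platform can't report it
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 1) {
            // e.g. service:jmx:rmi:///jndi/rmi://node1:9999/jmxrmi (placeholder)
            JMXConnector c = JMXConnectorFactory.connect(new JMXServiceURL(args[0]));
            try {
                System.out.println("remote load: "
                        + systemLoad(c.getMBeanServerConnection()));
            } finally {
                c.close();
            }
        } else {
            // no URL given: query this JVM's own platform MBean server instead
            System.out.println("local load: "
                    + systemLoad(ManagementFactory.getPlatformMBeanServer()));
        }
    }
}
```

The same proxy pattern works for `ThreadMXBean`, `MemoryMXBean`, etc., so one small client could poll every node in the grid.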
Favonius
  • 13,959
  • 3
  • 55
  • 95
  • As I mentioned in my question, jvisualvm doesn't really help, because I can't join the results from across the grid into one large result for analysis. – atamur Apr 28 '11 at 12:36
  • @atamur: It's really stupid of me, somehow I skipped it. Sorry for that. – Favonius Apr 28 '11 at 13:38
  • 1
    JVisualVM has quite a nice open architecture for building plugins, so you could quite easily put the results from multiple CPUs into one view. One problem will be correlation, though, especially over a widely geographically-spread grid. Also, presumably, you'd want to look at things like the standard CPU profile against time after a job arrives on a grid node. That might be quite tricky to do. – kittylyst Apr 28 '11 at 16:01
1

If you look at something like Zabbix (though there are tons of other monitoring tools), it allows gathering data from a Java app via JMX. If you enable JMX in your app and allow it to be queried externally (over TCP/IP), you will have access to a lot of the HotSpot internals (free memory etc.) as well as thread stacks, and you can have these values graphed. It does need configuration, but I don't think what you're looking for can be done with a one-line script.
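The kind of data such a tool pulls over JMX can be seen with a few lines against the local platform MXBeans. A sketch (CSV column names are made up): a short poll loop sampling heap use and live thread count, printed as rows a grapher could plot against time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;

/** Sketch: sample a couple of JMX metrics periodically and emit CSV rows,
 *  i.e. the raw feed a monitoring tool like Zabbix would collect and graph. */
public class JmxPoller {
    public static String sampleCsv() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long heapUsed = mem.getHeapMemoryUsage().getUsed();
        return System.currentTimeMillis() + "," + heapUsed + ","
                + threads.getThreadCount();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("timestampMillis,heapUsedBytes,liveThreads");
        for (int i = 0; i < 3; i++) {   // a real poller would loop until shutdown
            System.out.println(sampleCsv());
            Thread.sleep(200);
        }
    }
}
```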

Liv
  • 6,006
  • 1
  • 22
  • 29
  • in fact if you allow remote JMX access you can even use JConsole to monitor each process. – Liv Apr 22 '11 at 13:01
  • I have full access, no firewalls, JMX is enabled, I can run jstatd etc. Doesn't really help to profile *CPU* though =( – atamur Apr 22 '11 at 17:19
  • profiling CPU can't be done in JConsole -- it is OS-specific. – Liv Apr 25 '11 at 19:35
1

Just to add that the profiling information on each node usually contains timestamps.

To match these timestamps, all machines should have exactly the same time (a 10 ms delta at most), so

cluster nodes should synchronize with a single source network time server (NTP).

Pavlonator
  • 879
  • 1
  • 9
  • 21
1

You can use some JMX library, e.g. jmxterm, and wrap it in some code to connect to multiple hosts and poll them for changes. If you are a bit familiar with Python, look at my simple script here for some inspiration: http://rostislav-matl.blogspot.com/2011/02/monitoring-tomcat-with-jmxterm.html .

Rostislav Matl
  • 4,294
  • 4
  • 29
  • 53
1

I have used CA Introscope for this type of monitoring. It uses instrumentation to collect metrics over time. As an example, it can be configured to provide a view of all nodes and their performance over time. From that node view, you can drill down to the method level to help you figure out where your bottlenecks are.

Yes, it will provide CPU utilization.

It's a commercial $$$ tool, but it's a great tool for collecting, monitoring and interrogating performance data.

MarkOfHall
  • 3,334
  • 1
  • 26
  • 30
0

http://www.hyperic.com/products/open-source-systems-monitoring

I never tried the other tools mentioned in the other answers. I was more than satisfied with Hyperic. It also exposes a web-services API which you can use to write your own analysis tools.

Adisesha
  • 5,200
  • 1
  • 32
  • 43
  • quickly scanning their site I wasn't able to find any info on detailed CPU profiling (which is what I'm looking for) – atamur Apr 30 '11 at 21:51
  • If you are looking for which block of code is taking more time, then you need to use profiling tools you mentioned. Hyperic gives you percentage of cpu, memory etc used on a particular node. I do not know if there are any tools which can aggregate information from different nodes and give you detailed information. This applies to code profiling tools as well. – Adisesha May 02 '11 at 07:54
0

If you know the critical paths you want to analyse, I would suggest time-stamping your process in key places and combining the logs yourself. This is likely to be a useful addition to your profiling, can be used in production, and may be even more useful as a result. (It is for my project.)
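A minimal sketch of this "timestamp the key places" idea (class and phase names are illustrative): a try-with-resources scope timer that logs phase begin/end with wall-clock timestamps, so the lines from all nodes can later be combined.

```java
/** Sketch: a scope timer for key places on a critical path. Each instance
 *  logs BEGIN/END lines with wall-clock timestamps that can be merged with
 *  the logs from other nodes after the run. */
public class PhaseTimer implements AutoCloseable {
    private final String node;
    private final String phase;
    private final long startMillis = System.currentTimeMillis();

    public PhaseTimer(String node, String phase) {
        this.node = node;
        this.phase = phase;
        System.out.println(startMillis + " " + node + " " + phase + " BEGIN");
    }

    public long elapsedMillis() {
        return System.currentTimeMillis() - startMillis;
    }

    @Override
    public void close() {
        System.out.println(System.currentTimeMillis() + " " + node + " " + phase
                + " END after " + elapsedMillis() + " ms");
    }

    public static void main(String[] args) {
        try (PhaseTimer t = new PhaseTimer("node1", "computePartition")) {
            for (int i = 0; i < 100_000; i++) Math.sqrt(i); // stand-in workload
        }
    }
}
```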

I have used YourKit to monitor a number of processes at once. It can show you what is happening in each in real time and collect the results when all is finished.

I don't know if it provides a combined view of what is happening.

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • I already have such a mechanism in place, but it becomes a real pain to maintain as new features/code paths are added – atamur May 03 '11 at 05:34
0

I was looking for something similar and found Hyperic.

The claim is that the tool can monitor most common applications and systems, gather all the information, and present it in a convenient fashion.

To be honest this is on my todo list, so I can't say if it will do the job or not. Anyway, it seems impressive.

Nicolas Bousquet
  • 3,990
  • 1
  • 16
  • 18