I'm trying to profile Nutch using VisualVM. Lucene is the part of the Nutch core responsible for generating URL indexes and for searching those indexes in response to a query. I'm running Nutch through Apache Tomcat, and I would like to determine how much time Nutch spends in various function calls (including Lucene calls). However, when I try to profile using VisualVM, I get a bunch of profiling data about Tomcat and nothing about Nutch or Lucene. What am I doing wrong here?
-
What do you mean you only get data about Tomcat? Since Tomcat's the servlet container, you shouldn't expect to see Nutch or Lucene run in their own processes, right? – Xodarap Nov 08 '10 at 18:31
-
That's true. What I'm looking for is when the servlet calls Nutch functions. – Dan Snyder Nov 09 '10 at 13:33
1 Answer
I had the same experience trying to locate Lucene time inside Tomcat calls. What you have to do is:
- Use VisualVM 1.2.2.
- Choose the relevant process and press "Profile".
- Check the "Settings" checkbox. This should open a "CPU settings" tab, with fields you can fill.
- Under "Start profiling From classes:" enter an entry-point class in your code (e.g. com.my.company.NutchUser).
- Uncheck "Profile new runnables".
- Choose "Profile only classes:" and under it write: org.apache.lucene.* org.apache.nutch.*
- Press the "Profile CPU" button.
I believe that if you do all that, then run your process and take occasional snapshots, you will be fine.
Alternatively, this guy suggests doing stack sampling instead of profiling. I have never done it, but it sounds interesting.
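For what it's worth, here is a rough sketch of what stack sampling could look like in plain Java. It is not taken from the linked suggestion. It only assumes that the sampler runs inside the same JVM as Tomcat (for example, started from a background thread in the webapp), and the class name `StackSampler`, the 50 ms interval, and the 100-sample count are all made up for illustration. The package prefixes are the same ones used in the "Profile only classes:" filter above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch, Lucene, or VisualVM.
public class StackSampler {
    // Package prefixes of interest (same packages as the VisualVM filter).
    static final List<String> PREFIXES =
            Arrays.asList("org.apache.lucene.", "org.apache.nutch.");

    /** Returns the innermost frame whose class matches a prefix, or null if none does. */
    static String firstMatchingFrame(StackTraceElement[] stack, List<String> prefixes) {
        for (StackTraceElement frame : stack) { // index 0 is the top of the stack
            for (String prefix : prefixes) {
                if (frame.getClassName().startsWith(prefix)) {
                    return frame.getClassName() + "." + frame.getMethodName();
                }
            }
        }
        return null;
    }

    /** Takes one sample of every live thread in this JVM and updates the hit counts. */
    static void sampleOnce(Map<String, Integer> counts, List<String> prefixes) {
        for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
            String hit = firstMatchingFrame(stack, prefixes);
            if (hit != null) {
                counts.merge(hit, 1, Integer::sum);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < 100; i++) {   // assumed sample count
            sampleOnce(counts, PREFIXES);
            Thread.sleep(50);             // assumed sampling interval
        }
        // Methods that show up in more samples are where threads were found more often.
        counts.forEach((method, n) -> System.out.println(n + "\t" + method));
    }
}
```

Run standalone, `main` only samples its own JVM, so it will print nothing useful; the sampling loop would have to be started inside the Tomcat process to see Nutch or Lucene frames. If you would rather not touch the webapp, running the JDK's `jstack <pid>` against the Tomcat process a few times and reading the dumps by hand gives you the same kind of information.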
-
This is extremely useful. The only question I have is about step 4. What do you mean? Where would I add this? – Dan Snyder Dec 01 '10 at 18:51
-
See my edited step 3. You should see the screen change once you check the "Settings" checkbox. – Yuval F Dec 02 '10 at 06:37