I have an application which is currently stuck and I'm trying to understand why. In the kill -3 (thread dump) output, I see many threads waiting to lock an object (specifically - waiting on a synchronized method of Vector), but I don't see any thread holding that lock.

Any idea how this can be debugged?

Thanks

P.S. I know Vector is old and ArrayList is usually recommended instead, but this is legacy code that I'm trying to debug.

Pavel Bucek
Zamir
  • What's the relation of `Vector` and `ArrayList` to this question? – Abimaran Kugathasan Feb 12 '14 at 08:53
  • Just explaining why we use Vector (all of whose methods are synchronized) instead of ArrayList, which is usually recommended. No other reason – Zamir Feb 12 '14 at 08:54
  • Vector isn't just old, it was replaced by ArrayList in 1998. How many people do you know who wrote Java before 1998? – Peter Lawrey Feb 12 '14 at 09:01
  • If you have a really old JVM, there was a bug where locks could either a) get into a bad state or b) fail to report the thread which was holding the lock. I would make sure your JVM is as current as you can make it, otherwise you are likely to be trying to debug a bug which was fixed many years ago. – Peter Lawrey Feb 12 '14 at 09:03
  • The JVM is not old - the exact version is 1.7.0.09-64 – Zamir Feb 12 '14 at 09:04
  • Can you post the thread dump and JVM details? – aryann Feb 12 '14 at 09:06
  • Check http://stackoverflow.com/questions/217113/deadlock-in-java and http://stackoverflow.com/questions/2757066/debugging-java-synchronization – Holger Feb 12 '14 at 09:18
  • the thread dump is huge, is there a way to attach a text file here? – Zamir Feb 12 '14 at 09:19
  • Can you also post the source code? – Minh-Triet LÊ Feb 12 '14 at 10:05

2 Answers

The situation you describe sounds like a classic case of deadlock.

You could (perhaps should) use an IDE such as Eclipse or IntelliJ IDEA, through which you could debug your application step by step and understand exactly where it stops and what to do.

Alternatively, pasting some code would help clarify the situation, as would describing your environment (JVM version, etc.).

Diferdin
  • +1 Obvious but very true. The simplest way to debug the code is using a debugger. – Peter Lawrey Feb 12 '14 at 09:05
  • Usually the thread dump explicitly says so if there is a deadlock... I can't really use an IDE, because I can't reproduce it again, or at least I don't know how to reproduce it at the moment. – Zamir Feb 12 '14 at 09:06
  • If you can't reproduce the issue, then the problem may be due to limited resources in the environment (e.g. CPU, memory, etc.) rather than logic in the code. – Diferdin Feb 12 '14 at 09:09

Possible causes

The following problems usually show very similar symptoms (the program appears stuck):

  • Deadlocks: Circular dependencies on shared resources
  • Live-locks: The resource lock is handed around, but no progress is made (one step forward, one step back)
  • Resource starvation: The one doing the work does not get what it needs. All others seem busy but make no progress.
  • Heavy swapping: Progress is so slow that the system comes to a halt
  • Too many threads (OS overload): The system is completely busy managing resources, so there is no time to do real work
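
To make the first item concrete, here is a minimal sketch of a circular wait. The class `CircularWaitDemo` and its method names are hypothetical and only for illustration. It uses `ReentrantLock.tryLock` with a timeout so that the demo terminates; with `synchronized` (as in `Vector`) or a plain `lock()` call, both threads would block forever at the marked line.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class: two threads take two locks in opposite order.
final class CircularWaitDemo {

    // Returns how many of the two threads obtained their second lock.
    static int run() {
        ReentrantLock lockA = new ReentrantLock();
        ReentrantLock lockB = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger successes = new AtomicInteger();

        Thread t1 = new Thread(
                () -> attempt(lockA, lockB, bothHoldFirst, bothTried, successes), "worker-1");
        Thread t2 = new Thread(
                () -> attempt(lockB, lockA, bothHoldFirst, bothTried, successes), "worker-2");
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return successes.get();
    }

    private static void attempt(ReentrantLock first, ReentrantLock second,
                                CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                                AtomicInteger successes) {
        first.lock();
        try {
            // Wait until each thread holds its first lock: the cycle is now closed.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // With synchronized or lock() this would block forever.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                try {
                    successes.incrementAndGet();
                } finally {
                    second.unlock();
                }
            }
            // Keep the first lock until the other thread has finished trying.
            bothTried.countDown();
            bothTried.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }
}
```

Neither thread can obtain its second lock, because the other thread holds it and is itself waiting. Acquiring locks in one fixed global order is the usual way to rule this pattern out.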

Diagnostic Tools

Debugger on developer machine

Deadlocks and similar problems are usually very hard to reproduce and hence cannot be easily analyzed with a debugger. The problem often occurs on the live system but not on a developer machine. The cause might be a different work load, a different data constellation or even different OS or different hardware (more CPU cores, NUMA architecture, etc.)

Remote debugger

You can try to attach to the production system using a remote debugger. You should only do this when you can risk a complete halt of the production machine (e.g. because of a JVM crash!). You should only do this in a pair debugging session and discuss each step with a peer.

Logging and visualization

Use extensive logging and visualize the log data (RStudio, Mathematica, etc.). Be aware that the logging itself might change your system: naive logging affects the performance of your live system and adds load of its own. Try asynchronous logging and test the performance impact of your logging before deploying it to the live system. Plan how you want to visualize your log data and what you would expect to see for each of the possible causes described above. Otherwise you might miss the one log statement that would reveal the root cause.
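
As a rough illustration of asynchronous logging, the sketch below hands messages to a bounded queue and lets a single daemon thread write them out. `AsyncLog` is a hypothetical class, not part of any logging framework; mature logging libraries ship their own asynchronous appenders, which would normally be preferable.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical minimal asynchronous logger: callers never block on I/O.
final class AsyncLog {
    private final BlockingQueue<String> queue;

    // 'sink' receives each formatted line on the writer thread (e.g. System.err::println).
    AsyncLog(int capacity, Consumer<String> sink) {
        BlockingQueue<String> q = new LinkedBlockingQueue<>(capacity);
        this.queue = q;
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    sink.accept(q.take());
                }
            } catch (InterruptedException e) {
                // Interrupted: stop writing and let the thread end.
            }
        }, "async-log-writer");
        writer.setDaemon(true); // do not keep the JVM alive just for logging
        writer.start();
    }

    // Never blocks. Returns false if the queue was full and the message was dropped.
    boolean log(String message) {
        String line = System.nanoTime() + " [" + Thread.currentThread().getName() + "] " + message;
        return queue.offer(line);
    }
}
```

Dropping messages when the queue is full is a deliberate trade-off in this sketch: the logger can lose data under load, but it never stalls the threads being observed.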

REPL

Query your live system by introducing a "command line" (REPL). By adding a command line to your live system, you can query it and change parts of it to diagnose the root cause. You can use the Clojure REPL, the Scala SBT REPL, BeanShell, or add live changes using JRebel together with an external trigger to run the swapped code (web service, scheduler, message queue, etc.). Work in pairs (discuss each command before running it) and remember to protect the REPL against outside access (bind to localhost or a Unix socket, use a named pipe, double-check your firewall, authorize with a public key, log each access to a dedicated log, etc.)!

JMX

Usually you can connect to a running Java VM using Java Management Extensions (JMX). Using JConsole or Java VisualVM, you can inspect the current stack trace of each thread and search for deadlocks. Additionally, you can deploy your own sensors in your application. Using DTrace (when your system supports it, e.g. Solaris, FreeBSD, NetBSD, Mac OS X), you can even monitor parts of the operating system.

You can add your own sensors by providing MBeans or MXBeans (stricter typing, better compatibility).
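
A custom sensor could look roughly like the sketch below. All names (`WorkerSensors`, `WorkerStatsMXBean`, the attribute names and the object name used in the usage note) are hypothetical; only the `javax.management` and `java.lang.management` calls are standard JDK API.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.ObjectName;

// Hypothetical container for a small JMX sensor.
final class WorkerSensors {

    // The "MXBean" suffix marks this as an MXBean interface; it must be public.
    public interface WorkerStatsMXBean {
        long getLockAcquisitions();
        long getCompletedTasks();
    }

    public static final class WorkerStats implements WorkerStatsMXBean {
        private final AtomicLong lockAcquisitions = new AtomicLong();
        private final AtomicLong completedTasks = new AtomicLong();

        // Called by the worker code at the relevant points.
        void lockAcquired() { lockAcquisitions.incrementAndGet(); }
        void taskCompleted() { completedTasks.incrementAndGet(); }

        @Override public long getLockAcquisitions() { return lockAcquisitions.get(); }
        @Override public long getCompletedTasks() { return completedTasks.get(); }
    }

    // Registers a new sensor with the platform MBean server under the given name.
    static WorkerStats register(String objectName) {
        WorkerStats stats = new WorkerStats();
        try {
            ManagementFactory.getPlatformMBeanServer()
                    .registerMBean(stats, new ObjectName(objectName));
        } catch (JMException e) {
            throw new IllegalStateException("could not register " + objectName, e);
        }
        return stats;
    }
}
```

After a call such as `WorkerSensors.register("com.example.debug:type=WorkerStats")`, the two attributes should show up in the MBeans tab of JConsole or VisualVM.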

Diagnostics

Deadlock

JConsole and VisualVM both have a function to find deadlocks and show the threads involved in the deadlock. Together with the function to show the current stack trace of each thread, diagnosing deadlocks becomes a breeze.
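
The same information is available programmatically through `ThreadMXBean`, which can be useful when no GUI can be attached. The sketch below is a hypothetical helper (`LockReport` is not an existing class). It lists every thread that is blocked on or waiting for a lock together with the owner the JVM reports, which is the piece that appears to be missing in the thread dump from the question.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper built on the standard java.lang.management API.
final class LockReport {

    // One line per thread that is blocked on, or waiting for, some lock object.
    static List<String> blockedThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // true, true: also collect locked monitors and ownable synchronizers.
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            if (info.getLockName() == null) {
                continue; // not blocked or waiting on a lock
            }
            String owner = info.getLockOwnerName();
            lines.add(info.getThreadName()
                    + " state=" + info.getThreadState()
                    + " lock=" + info.getLockName()
                    + " owner=" + (owner == null ? "(none reported)" : owner));
        }
        return lines;
    }

    // Ids of threads that form a deadlock cycle, or an empty array if there is none.
    static long[] deadlockedThreadIds() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? new long[0] : ids;
    }
}
```

Threads in `Object.wait()` also have a lock name but normally no owner, so an entry without an owner is only suspicious when the thread state is `BLOCKED`.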

Live locks and resource starvation

If you add counters to your workers that are incremented when a worker gets the lock and when it has actually made real progress, it becomes easy to find out whether your application makes progress or which workers are just juggling resources around without achieving anything.

You can query the counters using a remote debugger, JMX (if you add a sensor), or the REPL, or you can add corresponding log messages. Using a REPL or by replacing code in the live system, you can introduce counters, log messages or JMX sensors when needed.
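
A minimal sketch of such counters, assuming a hypothetical `ProgressCounters` class that each worker calls at the two points described above. Comparing two snapshots taken some time apart tells you whether locks are changing hands without any work getting done.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counters for telling "busy" apart from "making progress".
final class ProgressCounters {
    private final AtomicLong lockAcquisitions = new AtomicLong();
    private final AtomicLong unitsOfWork = new AtomicLong();

    // Call right after the worker has obtained the lock.
    void lockAcquired() { lockAcquisitions.incrementAndGet(); }

    // Call only when the worker has completed a real unit of work.
    void workCompleted() { unitsOfWork.incrementAndGet(); }

    // Returns { lockAcquisitions, unitsOfWork } at the time of the call.
    long[] snapshot() {
        return new long[] { lockAcquisitions.get(), unitsOfWork.get() };
    }

    // True if locks were acquired between the two snapshots but no work was completed.
    static boolean busyWithoutProgress(long[] earlier, long[] later) {
        return later[0] > earlier[0] && later[1] == earlier[1];
    }
}
```

If neither counter moves between snapshots, a deadlock or starvation is the more likely explanation; if only the lock counter moves, a live-lock is.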

Swapping or OS overload

With JMX and DTrace you can analyze parts of your operating system. With a REPL you might be able to get OS and JVM statistics from the running process. With log statements or custom JMX sensors, you can monitor the performance metrics of your application.
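
From inside the process (for example from a REPL or a periodic log statement), a few of these numbers can be read through the platform MXBeans. `SystemSnapshot` below is a hypothetical helper; the values it reports are only a small sample of what the JVM exposes.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper that summarizes a few OS and JVM metrics in one line.
final class SystemSnapshot {

    static String describe() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        int liveThreads = ManagementFactory.getThreadMXBean().getThreadCount();
        // getSystemLoadAverage() returns a negative value where the OS does not provide it.
        return "cpus=" + os.getAvailableProcessors()
                + " loadAverage=" + os.getSystemLoadAverage()
                + " heapUsedBytes=" + heap.getUsed()
                + " liveThreads=" + liveThreads;
    }
}
```

A load average far above the number of CPUs, or a live thread count that keeps growing, would point towards OS overload rather than a locking problem.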

It is crucial to measure the performance of your application while it runs fine, so that you have baseline values. Otherwise you won't be able to judge whether a measured value is fine or indicates a problem.

Community
stefan.schwetschke
  • Most if not all of your answer does not address the question, which is about the fact that no thread is holding the lock! – Aladdin May 28 '19 at 23:46