Here's my simple code to loop every second (doesn't need to be exact) and kick off a job if necessary:

while (true) {
  // check db for new jobs and 
  // kick off thread if necessary
  try {
    Thread.sleep(1000);
  } catch(Throwable t) {
    LOG.error("", t);
  }
}

This code has worked fine for several months. Just yesterday we started having problems where one of our servers seems to be hung in the Thread.sleep(1000) method. IOW - it's been over a day and the Thread.sleep hasn't returned. I started up jconsole and get this info about the thread.

Name: Thread-3
State: TIMED_WAITING
Total blocked: 2  Total waited: 2,820

Stack trace: 
 java.lang.Thread.sleep(Native Method)
xc.mst.scheduling.Scheduler.run(Scheduler.java:400)
java.lang.Thread.run(Thread.java:662)

Scheduler.java:400 is the Thread.sleep line above. The jconsole output doesn't increment "Total waited" every second as I'd expect. In fact it doesn't change at all. I even shut down jconsole and started it back up in the hopes that maybe that would force a refresh, but only got the same numbers again. I don't know what other explanation there could be besides that the jvm has incorrectly hung on the sleep command. In my years, though, I've had so few problems with the jvm that I assume it must be an oversight on my part.

note: The other thing to note is that no other thread is active. IOW - the cpu is nearly idle. I read somewhere that Thread.sleep could be legitimately starved if another thread was active, but that isn't the case here.

solaris version:

$ uname -a
SunOS xcmst 5.10 Generic_141415-08 i86pc i386 i86pc

java version:

$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) Server VM (build 20.1-b02, mixed mode)
Raedwald
andersonbd1
  • Do your logs show anything coming from the `LOG.error` statement? – Freiheit Jul 19 '11 at 14:51
  • Could it be a deadlock in the database code? – Bobby Jul 19 '11 at 14:52
  • @Freiheit - no, nothing in the logs – andersonbd1 Jul 19 '11 at 15:26
  • @Bobby - I don't think so. Even if there were, would that be a reason that Thread.sleep wouldn't wake up in this thread? – andersonbd1 Jul 19 '11 at 15:26
  • *I've had so few problems with the jvm that I assume it must be an oversight on my part.* consider yourself lucky :) – bestsss Jul 20 '11 at 10:07
  • @andersonbd1 Why are you catching `Throwable` in that try block? Is there more logic inside the try block that you're not showing us? –  Jul 25 '11 at 03:47
  • @andersonbd1: have you considered (or can you consider) running dtrace or strace to understand what leads to the freeze? Since you appear to have it isolated to a single server, it may help you identify the point of failure in either the hardware or the software configuration. – philwb Jul 27 '11 at 16:16
  • I am facing a similar issue, but with ScheduledThreadPoolExecutor on a 64-bit JVM on a 64-bit Linux server: http://stackoverflow.com/questions/9044423/java-scheduler-which-is-completely-independent-of-system-time-changes – YoK Jan 31 '12 at 13:44
  • Is your application running in a virtual environment? I would recommend trying without a VM. – avro Jul 28 '11 at 13:32

6 Answers

7

In addition to what bdonlan mentioned you may want to look into ScheduledThreadPoolExecutor. I work on a very similar type of project and this object has made my life easier, thanks to this little snippet.

scheduleAtFixedRate

If any execution of this task takes longer than its period, then subsequent executions may start late, but will not concurrently execute.

I hope this helps!
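
A minimal sketch of what this could look like, assuming the database check fits in a single method. The names PollingScheduler, checkForNewJobs, and runFor are hypothetical and not from the question, and the counter only stands in for the real work. It is written without lambdas so that it should also compile on the Java 6 JVM mentioned in the question.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PollingScheduler {

    // Counts completed polls; stands in for the real work (assumption).
    static final AtomicInteger polls = new AtomicInteger();

    // Hypothetical placeholder for "check db for new jobs and kick off thread if necessary".
    static void checkForNewJobs() {
        polls.incrementAndGet();
    }

    // Starts polling every periodMillis on a single scheduler thread.
    static ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                try {
                    checkForNewJobs();
                } catch (RuntimeException e) {
                    // An exception escaping run() would suppress all later executions.
                    e.printStackTrace();
                }
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    // Polls for roughly durationMillis, stops the scheduler, and returns the poll count.
    static int runFor(long periodMillis, long durationMillis) {
        ScheduledExecutorService scheduler = start(periodMillis);
        try {
            Thread.sleep(durationMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return polls.get();
    }

    public static void main(String[] args) {
        System.out.println("polls completed: " + runFor(1000, 3500));
    }
}
```

Catching exceptions inside run() matters here because scheduleAtFixedRate stops rescheduling a task once one execution throws.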

Shaded
  • +1 Good advice. With such good support for higher-level concurrency, one shouldn't use threading primitives (wait(), notify(), sleep(), the synchronized keyword, etc.). – helpermethod Jul 28 '11 at 08:46
  • Right, but those higher-level classes are implemented in Java, apparently with no special native support, and from the JDK source code it seems they end up calling Object.wait, notify, Thread.sleep, yield, etc. So if, on a given OS, configuration, or hardware, those primitives stop working, these higher-level systems will stop working as well. – Simone Gianni Jul 30 '11 at 04:28
  • @SimoneGianni true, I'd expect the same issue if the problem lies somewhere deeper in the system. However, it could also be caused by some kind of race condition, or a whole number of other hidden issues. My solution would reduce the chance of a race condition by ensuring only 1 thread is executed at a time. – Shaded Aug 01 '11 at 12:25
  • @Shaded I am facing a similar issue, but with ScheduledThreadPoolExecutor on a 64-bit JVM on a 64-bit Linux server: http://stackoverflow.com/questions/9044423/java-scheduler-which-is-completely-independent-of-system-time-changes – YoK Jan 31 '12 at 13:43
5

Are you depending on the system tick count to increase monotonically?

From what I've heard from someone experienced, it (occasionally) happens that the system tick goes backwards by one or two ticks. I haven't experienced it myself yet, but if you're depending on this, might this explain what's happening?

Edit:

When I said System.currentTimeMillis(), I believe I was mistaken. I thought that System.currentTimeMillis() was similar to Windows' GetTickCount() function (i.e. that it measures a time that is independent of the system time), but in fact, that does not seem to be the case. So of course it can change, but that was not my point: apparently, tick counts measured by the system timer can also go backwards by a tick or two, even ignoring system time changes. Not sure if that helps, but thanks to Raedwald for pointing out the system time change possibility, since that's not what I meant.
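
One way to check for the kind of clock change discussed here is to compare System.currentTimeMillis() with System.nanoTime(), which the Javadoc describes as suitable only for measuring elapsed time and unrelated to wall-clock time. The sketch below is an illustration only: the class and method names are hypothetical, and nothing in this thread establishes that a clock change is what affects Thread.sleep on the server in question.

```java
class ClockDriftCheck {

    // Returns how far the wall clock (currentTimeMillis) moved relative to the
    // elapsed-time clock (nanoTime) between two samples, in milliseconds.
    // A large negative value suggests the wall clock was set backwards.
    static long driftMillis(long wallStartMs, long nanoStart, long wallEndMs, long nanoEnd) {
        long wallElapsedMs = wallEndMs - wallStartMs;
        long elapsedMs = (nanoEnd - nanoStart) / 1000000L;
        return wallElapsedMs - elapsedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        long wallStart = System.currentTimeMillis();
        long nanoStart = System.nanoTime();
        Thread.sleep(1000);
        long drift = driftMillis(wallStart, nanoStart,
                                 System.currentTimeMillis(), System.nanoTime());
        System.out.println("wall clock drift relative to nanoTime: " + drift + " ms");
    }
}
```

Logging this value once per loop iteration would show whether the system time jumped shortly before a hang.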

Community
user541686
  • That shouldn't happen on any modern OS - at least on Windows the OS makes sure that time is monotonically increasing (except for overflow obviously). That's usually (eg NTP) done by slowing down the ticks for some time to synchronize. – Voo Jul 19 '11 at 15:27
  • @Voo: Interesting, but how do you know Windows guarantees that? I didn't see anything in the page of `GetTickCount`. – user541686 Jul 19 '11 at 15:43
  • On a Unix system, using the `date` command you can set the clock backwards by an arbitrary amount. Perhaps someone changed the clock while your process was running? – Raedwald Jul 26 '11 at 13:07
3

I know that you looked in jconsole, but it might be useful to send signal 3 to the process (that is, kill -3) and post more of the resulting thread dump here. Or, if you really want to get into the details, then you might consider taking one or more pstack/jstack dumps of the hung process in quick succession in order to show where the threads really are. Information is available online about how to correlate this information with a java thread dump.

Also, by "one of our servers," are you saying that the problem is reproducible on one server, but it never occurs on other servers? This indicates a problem with that one server. Check that everything is the same across your servers and that there are no issues on that hardware in particular.

Finally, this might not be a java problem per se. Thread.sleep(long) is a native method (maps directly onto the underlying operating system's thread management), so check that your OS is up to date.
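
If sending a signal is inconvenient, a similar dump can be produced from inside the JVM through the standard java.lang.management API (available since Java 6). This is a minimal sketch; the class name ThreadDumpSketch and the output format are assumptions for illustration, and the output is less detailed than what kill -3 or jstack prints.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class ThreadDumpSketch {

    // Builds a text dump of every live thread: name, state, and stack frames.
    static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip locked-monitor and locked-synchronizer details.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // defensive: skip threads that ended during the dump
            }
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        // findDeadlockedThreads() returns null when no deadlock is found.
        long[] deadlocked = threads.findDeadlockedThreads();
        out.append("deadlocked threads: ")
           .append(deadlocked == null ? 0 : deadlocked.length).append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dumpAllThreads());
    }
}
```

Taking several such dumps a few seconds apart and comparing them shows whether the scheduler thread ever leaves Thread.sleep.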

jtoberon
  • Thanks jtoberon - we've pretty much gone down all those paths. I confirmed where it was hung with the kill -3. Yes, it's only on one server, so we're trying some A/B tests to figure out what's different. – andersonbd1 Jul 26 '11 at 14:10
  • If it only happens on one specific physical server, then there's almost certainly something different or wrong with that server. You'll run into server-specific problems every few months once you start running a java app on more than a few dozen servers. Also, personally I would try the suggestions in one of the other responses on this page -- i.e. do a workaround, rather than spending a lot of time debugging. – jtoberon Jul 26 '11 at 14:51
2

Have you considered using Timer & TimerTask?

Here is a simple snippet which might help.

import java.util.Calendar;
import java.util.Timer;
import java.util.TimerTask;

public class Example {

    public static void main(String args[]) {
        Timer timer = new Timer();

        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                Calendar instance = Calendar.getInstance();
                System.out.println("time: " + instance.getTime() + " : " + instance.getTimeInMillis());

                // check db for new jobs and
                // kick off thread if necessary
            }
        };

        int startingDelay = 0; // timer task will be started after startingDelay
        int period = 1000; // you are using it as sleeping time in your code
        timer.scheduleAtFixedRate(task, startingDelay, period);
    }

}

EDIT

According to the discussions I have read, Thread.sleep() is often a sign of poorly designed code. The reasons are:

  • ...The thread does not lose ownership of any monitors (from documentation).
  • It blocks the thread from executing.
  • It gives no guarantee that execution will resume exactly when the sleep time elapses.
  • To me, Thread.sleep() is too primitive. There is a whole package dedicated to concurrency.

Which alternative is better than Thread.sleep() is a separate question. I would suggest having a look at the Concurrency chapter of the book Effective Java.

Kowser
  • I understand there are many different apis I could use to accomplish the same goal. However, I'm guessing most of these APIs depend on the same underlying C/system code. IF that's where the problem is, it won't make a lick of difference which higher level code I use. Do you know if the method you propose is any better than Thread.sleep or Object.wait? – andersonbd1 Jul 28 '11 at 12:14
  • You have been studying the wrong discussions. Sleep() can be misused, yes, but it's not always a sign of poorly designed code. – Martin James Jan 31 '13 at 11:36
1

Thread.sleep() is not a good practice in Java programming. Just Google "Is Thread.sleep() bad?" and you will see my point.

Firstly, it makes the current Thread inaccessible by other parts of the program especially if it is multi-threaded. Maybe that is why you are experiencing the hang.

Secondly, it would be catastrophic if the current thread is EDT (Event Dispatch Thread) and the application has Swing GUI.

A better alternative would be Object.wait() :

final Object LOCK = new Object();
final long SLEEP = 1000;

public void run() {
  while (true) {
    // check db for new jobs and 
    // kick off thread if necessary

    try {
      synchronized (LOCK) {
        LOCK.wait(SLEEP);
      }
    } catch (InterruptedException e) {
      // usually interrupted by other threads e.g. during program shutdown
      break;
    }

  }
}
Augustus Thoo
  • I don't think I care about the firstly's and secondly's you mentioned, but it's worth a shot. According to this answer: http://stackoverflow.com/questions/708333/using-object-waitmillisec-to-simulate-sleep?answertab=votes#tab-top "In ancient Java code you'd see people using wait() instead of Thread.sleep() because Thread.sleep() would freeze the whole application on systems without preemptive multitasking" – andersonbd1 Jul 21 '11 at 20:29
  • ok - we tried using Object.wait() and got the same result - it's hanging at the execution of that method now. – andersonbd1 Jul 25 '11 at 12:46
  • I would definitely consider refactoring this code to use the wait()/notify() synchronization pattern, as Augustus pointed out – Andrew Fielden Jul 27 '11 at 10:53
  • I don't see why you think sleep is "bad". Both sleep and wait(long) put the thread into a TIMED_WAITING state. The underlying JVM mechanisms in both implementations commonly park the thread using virtually the same code. As for the concerns, I don't see why wait makes a thread any less "inaccessible" (I'm not sure what you mean by that) than sleep, nor do I see why tying up something like the EDT on a timed object wait is any different from tying it up on a sleep. – philwb Jul 27 '11 at 13:45
  • I don't get this answer. How should a Thread be accessible by other threads? How should .wait() not hang where .sleep() does (which happens only on that specific server, so it is not such a common situation)? Why should I incur the cost of synchronizing on another object, put my thread in a pool waiting for notification, and for what? – Simone Gianni Jul 27 '11 at 20:45
  • Moreover, while it is true that wait releases the synchronization lock while sleep does not, it releases the lock ONLY on the object you invoke wait on. Any other synchronization lock held by the current thread will NOT be released, causing the same possible harm as a Thread.sleep. – Simone Gianni Jul 27 '11 at 20:48
  • It's a syllogism: 'Sleep() pauses the calling thread. Sleep can be used for polled inter-thread comms. Polled inter-thread comms waste CPU and add latency, which is bad. Sleep() is therefore bad'. – Martin James Jan 31 '13 at 11:49
0

Maybe you can try a tool other than jconsole to first confirm that the thread is really blocked in the sleep call.

For example, run jstack manually several times, write the output to a file each time, and compare the results.

Or use a more capable tool, such as YourKit (commercial) if your organization has a license, to profile the application in depth, or attach a remote debugger (which may not be possible in production).

Or you can check whether the "// check db for new jobs" code is still being run, by checking logs, profiling, or any other method that suits your application. If the db check is very quick and is followed by a 1-second sleep, it is very likely that you will always see sleep in the stack trace, simply because the thread spends almost all of its time there.
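
One lightweight way to make that check is a heartbeat: the loop records a timestamp at the top of each iteration, and something outside the loop asks how long it has been silent. The sketch below is an assumption-laden illustration; the class LoopHeartbeat and its methods are hypothetical. Note that a watchdog thread that itself sleeps could hang the same way if the problem lies in the JVM or OS timer, so isStalled is better called from an external trigger such as a JMX operation or a request handler.

```java
import java.util.concurrent.atomic.AtomicLong;

class LoopHeartbeat {

    // Timestamp of the most recent loop iteration, from System.nanoTime().
    private final AtomicLong lastBeatNanos = new AtomicLong(System.nanoTime());

    // Call from the loop body, just before "check db for new jobs".
    void beat() {
        lastBeatNanos.set(System.nanoTime());
    }

    // True if the loop has not called beat() within the given limit.
    boolean isStalled(long limitMillis) {
        long silentMillis = (System.nanoTime() - lastBeatNanos.get()) / 1000000L;
        return silentMillis > limitMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        final LoopHeartbeat heartbeat = new LoopHeartbeat();
        Thread loop = new Thread(new Runnable() {
            public void run() {
                while (!Thread.currentThread().isInterrupted()) {
                    heartbeat.beat();
                    // check db for new jobs and kick off thread if necessary
                    try {
                        Thread.sleep(1000);
                    } catch (InterruptedException e) {
                        return; // stop the loop on shutdown
                    }
                }
            }
        });
        loop.start();
        Thread.sleep(2500);
        System.out.println("stalled: " + heartbeat.isStalled(5000));
        loop.interrupt();
    }
}
```

If isStalled stays false while every thread dump shows the thread in sleep, the loop is running normally and the dumps are simply catching it at rest.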

Ben Xu
  • I added some debugging to the code and was able to recreate the issue to confirm that indeed program execution halts at Thread.sleep. – andersonbd1 Jul 25 '11 at 12:46