
I'm getting an InterruptedException from Jenkins. The relevant part of the stack trace:

java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at hudson.remoting.Request.call(Request.java:127)
    at hudson.remoting.Channel.call(Channel.java:646)
    at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
    at $Proxy33.join(Unknown Source)
    at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:861)

That interrupt is unexpected and so far unexplained. I can't practically reproduce it under a debugger: it only happens in a CI system that is in production use, and it happens fairly rarely, in well under 1% of Jenkins job executions. Combing through various logs hasn't yielded any useful hints about the cause so far. The remote Jenkins node did not appear to disconnect at that time.

Question: How can I find out the cause of that InterruptedException, or anything else potentially useful, given the above constraints?

Any other ideas for tracking down the cause of such an exception are also welcome! Perhaps something Jenkins/Hudson specific, not covered by this earlier question (its answers aren't really helpful here).

hyde
  • Your call to `Object::wait` should be made while you're waiting for a condition to become true. If you're interrupted & the condition is still false, it's a spurious wakeup. Go back to wait mode. http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#wait%28%29 – R Kaja Mohideen Feb 12 '13 at 08:40
  • @RKajaMohideen Yeah, except it's not my wait, so looks like it's bug reporting time. – hyde Feb 12 '13 at 09:59
  • Yes. If the library waits for some purpose, it must anticipate being interrupted and handle it by either waiting again or failing the method, not by throwing the exception to the client. – R Kaja Mohideen Feb 12 '13 at 10:11

2 Answers


The InterruptedException looks normal. Checking the Jenkins source code, I see that it gets handled (they close resources in the catch block) and is then rethrown. Offhand, I don't see why they wait there in the first place.

Looking at the comment before wait:

// I don't know exactly when this can happen, as pendingCalls are cleaned up by Channel,
// but in production I've observed that in rare occasion it can block forever, even after a channel
// is gone. So be defensive against that.
wait(30*1000);

I would say that somebody added the wait to overcome "rare occasion of blocking forever" and at the same time introduced a death by interrupt from the waiting.

Your best bet is to check the Jenkins issue tracker and file a report that your jobs are failing because waiting gets interrupted every now and then and it cancels the remote call. I think they should either go back to waiting if they want to spend that amount of time waiting or continue but not fail at that point.
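
To illustrate the "go back to waiting" option, here is a minimal sketch. It is not Jenkins code: `PendingCall`, `complete` and `awaitUninterruptibly` are hypothetical names, and only the 30-second timeout is taken from the snippet quoted above. The idea is to remember an interrupt, keep waiting until the response arrives, and re-assert the interrupt flag afterwards.

```java
/** Hypothetical stand-in for a pending remote call; not the real hudson.remoting.Request. */
class PendingCall {
    private Object response; // guarded by "this"

    /** Called by the thread that receives the reply. */
    synchronized void complete(Object value) {
        response = value;
        notifyAll();
    }

    /** Waits for the response; an interrupt does not abort the wait. */
    synchronized Object awaitUninterruptibly() {
        boolean interrupted = false;
        try {
            while (response == null) {
                try {
                    wait(30 * 1000); // same defensive timeout as in the quoted snippet
                } catch (InterruptedException e) {
                    interrupted = true; // remember it, then go back to waiting
                }
            }
            return response;
        } finally {
            if (interrupted) {
                // Re-assert the flag so code further up the stack can still see it.
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

Whether ignoring the interrupt like this is appropriate depends on who sends it. If the interrupt is a deliberate cancellation request, waiting on would make the call impossible to cancel.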

toomasr

Unfortunately, it is not very well emphasized, but the best way to wait for a condition is to write code like this:

while (!condition) {
    try {
        wait(1000L);
        // do something
    } catch (InterruptedException e) {
        // ignored here; the loop re-checks the condition
    }
}

You have to watch out for spurious interrupts, and code around those.

inder
  • What's a "spurious interrupt"? I've heard about (platform-dependent) spurious wakeups, but never about a spurious interrupt. In turn, the consensus is that you should at the very least set the interrupted flag again (by calling Thread.currentThread().interrupt()) when you catch an InterruptedException, and under no circumstances should you swallow it the way you're proposing. – laszlok Aug 30 '17 at 15:03
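
The distinction drawn in the last comment can be sketched as follows. This is only an illustration: `Gate`, `awaitOpen` and `tryAwaitOpen` are hypothetical names, not part of Jenkins or the JDK. The `while` loop deals with spurious wakeups, while an interrupt surfaces as an `InterruptedException` that is either propagated or, where the caller cannot throw, answered by restoring the interrupted flag.

```java
/** Hypothetical one-shot gate, used only to illustrate the pattern. */
class Gate {
    private boolean open; // guarded by "this"

    synchronized void open() {
        open = true;
        notifyAll();
    }

    /** Blocks until the gate opens; lets InterruptedException propagate to the caller. */
    synchronized void awaitOpen() throws InterruptedException {
        while (!open) { // the loop absorbs spurious wakeups
            wait();     // an interrupt surfaces here as an exception
        }
    }

    /** For callers that cannot throw: returns false if interrupted, with the flag restored. */
    boolean tryAwaitOpen() {
        try {
            awaitOpen();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag instead of swallowing it
            return false;
        }
    }
}
```

A caller that gets `false` from `tryAwaitOpen()` can check `Thread.currentThread().isInterrupted()` and decide whether to abort or retry, so the interrupt is never silently lost.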