My company currently uses Jenkins 2.204.2 with a lot of slaves. One of them, which runs as a service on a Windows machine, disconnects often. As a workaround we restart the service and it reconnects, but having to do this repeatedly is really annoying. The log output when the agent disconnects is shown below:

Connection was broken: java.nio.channels.ClosedChannelException
    at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:209)
    at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:221)
    at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:816)
    at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
    at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:181)
    at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.switchToNoSecure(SSLEngineFilterLayer.java:283)
    at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:503)
    at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:248)
    at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:200)
    at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doCloseSend(SSLEngineFilterLayer.java:213)
    at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doCloseSend(ProtocolStack.java:784)
    at org.jenkinsci.remoting.protocol.ApplicationLayer.doCloseWrite(ApplicationLayer.java:172)
    at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer$ByteBufferCommandTransport.closeWrite(ChannelApplicationLayer.java:314)
    at hudson.remoting.Channel.close(Channel.java:1450)
    at hudson.remoting.Channel.close(Channel.java:1403)
    at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:843)
    at hudson.slaves.SlaveComputer.access$100(SlaveComputer.java:108)
    at hudson.slaves.SlaveComputer$2.run(SlaveComputer.java:734)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
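When a trace like the one above gets pasted as a single run of text, it can help to split it back into frames and find the first frame that is not part of the remoting layer. The following is only a rough sketch: the helper names `split_trace` and `first_frame_outside` are made up for illustration, and `sample` is a shortened excerpt of the trace above, not the full log.

```python
import re

# Matches one frame such as "at org.example.Foo$Bar.method(Foo.java:12)".
FRAME = re.compile(r"\bat\s+([\w.$<>]+)\(([^)]*)\)")


def split_trace(flat):
    """Return (header, frames) from a stack trace whose line breaks were lost."""
    first = FRAME.search(flat)
    if first is None:
        return flat.strip(), []
    header = flat[:first.start()].strip()
    frames = [f"{m.group(1)}({m.group(2)})" for m in FRAME.finditer(flat)]
    return header, frames


def first_frame_outside(frames, prefixes):
    """Return the first frame whose class is not under any given package prefix."""
    for frame in frames:
        if not frame.startswith(tuple(prefixes)):
            return frame
    return None


# Shortened excerpt of the trace from the question (assumption: four frames only).
sample = (
    "Connection was broken: java.nio.channels.ClosedChannelException "
    "at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:209) "
    "at hudson.remoting.Channel.close(Channel.java:1450) "
    "at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:843) "
    "at java.lang.Thread.run(Thread.java:748)"
)

header, frames = split_trace(sample)
culprit = first_frame_outside(frames, ["org.jenkinsci.remoting.", "hudson.remoting."])
print(header)
print(culprit)
```

Run against the full trace, the first frame outside the remoting packages is `hudson.slaves.SlaveComputer.closeChannel`. As far as I can tell that class belongs to Jenkins core on the master, so the master's own log around the same timestamp is probably a reasonable next place to look.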

I'm looking for a permanent solution, but I'm having trouble investigating the problem. I don't know where I'm supposed to look (the Jenkins logs, the Java version, or something else). Could you please help me with this problem? Any help would be highly appreciated. Thank you in advance.

The output of the java -version command on the master, which runs CentOS 7, is as follows:

java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)

And on the slave, which runs Windows Server 2019:

Picked up _JAVA_OPTIONS: -Xmx256M
java version "1.8.0_231"
Java(TM) SE Runtime Environment (build 1.8.0_231-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.231-b11, mixed mode)
happy-integer
  • I faced a similar issue in the past, and the problem was linked to a corporate policy of regular reboots and patching. Can you also post the output of what happens on the Windows Jenkins agent when this disconnection occurs? Are those Windows agents regularly rebooted or patched? Check the uptime on the Windows agent machine after such a disconnect. – MorganGeek Feb 25 '20 at 06:03
  • I also had similar issues to @MorganGeek when Windows systems were patched. I switched to launching via [DCOM](https://stackoverflow.com/a/56268806/598141) and have had no issues since (after adjusting the security settings). – Ian W Feb 25 '20 at 09:13
  • @MorganGeek There is no regular reboot; this happens while the server is online. In the Windows event log there is an event that says "SIGINT to 9548 failed - Killing as fallback" – happy-integer Mar 02 '20 at 05:54
  • @IanW I tried to switch to launching via DCOM, but I ended up with [the error](https://github.com/jenkinsci/windows-slaves-plugin/blob/master/docs/troubleshooting.adoc#remote-agent---windows-returned-error-code-0x8001ffff). I followed the steps described in your link and also tried the solution in the GitHub link; neither worked. – happy-integer Mar 02 '20 at 06:00
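For the uptime check suggested in the comments, a minimal sketch of comparing the agent's boot time against the time of a disconnect might look like this. The function names and the disconnect timestamp are hypothetical, and `windows_uptime_seconds` assumes it runs on the Windows agent itself, where it reads the Win32 `GetTickCount64` counter through `ctypes`.

```python
import ctypes
import sys
from datetime import datetime, timedelta


def boot_time(now, uptime_seconds):
    """Boot time implied by the current time and the uptime in seconds."""
    return now - timedelta(seconds=uptime_seconds)


def rebooted_after(event_time, now, uptime_seconds):
    """True if the machine booted after event_time, i.e. it restarted since then."""
    return boot_time(now, uptime_seconds) > event_time


def windows_uptime_seconds():
    """Uptime from the Win32 GetTickCount64 call; only works on Windows."""
    get_ticks = ctypes.windll.kernel32.GetTickCount64
    get_ticks.restype = ctypes.c_uint64  # the call returns milliseconds
    return get_ticks() / 1000.0


if __name__ == "__main__" and sys.platform == "win32":
    disconnect = datetime(2020, 3, 2, 5, 0)  # hypothetical disconnect time
    print(rebooted_after(disconnect, datetime.now(), windows_uptime_seconds()))
```

If this prints `False` for a given disconnect, the machine has been up continuously since before that disconnect, which would rule out a reboot as the cause.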
