Jenkins remoting may fail to maintain connection between master and slave, and in such case can report a cryptic stacktrace in build console / jenkins log. This page gives some guidance on collecting adequate information to help diagnose the problem.

java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException

Disable ping thread and get to the root cause

One of the common cause of the connection loss is the forced connection shutdown by the Ping Thread. When a ping thread detects that it's not getting a reply back in 4 minutes, it proceeds to terminate the connection to prevent an infinite hang. This will leave a message like the following in the slave launch log:

Ping failed. Terminating
Nov 01, 2014 1:22:35 PM hudson.slaves.ChannelPinger$1 onDead
INFO: Ping failed. Terminating the channel.

If you've identified that this is how your connection is lost, then you now need to dig deeper and understand what's causing the ping reply to be so late. Doing this diagnosis usually requires that you disable the ping thread, so that you can cause a slave to hang and exhibit the problem without getting killed. Then use tools like jmap and jstack to obtain the diagnostic information of the slave JVM. See here and here for how to use those tools.

If you see that one side is hanging on write to the network while the other side is hanging on read from the network, it indicates a network communication problem. For example, if you remove the network cable to the slave, TCP packets will fail to get there in timely fashion, and ping timeout happens before TCP times out.

Pay particular attention to what SynchronousCommandTransport$ReaderThread.run is doing, as this is the thread that's responsible for reading remoting requests that arrive from the network and put the tasks into other threads' queue. If this thread is blocking on anything other than reading from the network, then it should be filed as a bug.

SSH slaves

SSH Slaves plugin v1.9 contains improvements to help diagnose the connection loss problem. Users who are seeing a problem is encouraged to upgrade to this version.

To better understand the failure mode of an SSH slave connection, it is helpful to learn the layering of protocols in SSH.

When a connection to an SSH slave is lost, the first thing you need to figure out is which layerthe failure originated from. The following three are the major sources:

ERROR: Connection terminated
java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2298)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2767)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:798)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:298)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:40)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
ERROR: Socket connection to SSH server was lost
java.io.IOException: Cannot read full block, EOF reached.
	at com.trilead.ssh2.crypto.cipher.CipherInputStream.getBlock(CipherInputStream.java:81)
	at com.trilead.ssh2.crypto.cipher.CipherInputStream.read(CipherInputStream.java:108)
	at com.trilead.ssh2.transport.TransportConnection.receiveMessage(TransportConnection.java:232)
	at com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:685)
	at com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:481)
	at java.lang.Thread.run(Thread.java:722)
Slave JVM has not reported exit code before the socket was lost
[11/04/14 19:00:50] [SSH] Connection closed.
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fbae76a861b, pid=5998, tid=140440723117824
#
# JRE version: 6.0_33-b33
# Java VM: OpenJDK 64-Bit Server VM (23.25-b01 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea6 1.13.5
# Distribution: Ubuntu 12.04 LTS, package 6b33-1.13.5-1ubuntu0.12.04
# Problematic frame:
# C  [jna8661775961556997745.tmp+0xa61b]  Java_com_sun_jna_Native_setInt+0x9b
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /tmp/slave1/hs_err_pid5998.log
#
# If you would like to submit a bug report, please include
# instructions how to reproduce the bug and visit:
#   https://bugs.launchpad.net/ubuntu/+source/openjdk-6/
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
ERROR: Connection terminated
java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2298)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2767)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:798)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:298)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:40)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
Slave JVM has terminated. Exit signal=ABRT
[11/04/14 18:51:53] [SSH] Connection closed.

The following example shows "kill -9" to the slave process. Note the "Exit signal=KILL" in the message:

Slave successfully connected and online
ERROR: Connection terminated
java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2298)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2767)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:798)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:298)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:40)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
Slave JVM has terminated. Exit signal=KILL
[11/04/14 18:52:52] [SSH] Connection closed.

There was a bug in remoting that prevented stdout from the slave process from getting captured into the master. Kohsuke hasn't determined exactly when this regression was introduced, but he thinks it's around remoting 2.38 (which appeared in Jenkins 1.560.) You can tell whether your connection is affected by this bug by looking for the message "Evacuated stdout" in slave launch log. If you do not see this message, you are affected. This problem was resolved in remoting 2.48 (which will appear in Jenkins 1.590.) If you are running affected versions of Jenkins, you will not see the hotspot crash dump message.

Custom launcher

If you use the option "Launch agent via execution of command on the master", then be aware that the input and output to this command will be used for the communication between the central server and the remote slaves. Do not pollute the output stream with additional output and do not consume anything from the input stream.

Usually, the launcher command will be a shell script on the server that does some preparation and then uses ssh to execute slave.jar on the remote slave.

In the preparation, avoid additional ssh invocations to perform preparations on the remote slave, because ssh will consume bytes from the input stream, even if the remote command does not. See this stackoverflow answer for details and workarounds. If the problem you are seeing is java.io.StreamCorruptedException: invalid stream header: 0BDAACED (or another number), this is most likely your problem.