Hudson supports a "master/slave" mode, where the workload of building projects is delegated to multiple "slave" nodes, allowing a single Hudson installation to host a large number of projects. This document describes this mode and how to use it.
How does this work?
A "master" is an installation of Hudson. When you weren't using the master/slave support, a master was all you had. Even in the master/slave mode, the role of a master remains the same. It will serve all HTTP requests, and it can still build projects on its own.
Slaves are computers that are set up to build projects for a master. Hudson runs a separate program called a "slave agent" on each slave.
When slaves are registered with a master, the master starts distributing load to them. The exact delegation behavior depends on the configuration of each project. Some projects may choose to "stick" to a particular machine for a build, while others may choose to roam freely between slaves. For people accessing the Hudson website, things work mostly transparently. You can still browse javadoc, see test results, and download build results from the master, without ever noticing that builds were done by slaves.
Follow the Step by step guide to set up master and slave machines to quickly start using distributed builds.
Requirement for master/slave support
To use the master/slave support, a master needs to be able to run a "slave agent" program on the slave. There are several ways to do this:
Have master launch slave agent
One way of doing this is to configure a master to launch a slave agent on the target machine. On Unix, this can be done by SSH, RSH, or other similar means. On Windows, this could be done by the same protocols through cygwin or tools like psexec.
The slave agent program is a simple Java program that can be launched like java -jar slave.jar (note that running this manually in the command line will not work). A copy of slave.jar can be found inside WEB-INF. Therefore, a typical slave agent launch command would look something like:
ssh myslave java -jar ~/bin/slave.jar
You must run this command from the Hudson master
The Hudson master uses the standard input and output streams of this command to communicate with the slave. This is why you must provide the command to start the slave in the Hudson interface. Running it manually will not work.
Launching the slave remotely requires additional initial setup on slaves (especially on Windows, where a remote login mechanism is not available out of the box), but the benefit of this approach is that when the connection goes bad, you can use Hudson's web interface to re-establish the connection.
Technically speaking, in this setup you should update slave.jar every time you upgrade Hudson to a new version. However, in practice slave.jar changes infrequently enough that it's also practical not to update it until you see a fatal problem at start-up. Clever folks have also set things up so that their slave launch script automatically updates slave.jar, too.
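One way such a self-updating launch script could work is sketched below. This is only an illustration, not a script that ships with Hudson: the function name refresh_jar and both paths are hypothetical, and it assumes a fresh reference copy of slave.jar is already reachable from the slave (for example, in a shared directory).

```shell
#!/bin/sh
# Hypothetical helper: replace the local slave.jar when the reference copy differs.
# Usage: refresh_jar <reference-jar> <local-jar>
refresh_jar() {
    ref=$1
    local_jar=$2
    # No reference copy available: keep whatever is installed.
    [ -f "$ref" ] || return 0
    # cmp -s succeeds only when both files exist and are byte-identical.
    if ! cmp -s "$ref" "$local_jar" 2>/dev/null; then
        # Copy under a temporary name first so a half-written jar is never launched.
        cp "$ref" "$local_jar.tmp" && mv "$local_jar.tmp" "$local_jar"
    fi
}

# Example use inside a launch script (paths are placeholders):
#   refresh_jar /shared/hudson/slave.jar "$HOME/bin/slave.jar"
#   exec java -jar "$HOME/bin/slave.jar"
```

Comparing file contents rather than timestamps keeps the check independent of clock differences between machines.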
Launch slave agent via Java Web Start
Another way of doing this is to start a slave agent through Java Web Start (JNLP). In this approach, you interactively log on to the slave node, open a browser, and open the slave page. You'll then be presented with the JNLP launch icon. When you click it, Java Web Start kicks in and launches a slave agent on the computer where the browser is running.
This mode is convenient when the master cannot initiate a connection to slaves, such as when it runs outside a firewall while the slaves are inside the firewall. On the other hand, if the machine with a slave agent goes down, the master has no way of re-launching it on its own.
Java Web Start provides some means of automatically running the slave agent. For example, instead of manually clicking the icon, you can run the following command from CLI:
Launch slave agent headlessly
This third option uses a mechanism very similar to Java Web Start, except that it runs without a GUI, making it convenient to run as a daemon. To do this, take slave.jar as discussed above, and run it like:
This approach is especially convenient for running a slave agent as a service on Windows, for those who don't want to run sshd on Windows.
Also note that the slaves are a kind of cluster, and operating a cluster (especially a large or heterogeneous one) is always a non-trivial task. For example, you need to make sure that all slaves have the JDKs, Ant, CVS, and/or any other tools you need for builds. You need to make sure that slaves are up and running, and so on. Hudson is not clustering middleware, and therefore it doesn't make this any easier.
Example: Configuration on Unix
This section describes the current setup of Hudson slaves that I use inside Sun for my day job. My master Hudson node is running on a SPARC Solaris box, and I have many SPARC Solaris slaves, Opteron Linux slaves, and a few Windows slaves.
- Each computer has a user called hudson and a group called hudson. All computers use the same UID and GID. (If you have access to NIS, this can be done more easily.) This is not a Hudson requirement, but it makes slave management easier.
- On each computer, the /var/hudson directory is set as the home directory of user hudson. Again, this is not a hard requirement, but having the same directory layout makes things easier to maintain.
- All machines run sshd. Windows slaves run Cygwin sshd.
- All machines have an NTP client installed, and synchronize their clocks regularly with the same NTP server.
- /var/hudson has all the build tools beneath it: a few versions of Ant, Maven, and JDKs. JDKs are native programs, so I have JDK copies for all the architectures I need. The directory structure looks like this:
- /var/hudson/.ssh has the private/public key and authorized_keys so that a master can execute programs on slaves through ssh, by using public key authentication.
- On the master, I have a little shell script that uses rsync to synchronize the master's /var/hudson to slaves (except /var/hudson/workspace). I use this to replicate tools on all slaves.
- /var/hudson/bin/launch-slave is a shell script that Hudson uses to execute jobs remotely. This shell script sets up PATH and a few other things before launching
- Finally, all computers have other standard build tools like cvs installed and available in PATH.
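The replication step in the list above could be sketched roughly as follows. The host names, the hudson@ login, and the function names sync_command and sync_all are assumptions for illustration; the actual script is not shown in this document. The functions only print the rsync commands, so they can be reviewed before anything is copied.

```shell
#!/bin/sh
# Hypothetical slave host names; replace with your own.
SLAVES="slave1 slave2"

# Print the rsync command that mirrors the master's /var/hudson to one slave.
# -a preserves permissions and timestamps, -z compresses the transfer, and the
# anchored exclude pattern skips /var/hudson/workspace.
sync_command() {
    printf 'rsync -az --exclude=/workspace/ /var/hudson/ hudson@%s:/var/hudson/\n' "$1"
}

# Print one command per slave; pipe the output to sh to actually run it.
sync_all() {
    for host in $SLAVES; do
        sync_command "$host"
    done
}
```

Running sync_all on its own gives a dry run, and sync_all | sh performs the copy over ssh using the public key authentication described above.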
Some slaves are fast, while others are slow. Some slaves are close (network-wise) to the master, while others are far away. So distributing builds well is a challenge. Currently, Hudson employs the following strategy:
- If a project is configured to stick to one computer, that's always honored.
- Hudson tries to build a project on the same computer on which it was previously built.
- Hudson tries to move long builds to slaves, because the amount of network interaction between a master and a slave tends to be logarithmic in the duration of a build (in other words, even if project A takes twice as long to build as project B, it won't require double the network transfer). So this strategy reduces the network overhead.
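To make the three rules concrete, here is a deliberately simplified sketch. It is not Hudson's scheduler: it treats the rules as a strict priority order, and the 10-minute threshold, the node name master, and the function names are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical threshold (minutes) above which a build counts as "long".
LONG_BUILD_MINUTES=10

# contains <word> <list...>: succeed when <word> is one of the list items.
contains() {
    needle=$1; shift
    for item in "$@"; do
        [ "$item" = "$needle" ] && return 0
    done
    return 1
}

# pick_node <sticky> <previous> <minutes> <online nodes...>
# Prints the chosen node name, or nothing when the build has to wait.
pick_node() {
    sticky=$1; previous=$2; minutes=$3; shift 3
    # 1. A project tied to one computer only ever runs there.
    if [ -n "$sticky" ]; then
        contains "$sticky" "$@" && echo "$sticky"
        return 0
    fi
    # 2. Prefer the computer that built the project last time.
    if [ -n "$previous" ] && contains "$previous" "$@"; then
        echo "$previous"
        return 0
    fi
    # 3. Long builds go to the first online slave; short ones stay on the master.
    if [ "$minutes" -ge "$LONG_BUILD_MINUTES" ]; then
        for node in "$@"; do
            [ "$node" != master ] && { echo "$node"; return 0; }
        done
    fi
    contains master "$@" && echo master
    return 0
}
```

For example, pick_node "" slave1 3 master slave1 slave2 prints slave1, because the previous build ran there and the node is online.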
If you have interesting ideas (or better yet, implementations), please let me know.
Transition from master-only to master/slave
Typically, you start with a master-only installation and then much later you add slaves as your projects grow. When you enable the master/slave mode, Hudson automatically configures all your existing projects to stick to the master node. This is a precaution to avoid disturbing existing projects, since most likely you won't be able to configure slaves correctly without trial and error. After you configure slaves successfully, you need to individually configure projects to let them roam freely. This is tedious, but it allows you to work on one project at a time.
Projects that are newly created on a master/slave-enabled Hudson are configured to roam freely by default.
Master on public network, slaves within firewall
One might consider setting up the Hudson master on the public network (so that people can see it), while leaving the build slaves within the firewall (because having a lot of machines on the Internet is expensive). This can generally be made to work in two ways:
- Allow port-forwarding from the master to your slaves within the firewall. The port-forwarding should be restricted so that only the master with known IP can connect to slaves.
- Use JNLP slaves and have slaves connect to the master.
Note that in both cases, once the master is compromised, all your slaves can be easily compromised (in other words, a malicious master can execute arbitrary programs on slaves), so both setups leave much to be desired in terms of isolating a security breach. The Build Publisher plugin (which looks almost ready as of this writing) provides another way of doing this, in a more secure fashion.
- Every time Hudson launches a program locally or remotely, it prints out the command line to the log file. So when a remote execution fails, log in to the computer that runs the master by using the same user account, and try to run the command from your shell. You tend to solve problems quickly this way.
- Each slave has a log page showing the communication between the master and the slave agent. This log often shows error reports.
- If you use a binary-unsafe remoting mechanism like telnet to launch a slave, add the
slave.jar so that Hudson avoids sending binary data over the network.
- When the same command runs outside Hudson just fine, make sure you are testing it with the same user account as Hudson runs under. In particular, if you run Hudson master on Windows, consult How to get command prompt as the SYSTEM user.
- Feel free to send your trouble to