Jenkins supports a distributed build mode, in which the work of building projects is delegated to multiple agent nodes. This allows a single Jenkins installation to host a large number of projects, or to provide the different environments needed for builds and tests.
It is pretty common when starting with Jenkins to have a single server which runs the master and all builds; however, the Jenkins architecture is fundamentally "Master+Agent". The master is designed to do co-ordination and provide the GUI and API endpoints, and the agents are designed to perform the work, because workloads are often best "farmed out" to distributed servers. This may be for scale, to provide different tools, or to build on different target platforms. Another common reason for remote agents is to enact deployments into secured environments (without the master having direct access).
Many people today use Jenkins in cloud environments, and there are plugins and extensions to support the various environments and clouds. These may involve virtual machines, Docker containers, Kubernetes (for example see Jenkins X), EC2, Azure, Google Cloud, VMware and more. In these cases the agents are typically managed for you (and in many cases created on demand, as needed), so you may not need to read the content of this document for those cases.
This document describes this distributed mode of Jenkins and some of the ways in which you can configure it, should you need to take control (or maybe you are curious).
How does this work?
A "master" operating by itself is the basic installation of Jenkins, and in this configuration the master handles all tasks for your build system. In most cases installing an agent doesn't change the behavior of the master: it still serves all HTTP requests, and it can still build projects on its own.
Once you install a few agents you might find yourself removing the executors on the master in order to free up its resources (allowing it to concentrate on managing your build environment), but this is not a necessary step. If you start to use Jenkins a lot with just a master, you will most likely find that you run out of resources (memory, CPU, etc.). At that point you can either upgrade your master or set up agents to pick up the load. As mentioned above, you might also need several different environments to test your builds. In that case, using an agent to represent each of your required environments is almost a must.
An agent is a computer that is set up to offload build projects from the master, and once it is set up this distribution of tasks is fairly automatic. The exact delegation behavior depends on the configuration of each project; some projects may choose to "stick" to a particular machine for a build, while others may choose to roam freely between agents. For people accessing your Jenkins system via the integrated website (http://yourjenkinsmaster:8080), things work mostly transparently. You can still browse javadoc, see test results, and download build results from the master, without ever noticing that builds were done by agents. In other words, the master becomes a sort of "portal" to the entire build farm.
Since each agent machine runs a separate program, also called an "agent", there is no need to install the full Jenkins (package or compiled binaries) on an agent. There are various ways to start agents, but in the end the agent and the Jenkins master need to establish a bi-directional communication link (for example a TCP/IP socket) in order to operate.
Master to agent connections
The most popular way to configure agents is via connections that are initiated from the master. This allows agents to be minimally configured, and control lives with the master. It does require that the master have network access (ingress) to the agent, typically via ssh. In some cases this is not desirable due to network security rules, in which case you can use agent to master connections via "JNLP".
Agent to master connections
In some cases the agent server will not be visible to the master, so the master cannot initiate the agent process. You can use a different type of agent configuration in this case, called "JNLP". This means that the master does not need network "ingress" to the agent (but the agent will need to be able to connect back to the master). This is handy if the agents are behind a firewall, or perhaps in some more secure environment used for trusted deploys. See the sections below to choose the type of agent that is most appropriate for your needs.
Choosing which agent pipelines and steps run on
As you will see below, agents can be labelled. This means different parts of your build, or pipeline, can be allocated to run on specific agents (based on their labels). This can be useful for tools, operating systems or perhaps for security purposes (it is possible to set quite detailed access rules of what can run where, based on agent configurations). A server that runs an agent is often referred to as a "Node" in Jenkins terminology.
Different ways of starting
Pick the right method depending on your environment and the OS that the master and agents run on, and on whether you want the connection initiated from the master or from the agent end.
Launch agent via ssh
Jenkins has a built-in SSH client implementation that it can use to talk to a remote sshd and start an agent. This is the most convenient and preferred method for Unix agents, which normally have sshd out-of-the-box. Click Manage Jenkins, then Manage Nodes, then click "New Node." In this setup, you'll supply the connection information (the agent host name, user name, and ssh credential). Note that the agent will need the master's public ssh key copied to ~/.ssh/authorized_keys. Jenkins will do the rest of the work by itself, including copying the binary needed for an agent, and starting/stopping agents. If your project has external dependencies (like a special ~/.m2/settings.xml, or a special version of java), you'll need to set that up yourself, though. The Slave Setup Plugin may be of help.
This is the most convenient setup on Unix. However, if you are on Windows and you don't have the ssh commands (through cygwin, for example), you can use a tool like PuTTY and PuTTYgen to generate your private and public key pair.
For connecting to Windows agents through cygwin sshd, see SSH agents and Cygwin for more details.
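The key-installation step mentioned above (the master's public key ending up in ~/.ssh/authorized_keys on the agent) could be sketched as a small helper run on the agent. The function name authorize_key and its arguments are illustrative assumptions, not part of Jenkins; OpenSSH's ssh-copy-id does a similar job from the remote side.

```shell
#!/bin/bash
# Sketch: append the master's public key to an authorized_keys file on the agent.
# authorize_key and its arguments are hypothetical names, not part of Jenkins.
authorize_key() {
  local pubkey_file="$1" auth_file="$2"
  local ssh_dir
  ssh_dir="$(dirname "$auth_file")"
  # Keep the directory and file private to the owning user.
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
  touch "$auth_file" && chmod 600 "$auth_file"
  # Append only if this exact key line is not already present (idempotent).
  grep -qxF -- "$(cat "$pubkey_file")" "$auth_file" || cat "$pubkey_file" >> "$auth_file"
}

# Run only when a public key file is given, e.g.: ./authorize_key.sh master.pub
if [ "$#" -gt 0 ]; then
  authorize_key "$1" "${2:-$HOME/.ssh/authorized_keys}"
fi
```

Running it twice with the same key leaves a single entry, so it is safe to call from a provisioning script.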
Have master launch agent on Windows
For Windows agents, Jenkins can use the remote management facility built into Windows 2000 or later (WMI+DCOM, to be more specific). In this setup, you'll supply the username and the password of a user who has administrative access to the system, and Jenkins will use them to remotely create a Windows service and remotely start/stop it.
This is the most convenient setup on Windows, but it does not allow you to run programs that require display interaction (such as GUI tests).
Note: Unlike other node configuration types, the node's name is very important here, as it is used as the address of the machine on which the service is created!
Write your own script to launch
If the above turn-key solutions do not provide the flexibility you need, you can write your own script to start an agent. You place this script on the master, and tell Jenkins to run this script whenever it needs to connect to an agent.
Typically, your script uses a remote program execution mechanism like SSH, RSH, or other similar means (on Windows, this could be done by the same protocols through cygwin or tools like psexec), but Jenkins doesn't really assume any specific method of connectivity.
What Jenkins expects from your script is that, in the end, it has to execute the agent program like java -jar agent.jar, on the right computer, and have its stdin/stdout connect to your script's stdin/stdout. For example, a script that does "java -jar ~/bin/agent.jar" would satisfy this.
(The point is that you let Jenkins run this command, as Jenkins uses this stdin/stdout as the communication channel to the agent. Because of this, running this manually from your shell will do you no good.)
A copy of agent.jar can be downloaded from http://yourserver:port/jnlpJars/agent.jar. Many people write scripts in such a way that this 160K jar is downloaded during the running of said script, to ensure that a consistent version of agent.jar is always used. Such an approach eliminates the agent.jar updating issue discussed below. Note that the SSH Slaves plugin does this automatically, so agents configured using this plugin always use the correct agent.jar.
Technically speaking, in this setup you should update agent.jar whenever you upgrade Jenkins to a new version.
Launching agents this way often requires additional initial setup on agents (especially on Windows, where a remote login mechanism is not available out of the box), but the benefit of this approach is that when the connection goes bad, you can use Jenkins's web interface to re-establish the connection.
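A launcher script along the lines described in this section might look like the following minimal sketch. The MASTER_URL variable, the jenkins user, the remote_command helper and the convention of passing the agent host as the first argument are all assumptions made for illustration; only the java -jar agent.jar invocation and the jnlpJars download location come from the text above.

```shell
#!/bin/bash
# Sketch of a launcher script kept on the master. MASTER_URL, the "jenkins"
# user and remote_command are hypothetical names chosen for this example.

# Print the command to run on the agent: fetch a fresh agent.jar, then start it.
# curl writes to a file (-o) and stays silent (-s), so nothing but the agent's
# own traffic reaches stdout, which Jenkins uses as the communication channel.
remote_command() {
  printf 'curl -fsS -o agent.jar %s/jnlpJars/agent.jar && exec java -jar agent.jar' \
    "${MASTER_URL:-http://yourserver:8080}"
}

# Assumed usage in the node's launch command: /path/to/launch-agent.sh agent-host
# ssh connects the remote program's stdin/stdout to this script's own.
if [ "$#" -gt 0 ]; then
  exec ssh -o BatchMode=yes "jenkins@$1" "$(remote_command)"
fi
```

Downloading the jar on every launch costs a small transfer but keeps the agent program in step with the master.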
Launch agent via "JNLP" from agent back to master in a browser
Another way of doing this is to start an agent through Java Web Start (JNLP).
This option only appears once the server has been configured for it. So, before attempting to create the build agent, go to Manage Jenkins -> Global Security -> TCP port for JNLP agents.
In this approach, you'll interactively log on to the agent node, open a browser, and open the agent page. You'll then be presented with the JNLP launch icon. Upon clicking it, Java Web Start will kick in and launch an agent on the computer where the browser is running.
This mode is convenient when the master cannot initiate a connection to agents, such as when it runs outside a firewall while the agents are inside the firewall. On the other hand, if the machine with an agent goes down, the master has no way of re-launching it on its own.
On Windows, you can do this manually once, then from the launched JNLP agent, you can install it as a Windows service so that you don't need to interactively start the agent from then on.
If you need display interaction (e.g. for GUI tests) on Windows and you have a dedicated (virtual) test machine, this is a suitable option. Create a jenkins user account, enable auto-login, and put a shortcut to the JNLP file in the Startup items (after having trusted the agent's certificate). This allows one to run tests as a restricted user as well.
Note: If the master is running behind a reverse proxy or similar, you might need to configure "Tunnel connection through" in the "Advanced" section of the JNLP start method on the agent configuration page to make JNLP work.
Launch agent headlessly from agent back to master on command line
This launch mode uses a mechanism very similar to JNLP as described above, except that it runs without a GUI, making it convenient for execution as a daemon on Unix. To do this, configure this agent to be a JNLP agent, take agent.jar as discussed above, and then from the agent, run a command like this:
$ java -jar agent.jar -jnlpUrl http://yourserver:port/computer/agent-name/slave-agent.jnlp
Make sure to replace "agent-name" with the name of your agent.
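For daemon-style use, the command above could be wrapped in a small script that rebuilds the URL from the master address and agent name and restarts the agent if it exits. This is only a sketch: jnlp_url, the script name and the 10 second pause are assumptions, and a master that requires authentication needs additional arguments that are left out here.

```shell
#!/bin/bash
# Sketch: start a headless agent and restart it whenever it exits.
# jnlp_url is a hypothetical helper; the URL layout is the one shown above.
jnlp_url() {
  local base="${1%/}" name="$2"      # drop one trailing slash from the base URL
  printf '%s/computer/%s/slave-agent.jnlp\n' "$base" "$name"
}

# Assumed usage: ./run-agent.sh http://yourserver:port agent-name
if [ "$#" -ge 2 ]; then
  while true; do
    java -jar agent.jar -jnlpUrl "$(jnlp_url "$1" "$2")"
    sleep 10                         # brief pause before reconnecting
  done
fi
```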
Also note that the agents are a kind of cluster, and operating a cluster (especially a large or heterogeneous one) is always a non-trivial task. For example, you need to make sure that all agents have JDKs, Ant, CVS, and/or any other tools you need for builds. You need to make sure that agents are up and running, etc. Jenkins is not clustering middleware, and therefore it doesn't make this any easier. Nevertheless, one can use a server provisioning tool and configuration management software to facilitate both aspects.
Node labels for agents
Labels are tags one can give an agent, which allow it to be distinguished from other nodes in Jenkins.
A few reasons why node labels are important:
- Nodes might have certain tools associated with them. Labels could include the different tools a given node supports.
- Nodes may be in a multi-operating system build environment (e.g. Windows, Mac, and Linux agents within one Jenkins build system). There can be a label for the operating system of the node.
- Nodes may be in geographically different locations which can be the case for multi-datacenter deployments. Jenkins can have agents in different datacenters when inter-datacenter communication is strictly regulated with edge firewalls. In this case, you might have a label for the datacenter or cloudstack in which the agent resides.
Labels are defined in the settings of static agents and for agent clouds. They must be space-separated words which describe that agent. Sticking to standard ASCII characters is recommended. Here are a few label suggestions one can use for agents:
- For toolchains:
- For operating systems:
osx; or you can be more detailed like
- For geographic locations:
- For platforms:
Jobs and pipelines can be pinned to specific agents or groups of agents if multiple agents have similar sets of labels. In jobs, visit advanced settings and choose restrict where the job can run. In pipelines, you would restrict it with the node block. You can restrict jobs by specifying a single label or by using a label expression. Here are two examples:
- Single label:
- Label expression: openstack && us-east && linux
The above label expression means that a given agent must have all of those labels.
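To make the "must have all of those labels" rule concrete, here is a minimal sketch of the check. labels_match is a hypothetical helper written for this page, not a Jenkins API, and it understands only the && operator used in the example above.

```shell
#!/bin/bash
# Sketch: does an agent's label set satisfy an "&&" label expression?
# labels_match is a hypothetical helper, not a Jenkins API, and it handles
# only "&&"; anything else is treated as part of a label name.
labels_match() {
  local agent_labels=" $1 " expr="$2" term
  # Turn "a && b && c" into separate words, then require every word to be present.
  for term in ${expr//&&/ }; do
    case "$agent_labels" in
      *" $term "*) ;;          # label present, keep checking
      *) return 1 ;;           # a required label is missing
    esac
  done
  return 0
}
```

For example, labels_match "openstack us-east linux docker" "openstack && us-east && linux" succeeds, while an agent labelled "openstack us-west linux" is rejected because us-east is missing.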
Example: Configuration on Unix
This section describes Kohsuke Kawaguchi's setup of Jenkins agents that he used inside Sun for his day job. His master Jenkins node ran on a SPARC Solaris box, and he had many SPARC Solaris agents, Opteron Linux agents, and a few Windows agents.
- Each computer has a user called jenkins and a group called jenkins. All computers use the same UID and GID. (If you have access to NIS, this can be done more easily.) This is not a Jenkins requirement, but it makes agent management easier.
- On each computer, the /var/jenkins directory is set as the home directory of user jenkins. Again, this is not a hard requirement, but having the same directory layout makes things easier to maintain.
- All machines run sshd. Windows agents run cygwin sshd.
- All machines have the ntp client /usr/sbin/ntpdate installed, and synchronize their clocks regularly with the same NTP server.
- /var/jenkins has all the build tools beneath it: a few versions of Ant, Maven, and JDKs. JDKs are native programs, so I have JDK copies for all the architectures I need. The directory structure looks like this:
/var/jenkins
  +- .ssh
  +- bin
  |   +- agent  (more about this below)
  +- workspace  (jenkins creates this directory and stores all data files inside)
  +- tools
      +- ant-1.5
      +- ant-1.6
      +- maven-1.0.2
      +- maven-2.0
      +- java-1.4 -> native/java-1.4 (symlink)
      +- java-1.5 -> native/java-1.5 (symlink)
      +- java-1.8 -> native/java-1.8 (symlink)
      +- native -> solaris-sparcv9 (symlink; different on each computer)
      +- solaris-sparcv9
      |   +- java-1.4
      |   +- java-1.5
      |   +- java-1.8
      +- linux-amd64
          +- java-1.4
          +- java-1.5
          +- java-1.8
- /var/jenkins/.ssh has the private/public key pair and authorized_keys so that the master can execute programs on agents through ssh, using public key authentication.
- On the master, I have a little shell script that uses rsync to synchronize the master's /var/jenkins to agents (except /var/jenkins/workspace). I also use the script to replicate tools on all agents.
- /var/jenkins/bin/agent is a shell script that Jenkins uses to execute jobs remotely. This shell script sets up PATH and a few other things before launching agent.jar. Below is a very simple example script.
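#!/bin/bash
# Point the agent at the JDK to use and put its tools on PATH.
JAVA_HOME=/opt/SUN/jdk1.8.0_152
PATH=$PATH:$JAVA_HOME/bin
export PATH
# Start the agent; Jenkins talks to it over this process's stdin/stdout.
java -jar /var/jenkins/bin/agent.jar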
- Finally, all computers have other standard build tools like cvs installed and available in PATH.
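The rsync-based synchronization mentioned in the list above is not shown in this document; a minimal sketch of what such a script could look like follows. The sync_agent function, the DRY_RUN switch, the jenkins login and the host names are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch of a synchronisation script for the master. sync_agent, DRY_RUN and the
# host names are hypothetical; the author's actual script is not reproduced here.
sync_agent() {
  local host="$1"
  # -a preserves permissions and symlinks (the tools tree relies on symlinks),
  # -z compresses. The leading "/" anchors the exclude at /var/jenkins/workspace.
  # With DRY_RUN set, the command is printed instead of executed.
  ${DRY_RUN:+echo} rsync -az --exclude=/workspace/ /var/jenkins/ "jenkins@$host:/var/jenkins/"
}

# Assumed usage: ./sync-agents.sh agent1 agent2 ...
for host in "$@"; do
  sync_agent "$host"
done
```

Setting DRY_RUN=1 first is a cheap way to check which hosts and paths would be touched before copying anything.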
Note that in the more recent Jenkins packages, the default JENKINS_HOME (aka home directory for the 'jenkins' user on Linux machines, e.g. Red Hat, CentOS, Ubuntu) is set to /var/lib/jenkins.
Some agents are faster, while others are slow. Some agents are closer (network wise) to the master, others are far away. So doing a good build distribution is a challenge. Currently, Jenkins employs the following strategy:
- If a project is configured to stick to one computer, that's always honored.
- Jenkins tries to build a project on the same computer that it was previously built on.
- Jenkins tries to move long builds to agents, because the amount of network interaction between a master and an agent tends to be logarithmic to the duration of a build (in other words, even if project A takes twice as long to build as project B, it won't require double the network transfer). So this strategy reduces the network overhead.
If you have interesting ideas (or better yet, implementations), please let me know.
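The first two rules of the strategy above can be illustrated with a small sketch. pick_node is a hypothetical helper and is not how Jenkins is implemented internally; it ignores the third rule (build duration) and simply falls back to the first online node.

```shell
#!/bin/bash
# Sketch of the first two rules; pick_node is a hypothetical helper.
# Arguments: pinned node ("" if the project roams), node of the previous build
# ("" if none), then the names of the nodes that are currently online.
pick_node() {
  local pinned="$1" previous="$2"; shift 2
  local node
  if [ -n "$pinned" ]; then
    echo "$pinned"               # rule 1: a sticky project is always honored
    return 0
  fi
  for node in "$@"; do
    if [ "$node" = "$previous" ]; then
      echo "$node"               # rule 2: prefer the node used last time
      return 0
    fi
  done
  # Otherwise take the first online node; fails if none is online.
  [ "$#" -gt 0 ] && echo "$1"
}
```

For instance, pick_node "" linux-2 linux-1 linux-2 prints linux-2, because the project roams and linux-2 built it last time.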
Jenkins has a notion of a “node monitor” which can check the status of an agent for various conditions, displaying the results and optionally marking the agent offline accordingly. Jenkins bundles several, checking disk space in the workspace; disk space in the temporary partition; swap space; clock skew (compared to the master); and response time.
Plugins can add other monitors.
Offline status and retention strategy
Administrators can manually mark agents offline (with an optional published reason) or reconnect them.
Groovy scripts such as Monitor and Restart Offline Slaves can perform batch operations like this. There is also a CLI command to reconnect.
Then there is a background task which automatically reconnects agents that are thought to be back up. The behavior is configurable per agent (or per cloud, if using cloudy provisioning for agents) via a “retention strategy”, of which Jenkins bundles several (plugins can contribute others): always keep online if possible; drop offline when not in use; use a schedule; behave according to cloud’s notion of load.
Transition from master-only to master/agent
Typically, you start with a master-only installation and then much later you add agents as your projects grow. When you enable the master/agent mode, Jenkins automatically configures all your existing projects to stick to the master node. This is a precaution to avoid disturbing existing projects, since most likely you won't be able to configure agents correctly without trial and error. After you configure agents successfully, you need to individually configure projects to let them roam freely. This is tedious, but it allows you to work on one project at a time.
Projects that are newly created on a master/agent-enabled Jenkins will be configured to roam freely by default.
Access an Internal CI Build Farm (Master + Agents) from the Public Internet
One might consider making the Jenkins master accessible on the public network (so that people can see it), while leaving the build agents within the firewall (typical reasons: cost and security). There are several ways to make it work:
- Equip the master node with a network interface that's exposed to the public Internet (simple to do, but not recommended in general)
- Allow port-forwarding from the master to your agents within the firewall. The port-forwarding should be restricted so that only the master with its known IP can connect to agents. With this setup in the firewall, as far as Jenkins is concerned it's as if the firewall doesn't exist. If multiple hops are involved, you may wish to investigate how to do ssh "jump host" transparently using the ProxyCommand construct. In fact, with a properly configured "jump host" setup, even the master doesn't need to expose itself to the public Internet at all, as long as the organization's firewall allows port 22 traffic.
- Use JNLP agents and have agents connect to the master, not the other way around. In this case it's the agent that initiates the connection, so it works correctly with a NAT firewall.
Note that in all of these cases, once the master is compromised, all your agents can be easily compromised (in other words, a malicious master can execute arbitrary programs on agents), so each setup leaves much to be desired in terms of isolating a security breach. The Build Publisher Plugin provides another way of doing this, in a more secure fashion.
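The "jump host" approach with ProxyCommand mentioned above could be sketched as follows. The host names are placeholders and jump_ssh_command is a hypothetical helper; ProxyCommand and ssh -W are standard OpenSSH features, where %h and %p expand to the target host and port.

```shell
#!/bin/bash
# Sketch: build an ssh command that reaches an agent through a jump host.
# jump_ssh_command and the host names are hypothetical.
jump_ssh_command() {
  local jump_host="$1" agent_host="$2"
  # "%%" prints a literal "%", so ssh itself receives %h:%p.
  printf 'ssh -o ProxyCommand="ssh -W %%h:%%p %s" %s java -jar agent.jar\n' \
    "$jump_host" "$agent_host"
}
```

Calling jump_ssh_command gateway.example.com agent1 prints a command line that could be used as the body of a custom launch script on the master.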
Running Multiple Agents on the Same Machine
Using a well established virtualization infrastructure such as Kernel-based Virtual Machine (KVM), it is quite easy to run multiple agent instances on a single physical node. Such instances can be running various Linux, *BSD UNIX, Solaris, Windows. For Windows, one can have them installed as separate Windows services so they can start up on system startup. While the correct use of executors largely obviates the need for multiple agent instances on the same machine, there are some unique use cases to consider:
- You want more configurability between the configured nodes. Say you have one node set to be used as much as possible, and the other node to be used only when needed.
- You may have multiple Jenkins master installations building different things, and so this configuration would allow you to have agents for more than one master on the same box. That's right, with Jenkins you really can serve two masters.
- You may wish to leverage the easiness of starting/stopping/replacing virtual machines, perhaps in conjunction with Jenkins plugins such as the Libvirt Slaves Plugin.
- You wish to maximize your hardware investment and utilization, at the same time minimizing operating cost (e.g. utility expenses for running idling agents).
Follow these steps to get multiple agents working on the same Windows box:
- Add the first agent node in Jenkins and give it its own working dir (e.g. jenkins-agent-a).
- Go to the agent page from the agent box and launch by JNLP, then use the menu to install it as a service instead.
- Once the service is running, you'll get jenkins-slave.exe and jenkins-slave.xml in your agent's work dir.
- Bring up windows services and stop the Jenkins Slave service.
- Open a shell prompt, cd into the agent work dir.
- First run "jenkins-slave.exe uninstall" to uninstall the one that the jnlp-launched app installed. This should remove it from the service list.
- Now edit jenkins-slave.xml. Modify the id and name values so that your multiple agents are distinct. I called mine jenkins-agent-a and Jenkins Agent A.
- Run jenkins-slave.exe install and then check the Windows service list to ensure it is there. Start it up, and watch Jenkins to see if the agent instance becomes active.
- Now repeat this process for a second agent, beginning with configuring the new node in the master config.
When you create the second node, it is convenient to copy an existing node, so copy the first node you set up. Then you just tweak the Remote FS Root and a couple of other settings to make it distinct. When you are done you should have two (or more) Jenkins slave services in the list of Windows services.
Some interesting pages on issues (and resolutions) occurring when using Windows agents:
- Windows agents fail to start via DCOM
- Windows slaves fail to start via ssh
- Windows slaves fail to start via JNLP
Some more general troubleshooting tips:
- Every time Jenkins launches a program locally/remotely, it prints out the command line to the log file. So when a remote execution fails, login to the computer that runs the master by using the same user account, and try to run the command from your shell. You tend to solve problems quickly in this way.
- Each agent has a log page showing the communication between the master and the agent. This log often shows error reports.
- If you use a binary-unsafe remoting mechanism like telnet to launch an agent, add the appropriate option to agent.jar so that Jenkins avoids sending binary data over the network.
- When the same command runs outside Jenkins just fine, make sure you are testing it with the same user account as Jenkins runs under. In particular, if you run the Jenkins master on Windows, consult How to get command prompt as the SYSTEM user.
- Feel free to send your trouble to
Windows agent service upgrades
If a newer version of the Jenkins Windows service wrapper (jenkins-slave.exe) is available, it will be replaced and used on the next start of the service. On very rare occasions the service wrapper may change its behaviour in a way that requires a change to the configuration of the service. This cannot be done automatically, as the service configuration may not be the default and an automatic change could break an installation.
A quick fix of this is to uninstall the jenkins service then verify the service xml is up-to-date (and contains any site configuration such as the user credentials) and then re-install the service.
Other manual tasks that may fix the issue:
- Jenkins > 1.565.1 - a message similar to Restart failure. 'C:\jenkins\jenkins-slave.exe restart' completed with 0 but I'm still alive appears in the agent error logs. In the Windows service manager, edit the service configuration to restart the service on failure and add -noReconnect to the agent arguments in the service xml configuration.
- Jenkins Build Farm Experience Volume I, Volume 2, Volume 3 and Volume 4