
A build is normally a fairly sequential process, and in a big project a full execution can easily take hours. While one could create such a job in Jenkins, the long turn-around time tends to reduce the value of continuous integration. This page discusses a technique to cope with this problem.

The idea is to split a big build into multiple stages. Each stage is executed sequentially for a particular build run, but the whole arrangement works like a CPU pipeline: it increases the throughput of CI, and it also reduces the turn-around time by reducing the time a build sits in the build queue.


In this situation, your earlier stage needs to pass files to later stages. A general way to do this is as follows:

  1. At the end of its build, the earlier stage archives all the files the next stage needs into a single zip/tgz file.
  2. Tell Jenkins to archive this zip/tgz file as a post-build action, take a fingerprint of it, then trigger the next stage.
  3. The first thing the next stage does in its build is to obtain this bundle through the permalink for the last successful artifact, then unzip it. Keep the archive file around, because we'll take a fingerprint of it here, too.
  4. The build proceeds by using the files obtained from the earlier stage.
  5. Tell Jenkins to fingerprint the zip/tgz file in this stage as well. The matching fingerprints let you correlate executions of the stages and track the flow.

If you have more than 2 stages, you can repeat this process. In some cases, this "zip/tgz" file would have to contain the entire workspace. If so, the next stage can use the URL SCM plugin to simplify the retrieval.
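
For example, here is a minimal sketch of the flow above, assuming a first job named stage1, a bundle called bundle.tgz, a build output directory named output, a Jenkins server at http://jenkins:8080, and wget available on the build machine (all of these names are placeholders for your own setup):

    # End of the stage1 build: bundle everything the next stage needs.
    tar -czf bundle.tgz output/
    # In stage1's post-build configuration, archive bundle.tgz as an
    # artifact, fingerprint it, and trigger the stage2 job.

    # Start of the stage2 build: fetch the bundle through the permalink
    # for the last successful build's artifact, then unpack it. Keep
    # bundle.tgz in the workspace so it can be fingerprinted here, too.
    wget -nv http://jenkins:8080/job/stage1/lastSuccessfulBuild/artifact/bundle.tgz
    tar -xzf bundle.tgz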

JENKINS-682 tracks an RFE to support this use case more explicitly. Please feel free to add yourself to the CC list, vote, and comment on the issue.

Note: there is a new plugin which helps with this problem: http://wiki.jenkins-ci.org/display/JENKINS/Clone+Workspace+SCM+Plugin


15 Comments

  1. It would be nice to have an example of this. I would like to have projects set up to minimize the amount of time spent pulling things from CVS that don't change often. I am new to Hudson and am learning every day!

     We want to do something like this:

    Project 1: ReallyBigCVSModules: Using a custom workspace, pulls Project/ThirdParty and Project/OtherHugefiles, then simply echoes completion.

    Project 2: BuildsSomething1: Using the same workspace as #1 above, pulls Project/src and compiles. If successful, tags both projects 1 and 2.

    Or, is the idea to make all the pulls from CVS in #1 into a tarball and have #2 untar it? If so, how do we tag both?

  2. J J

    I tried this method but did not like it at all, for a few reasons.
    My biggest issue with this idea is that it totally ties the build process to Hudson: you can no longer run build scripts independently unless the Hudson server is available and running with exactly the right configuration, so this suggestion seems to contradict the recommended practice of merely launching independently runnable build scripts. It also takes considerable time and resources to download and unpack previously built projects in this way, but once it's set up it works OK.

    I tried this with a set of build jobs running before a final deploy-to-test job, but the more I did, the deeper the dependency on Hudson became. I am now trying to find a way to share common configuration files among the jobs without Hudson becoming the lynchpin. I'd love to know how others have solved these issues.

  3. Assume we have two builds, A and B. A compiles a library which will be used for all kinds of testing. B is one of the test-processes that will check the output of A.

    We let A do its thing, and at the end of the process it produces the required output, which is saved as an artifact.

    How does B get the artifact from the completed process A? Do I need to look it up via some kind of API call or is there a simpler way to ensure that an artifact from a previously completed build is available for another build to work on?

    Thanks

    1. Unknown User (jackson.ha@gmail.com)

      Salim,

       Did you find a solution to your question? I think custom workspaces might work... but I haven't figured it out totally yet.

  4. Unknown User (slavyan71@gmail.com)

    It's surprising and very bad that nobody answered that last question. Is the Hudson project and community dead?

    A simple question was asked here, and I have a similar one. Project A references project B (.NET). I can't find a way to build A unless, in project A, I reference not project B itself but an assembly from B.

    I want to reference a project; this is how complex projects are stored in source control.

  5. Unknown User (mike buchanan)

    "3. The first thing the next stage does in its build is to obtain this bundle through the permalink for the last successful artifact"

    How do I do this?

    1. Unknown User (jackson.ha@gmail.com)

      Mike...

       Install the URL SCM plugin. After you have the zip file made in stage 1, configure your stage 2's URL SCM to point to it.

      I got it working last week... it works pretty well.

      1. IMO this could be nice... if I could add a URL to copy from in addition to my regular (e.g. SVN) checkout - not as an alternative!

  6. Unknown User (austintam@gmail.com)

    After scouring the web for a solution, this is what I've come up with:

    • I've set up two jobs: 'build' and 'test'.
    • 'Build' obviously builds my project (e.g. an ear; for me, it deploys to the server as well).
    • 'Test' runs unit tests and Selenium tests AFTER it executes a shell command to copy the 'build' job's workspace into the 'test' workspace (see below).
    • The 'test' job is triggered by the 'build' job: either the 'build' job's post-build actions build 'test', or the 'test' job's build trigger is set to build after the 'build' job completes (i.e. 'Build after other projects are built').
    • SVN Tagging and Publish Test Results are configured on the 'test' job.
    • Fingerprinting is turned on (apparently this is required for downstream test aggregation).

    Shell command to copy a job's workspace into another workspace:

    # run from inside the 'test' job's workspace; the /opt/tomcat5/.hudson path is specific to this installation
    cd ..
    rm -rf workspace
    cp -r /opt/tomcat5/.hudson/jobs/<job name>/workspace workspace

  7. I propose adding a 'check-boxed' item to include lastStableBuild / lastSuccessfulBuild artifacts, with a drop-down menu providing the available jobs.

    If no artifacts are available, the consuming job might just fail. This could even be a task for a new plugin.

  8. Unknown User (atomlin@sandforce.com)

    I am a new user to Hudson and am also trying to get this working. Here is the problem I am trying to solve. I have a build of multiple targets from the same source tree (C source on Windows). I have already created a mechanism in Cygwin bash that allows as many machines as possible to execute and load-balance the build itself (this is done via file locking on a shared directory and file-system primitives). Anyway, all I want to do is execute the same command on multiple machines. I looked at two options - 1) multiple jobs, 2) distfork - and both seem to have problems.

    1) Multiple jobs

    If I have 5 build machines and create 1 main job to trigger 5 dependent jobs (one on each machine), I cannot see how to get info from the first build to the subsequent builds. In the first build I want to get the source, tgz it up, and distribute it to the other machines. If I could pass any kind of parameter between jobs I could get it to work, but I cannot see anything.

    2) distfork

    I do not see where to get the CLI.jar file. This looks almost exactly like what I want, although it is not clear how I would keep all the console messages straight if I launch multiple commands.

    Any help would be appreciated

    1. Unknown User (atomlin@sandforce.com)

      Ok, I think I got it working. I am using the multiple-jobs scheme.

      I have one project that gets all the source code, gets some SCM info (AccuRev), tgz's the entire workspace, and stores the result as an artifact of the build. This project then triggers multiple sub-projects that operate in parallel, one per build server. These sub-projects all fetch the tgz file.

      Project 1: Test1.

      # capture the SCM state, then bundle the entire workspace (excluding the bundle itself)
      accurev stat -R * >FileStats.txt
      accurev info >WsInfo.txt
      tar -cvzf source.tgz --exclude source.tgz *
      

      Project 2-n

      # clean the workspace, fetch the bundle from Test1's last successful build, unpack it, and build
      rm -rf *
      "c:\program files\gnuwin32\bin\wget" -nv http://hudson:8080/job/Test1/lastSuccessfulBuild/artifact/source.tgz
      tar -xvf source.tgz
      dobuild.bat
      

      The advantage of this scheme is that all the targets are guaranteed to have the same source, atomically (if multiple machines fetch source code independently, they can end up with different versions).

      Note that Test1 gets source code and the other projects do not.

      1. Unknown User (atomlin@sandforce.com)

        Here is the bash script used on all targets to balance the build. All the servers share a cfg file and each executes one line per build; others may find this file-system locking scheme useful.

        Note that COPYLOC is an environment variable that points to a location on a network share.

        get_locked_line()
        {
            # $cnt is assumed to be set by the caller: the number of lines in the shared cfg file
            LOCKFOLDER="$COPYLOC"
            LOCKFILE="buildlock.txt"
            LOCKSHARED="lock.txt"
            LOCKFILEFULL="$LOCKFOLDER/$LOCKSHARED"
            LOCKCOUNTFILE="$LOCKFOLDER/count.txt"
            LOCKFILECOUNT=0
            LOCKDONE="NO"
        
            echo "Lock file is $LOCKFILEFULL"
            echo "$COMPUTERNAME" >$LOCKFILE
        
            while [ $LOCKFILECOUNT -lt $cnt ] && [ "$LOCKDONE" == "NO" ]
            do
                #try to copy my lock file to share - may or may not be successful
                cp $LOCKFILE $LOCKFILEFULL
        
                #try to make file read only - may or may not be successful
                if chmod 444 $LOCKFILEFULL
                then
                   #only perform check on read only file so that all computers read same thing
                   if [ "$(cat $LOCKFILEFULL)" == "$COMPUTERNAME" ]
                   then
                      # Yay, I got the lock
                      if [ -f $LOCKCOUNTFILE ]
                      then
                         # file already exists so get current line number
                         LOCKFILECOUNT=$(cat $LOCKCOUNTFILE)
                      fi
                      #set return value of function
                      LOCKCOUNTRETURN=$LOCKFILECOUNT
                      LOCKDONE="YES"
                      echo "Lock obtained, cfg file line to process is $LOCKFILECOUNT"
                      let "LOCKFILECOUNT = $LOCKFILECOUNT + 1"
                      echo $LOCKFILECOUNT > $LOCKCOUNTFILE
                      rm -f $LOCKFILEFULL
        
                   else
                      echo "Waiting for lock Zzzz.."
                      sleep 1
                   fi
                fi
                echo "Lock loop"
            done
            echo "Lock done"
        }
        


  9. Unknown User (barnash)

    Hi, I've just posted something that shows a simple case of splitting the job into build and test jobs:

    Here is the link:

    http://barnashcode.blogspot.com/2010/07/split-hudson-jobs.html

    I hope you'll like it.

  10. I have job 1, which installs application A with version 1.

    I have job 2, which runs a specific test on the application installed by job 1.

    Both jobs are triggered by an upstream job. The problem I am facing is that I am not able to get them to run in one common workspace, as each job runs in its own workspace. When the whole use case is configured as one job, I am able to get it to work seamlessly.

    Can someone help me with my project configuration?

    Thanks