Stoomboot NG HTCondor Cluster

This batch system is under development and testing. There is still a chance of some service disruption (for example, potential job restarts), but that is expected to be minimal at this time.

Stoomboot is being migrated from the current Torque/Maui batch system to the HTCondor software suite. As the new version of Stoomboot scales out, we will need to make adjustments to the configuration.

Please report any issues you experience while testing the system to stbc-admins@nikhef.nl or stbc-users@nikhef.nl.

The current HTCondor batch system is accessible from stbc-i2.nikhef.nl and stbc-i3.nikhef.nl. During this migration, we will drain computing capacity from the old Torque system and add it to the new HTCondor batch system. To check the current capacity of Stoomboot-ng, run condor_status; it lists all the slots available to run jobs in the cluster.

Notable changes in Stoomboot-ng:

  1. All jobs run in a container in the new HTCondor cluster. This allows more flexibility in the operating systems and versions available for data analysis.
  2. User commands change from qsub to condor_submit, with a variety of options for querying the system and job status. See the New commands table below.
  3. Jobs are still limited to a maximum runtime of 4 days; the actual limit depends on the "JobCategory" the job is chosen to run in.
  4. New fields need to be added to your job submission files so jobs run in the right "queue".

New commands

| Action                             | Torque   | HTCondor equivalent       |
| ---------------------------------- | -------- | ------------------------- |
| Submit a job to the batch system   | qsub     | condor_submit             |
| See the status of your jobs       | qstat    | condor_q                  |
| Remove jobs                        | qdel     | condor_rm                 |
| Show nodes or available job slots  | pbsnodes | condor_status             |
| Start an interactive session       | qsub -I  | condor_submit -i file.sub |

Job description files

If you have never worked with HTCondor, please refer to the User's Manual for further information about the job description files used to submit to the cluster: https://htcondor.readthedocs.io/en/latest/users-manual/submitting-a-job.html
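As a starting point, a minimal HTCondor submit file looks like the sketch below (the script name `hello.sh` and file names are placeholders; the Nikhef-specific attributes described in the next section must still be added for jobs on this cluster):

```
## Minimal HTCondor submit file (sketch; hello.sh is a placeholder name)
executable = hello.sh
log        = hello.log
output     = hello.out
error      = hello.err
## Submit one instance of the job
queue
```

You would submit this with condor_submit and follow its progress with condor_q.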

Job submission requirements

Running jobs on the local Nikhef HTCondor cluster requires a few additional attributes in your job submission file. You will need to include

+SingularityImage = "el9"  ## The value must be a quoted string.
## Or
+UseOS = "el9"
## Or bring your own container:
+SingularityImage = "/project/myproject/ourimages/myfavouriteimage.sif"

in the submission file.

Next, include a "JobCategory" for the kind of job you wish to run; this sets the maximum time your job is allowed to run. You can submit to short, medium, or long:

* short   4 hours
* medium  24 hours
* long    96 hours

Job categories can be specified with, for example,

+JobCategory = "short"

**Note: your job must specify a JobCategory and either SingularityImage or UseOS to be accepted by the batch system.**

Example job submission

executable              = test.sh
log                     = test.log
output                  = outfile.txt
error                   = errors.txt
## Can use "el7", "el8", or "el9" for UseOS or you can specify your own 
## SingularityImage but an OS must be specified and in string quotations. 
+UseOS                  = "el9"     
## This job can run up to 4 hours. Can choose "express", "short", "medium", or "long".
+JobCategory            = "short"   
queue

Interactive jobs

It is possible to run an interactive job on the new HTCondor system. Create a simple submit file like the example above, then run condor_submit -i mysubfile.sub; your session starts in the container image specified by the UseOS or SingularityImage job attribute.
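For example, a minimal submit file for an interactive session could look like the sketch below (the file name mysubfile.sub is illustrative; no executable line is needed, since an interactive job drops you into a shell):

```
## Interactive session in an EL9 container, limited to the "short" category (4 hours)
+UseOS       = "el9"
+JobCategory = "short"
queue
```

Running condor_submit -i mysubfile.sub then opens a shell inside the requested container once a slot is assigned.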