Grid jobs

Aim: Provide the basics of working with Grid jobs in Distributed Computing.

Target audience: Users of Distributed Computing.

Introduction

These days, most experiments provide users with instructions on how to submit jobs to the grid. If you have access to your experiment's facilities, please use them. If you are curious how things work or would like to submit jobs directly, read on.

Jobs are submitted to the grid cluster through a Compute Element, also known as a Compute Entrypoint (CE). The instructions below use Nikhef as an example. Nikhef has three CEs for accepting jobs from the grid:

dissel.nikhef.nl
brug.nikhef.nl
klomp.nikhef.nl

Usage

Getting started with X509 certificates

You will need to request a grid certificate before submitting any jobs to the Grid.

Follow the detailed step-by-step Nikhef instructions to request a certificate: http://ca.dutchgrid.nl/tcs/
The next step is to contact your experiment, or look on their computing pages, to find out if you need to register with their Virtual Organization Management Server (VOMS).
Experiments with VOMS servers are listed below with a link to their registration pages:
ALICE https://voms24.cern.ch:8443/voms/alice/register/start.action
ATLAS https://voms24.cern.ch:8443/voms/atlas/register/start.action
KM3NeT https://voms02.scope.unina.it:8443/voms/km3net.org/register/start.action
LHCb https://voms24.cern.ch:8443/voms/lhcb/register/start.action
Virgo https://voms.cnaf.infn.it:8443/voms/virgo/register/start.action
Xenon https://voms.grid.sara.nl:8443/voms/xenon.biggrid.nl/register/start.action

Once you have your certificate, you should be able to follow the instructions below to generate X509 certificate proxies.

Getting an X509 proxy

Once you have sourced the grid middleware tools, type

 voms-proxy-init --voms <YOUR-VO>

to create a VOMS proxy. The voms-proxy-* commands are available once the setup script has been sourced.

Sample output:

 $ voms-proxy-init -voms pvier
 Enter GRID pass phrase:
 Your identity: /O=dutchgrid/O=users/O=nikhef/CN=Some User
 Creating temporary proxy .......................................... Done
 Contacting  voms.grid.sara.nl:30000
 [/O=dutchgrid/O=hosts/OU=sara.nl/CN=voms.grid.sara.nl] "pvier" Done
 Creating proxy
 ................................................................... Done
 Your proxy is valid until Fri Dec  7 00:08:49 2007

Alternatively, you can use the arcproxy tool to generate a proxy:

 $ arcproxy --voms <YOUR-VO>
 Enter Password for PKCS12 certificate:
 Your identity: /DC=org/DC=terena/DC=tcs/C=NL/O=Nikhef/CN=<Some User>
 Contacting VOMS server (named pvier): voms.grid.sara.nl on port: 30000
 Proxy generation succeeded
 Your proxy is valid until: 2020-11-18 04:32:41

Congratulations! You are now ready to use grid middleware tools.
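
Proxies expire, so it can help to check the remaining lifetime before submitting anything. The following is a minimal sketch, assuming that voms-proxy-info is on your PATH and that its --timeleft option prints the remaining lifetime in seconds. The function names proxy_seconds_left and proxy_has_time are hypothetical helpers and are not part of the grid middleware.

```shell
#!/bin/sh

# Print the remaining lifetime of the current proxy in seconds.
# Prints 0 if voms-proxy-info is not installed or reports no usable proxy.
proxy_seconds_left() {
    if command -v voms-proxy-info >/dev/null 2>&1; then
        left=$(voms-proxy-info --timeleft 2>/dev/null) || left=0
        echo "${left:-0}"
    else
        echo 0
    fi
}

# Succeed if the remaining lifetime ($1, seconds) covers the minimum ($2, seconds).
proxy_has_time() {
    [ "$1" -ge "$2" ]
}

# Ask for at least one hour of remaining lifetime before submitting.
if proxy_has_time "$(proxy_seconds_left)" 3600; then
    echo "Proxy is valid for at least another hour."
else
    echo "Proxy is missing or expires within the hour; run voms-proxy-init again."
fi
```

The one-hour threshold is only an illustration; choose a value that matches the walltime of the jobs you plan to submit.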

Submitting your job

To submit a job to the Nikhef Grid you will need a job description file. This file describes the resources your job requests from the site (see the example job description file below).

Please be sure to specify a queue in your job description with

 ("queue" = "short|medium|long|..." )

General queues and walltimes available:

Queue Name	Max. Walltime (hh:mm:ss)	Allowed VOs
short	04:00:00	alice atlas dans projects.nl pvier virgo dune lsgrid lofar tutor enmr.eu bbmri.nl xenon.biggrid.nl chem.biggrid.nl drihm.eu
medium	36:00:00	alice atlas dans projects.nl pvier virgo dune lsgrid lofar tutor enmr.eu bbmri.nl xenon.biggrid.nl chem.biggrid.nl drihm.eu
long	96:00:00	alice atlas dans projects.nl pvier virgo dune lsgrid lofar tutor bbmri.nl xenon.biggrid.nl chem.biggrid.nl drihm.eu
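
The walltime limits in the table can be turned into a small lookup. The sketch below picks the shortest general queue that covers an expected run time given in whole hours. The function name pick_queue is a hypothetical helper, and the limits are copied from the table, so they need updating if the site changes them.

```shell
#!/bin/sh

# Print the shortest queue whose maximum walltime covers the requested
# number of whole hours; fail if no general queue is long enough.
pick_queue() {
    hours="$1"
    if [ "$hours" -le 4 ]; then
        echo short
    elif [ "$hours" -le 36 ]; then
        echo medium
    elif [ "$hours" -le 96 ]; then
        echo long
    else
        echo "no general queue allows ${hours}h" >&2
        return 1
    fi
}

# Emit the queue attribute for a job expected to run for 10 hours.
queue=$(pick_queue 10) && printf '(queue="%s")\n' "$queue"
```

For a 10-hour job this selects the medium queue. Whether your VO is allowed on that queue still has to be checked against the last column of the table.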

For more information, or to find other queues, use lcg-info or lcg-infosites, which show what is available to your VO. For example:

 lcg-infosites --vo pvier -f NIKHEF-ELPROD all

More information is also available on the SURF wiki: http://doc.grid.surfsara.nl/en/latest/Pages/Service/system_specifications/gina_specs.html#queues

Specifying Job Requirements

The default values for memory, nodes, CPUs and local scratch space may not be adequate for your use case. You can specify the requirements for your jobs in the XRSL file; these are then translated into requirements on the grid batch system. The job will either match suitable resources, or match nothing at all if your requirements exceed what is available. If this is the first time you need to specify additional requirements, please ask the site administrators for advice.

Memory requirements

The amount of main memory (RAM) required for the job can be passed by adding this line to the XRSL file:

 (memory=8192)

This example requests 8GB of RAM (the unit is Megabytes). Be aware that exceeding the requested amount in the actual job may result in termination of the job by the batch system.

Multi-core jobs

XRSL parameters:

 (count=4)
 (countpernode=4)

This example requests 4 cores on 1 node: count is the total number of cores and countpernode is the number of cores per node.
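
Because the memory attribute is expressed in megabytes, a requirement stated in gigabytes has to be converted first. The sketch below does that arithmetic and prints the memory and count lines for pasting into an XRSL file. The function names gb_to_mb and xrsl_requirements are hypothetical helpers, and the sketch assumes whole gigabytes at 1024 MB each.

```shell
#!/bin/sh

# Convert whole gigabytes to megabytes (1 GB = 1024 MB).
gb_to_mb() {
    echo $(( $1 * 1024 ))
}

# Print the memory and core-count attributes for a job.
# Usage: xrsl_requirements <memory in GB> <number of cores>
xrsl_requirements() {
    printf '(memory=%s)\n' "$(gb_to_mb "$1")"
    printf '(count=%s)\n' "$2"
}

# 8 GB of RAM and 4 cores, matching the values used in this section.
xrsl_requirements 8 4
```

The helper deliberately leaves out countpernode; add that line by hand when the distribution of cores over nodes matters for your job.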

Example job description file

An example job description file to submit to an ARC-CE looks something like:

&( executable = "test.sh" )
( stdout = "stdout" )( stderr = "stderr" )
( gmlog = "gmlog" )
(count=1)
(runtimeenvironment=ENV/GLITE)
(inputFiles=("things.txt" ""))

Information about how to create your job description files for an ARC-CE can be found at http://www.nordugrid.org/arc/arc6/users/xrsl.html.
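
The example refers to two files, test.sh and things.txt, that must exist next to the job description when you submit (the empty second field in inputFiles is understood here to mean "upload from the submission directory"). The sketch below creates all three files in a scratch directory so the example can be tried end to end. The contents of test.sh and things.txt, the file name test.xrsl and the function name make_example_job are illustrative assumptions.

```shell
#!/bin/sh

# Create a directory holding the files the example job description refers to.
make_example_job() {
    dir="$1"
    mkdir -p "$dir" || return 1

    # Input file that is staged to the worker node via inputFiles.
    printf 'alpha\nbeta\ngamma\n' > "$dir/things.txt"

    # The executable: report where the job ran and what input it received.
    cat > "$dir/test.sh" <<'EOF'
#!/bin/sh
echo "Running on $(hostname)"
echo "things.txt has $(wc -l < things.txt) lines"
EOF
    chmod +x "$dir/test.sh"

    # The job description, copied from the example above.
    cat > "$dir/test.xrsl" <<'EOF'
&( executable = "test.sh" )
( stdout = "stdout" )( stderr = "stderr" )
( gmlog = "gmlog" )
(count=1)
(runtimeenvironment=ENV/GLITE)
(inputFiles=("things.txt" ""))
EOF
}

make_example_job ./example-job
```

Running test.sh locally inside example-job first is a cheap way to catch script errors before the job spends time in a queue.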

Once you have created a proxy and your job description file is ready, you can run your job with commands like these:

 # Submit your job to an ARC endpoint with your xrsl or adl file specified
 arcsub -c brug.nikhef.nl [YOUR XRSL OR ADL FILE]

 # Check the status of all your jobs. Adding -l gives a long description of each job.
 arcstat -a

 # Or check a single job by its unique ID:
 arcstat [gsiftp|https]://brug.nikhef.nl:443/[jobs|arex]/[UNIQUE JOB ID]

 # Fetch your job output, logs etc with...
 arcget -a
 # (or arcget with a single job id)
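
When scripting these steps it is useful to capture the job ID at submission time so that later commands can refer to a single job. The sketch below assumes that arcsub reports each job on a line of the form "Job submitted with jobid: <ID>"; check this against the output of your installed version. The function names extract_jobids and submit_job are hypothetical helpers.

```shell
#!/bin/sh

# Read arcsub output on stdin and print the job ID(s) it reports.
# Assumes lines of the form "Job submitted with jobid: <ID>".
extract_jobids() {
    sed -n 's/^Job submitted with jobid: *//p'
}

# Submit a job description to a CE and print the resulting job ID.
# Usage: submit_job <CE host name> <job description file>
submit_job() {
    arcsub -c "$1" "$2" | extract_jobids
}

# Example, to be run with a valid proxy:
#   jobid=$(submit_job brug.nikhef.nl test.xrsl)
#   arcstat "$jobid"    # check the state of this one job
#   arcget "$jobid"     # fetch its output once it has finished
```

The submission itself is left commented out so that sourcing this file has no side effects.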

Links

Contact

Email grid.sysadmin@nikhef.nl for questions about anything Grid related.