GPU batch jobs
Aim: Provide the basics of how to access and use the Stoomboot cluster GPU batch system, i.e. how to submit batch jobs to one of the GPU queues.
Target audience: Users of the Stoomboot cluster's GPUs.
Introduction
The Stoomboot cluster has a number of GPU nodes that are suitable for running certain types of algorithms. Both interactive GPU nodes and batch nodes are available.
Currently, each GPU can be used by only one user at a time. Sharing the interactive GPU nodes therefore requires more discipline from users than the generic interactive nodes do. (Contact stbc-admin@nikhef.nl to coordinate access.)
The GPU batch nodes can be used through the GPU queues; use these queues only for jobs that actually use a GPU!
Types of GPU nodes
There are two main GPU manufacturers, and software will typically run on only one brand or the other, so separate queues direct jobs to the nodes with one type of GPU:

- gpu-amd: the queue for jobs that run on AMD GPUs
- gpu-nv: the queue for jobs that run on NVIDIA GPUs
Prerequisites
- A Nikhef account;
- An ssh client.
Usage
Submitting GPU batch system jobs
In order to direct your job to a particular type of node, set the job requirements to match the node property. The following examples direct the job to a full node: the first to a node with dual AMD MI50 cards, the second to the node with NVIDIA V100 cards.
qsub -l 'nodes=1:mi50,walltime=12:00:00,mem=4gb' -q gpu-amd job.sh
qsub -l 'nodes=1:v100,walltime=12:00:00,mem=4gb' -q gpu-nv job.sh
Do not specify ppn (CPU cores) for GPU jobs: the job is allocated all CPUs on the node, which will be either 32 or 64 cores.
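A minimal job.sh for the submissions above might look as follows. This is a sketch: it only reports where the job landed and which GPUs are visible, and it assumes that nvidia-smi (NVIDIA nodes) or rocm-smi (AMD nodes) is on the node's default PATH; replace the tail with your actual workload.

```shell
#!/bin/bash
# Sketch of a minimal GPU job script (job.sh).
# Assumes nvidia-smi or rocm-smi is available on the node; adjust as needed.

# Record where the job landed; all CPU cores on the node are available.
msg="Running on $(hostname) with $(nproc) CPU cores"
echo "$msg"

# Show the GPUs the job can see, using whichever tool the node provides.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
elif command -v rocm-smi >/dev/null 2>&1; then
    rocm-smi
else
    echo "No GPU monitoring tool found on this node"
fi
```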
| Node name | Number of nodes | Node manufacturer | Node type name | GPU manufacturer | GPU type | GPUs per node | qsub tags |
|---|---|---|---|---|---|---|---|
| wn-cuyp-{002..003} | 2 | Fujitsu | CELSIUS C740 | NVIDIA | GeForce GTX 1080 | 1 | nvidia, gtx1080 |
| wn-lot-{002..007} | 6 | Lenovo | ThinkSystem SR655 | AMD | Radeon Instinct MI50 | 2 | amd, mi50 |
| wn-lot-009 | 1 | Lenovo | ThinkSystem SR655 | NVIDIA | Tesla V100 | 2 | nvidia, v100 |
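The qsub tags in the table can be combined with the matching queue to target a specific GPU model. For example (the walltime and memory values are illustrative, not requirements):

```shell
# Target the single-GPU GeForce GTX 1080 nodes via the NVIDIA queue
qsub -l 'nodes=1:gtx1080,walltime=12:00:00,mem=4gb' -q gpu-nv job.sh
```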
Storing output for GPU batch jobs
Storing output data from a GPU job should follow the same conventions as for CPU batch jobs. See "Storing output data" on the Batch jobs page.
Links
- Using ssh at Nikhef
- Interactive CPU nodes
- Interactive GPU nodes
- GPU batch jobs
- Overview of Stoomboot
Contact
- Email pdp@nikhef.nl for questions about GPUs.
- Chat in Nikhef's Mattermost channel for stbc-users.