Interactive GPU nodes

Aim: Describe how to access and use the interactive GPU nodes on the Stoomboot cluster.

Target audience: Users of the Stoomboot cluster and its GPUs.

Introduction

There are different generations and brands of GPUs in our data center. Depending on the software, you may see different performance figures on these nodes, and you may need to tweak and/or recompile your sources for different types.

To make this easier, an interactive GPU node is available for each GPU type. These nodes can be used for compilation and testing, as well as computing. To scale up your computing, use the batch system GPU nodes instead.

Please keep GPU consumption and testing time to a minimum, and run your real jobs on the batch system.

These machines are intended for the following purposes:

  • Running interactive jobs, such as analysis work (making plots, etc.).
  • Testing GPU jobs with a short runtime.
  • Interaction with the batch system (see below for the relevant commands).

Usage

Access and Use

Four interactive GPU nodes are available via ssh for interactive use and testing:

Node name   Node manufacturer  Node type          GPU manufacturer  GPU type              Number of GPUs
stbc-g1     Fujitsu            CELSIUS C740       NVIDIA            GeForce GTX 1080      1
stbc-g2     Fujitsu            CELSIUS C740       NVIDIA            Quadro GV100          1
wn-lot-001  Lenovo             ThinkSystem SR655  AMD               Radeon Instinct MI50  2
wn-lot-008  Lenovo             ThinkSystem SR655  NVIDIA            Tesla V100            2

To access the nodes via ssh from home, use eduVPN and/or log in through login.nikhef.nl.
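
One way to go through login.nikhef.nl in a single step is OpenSSH's ProxyJump option (`-J`). The sketch below only builds and prints such a command; it does not connect anywhere. The username is a placeholder, `build_ssh_command` is a hypothetical helper, and it assumes that the short node name resolves from the login host (a fully qualified name may be needed instead).

```python
import shlex


def build_ssh_command(username, node, jump_host="login.nikhef.nl"):
    """Return an ssh argument list that reaches `node` via `jump_host`.

    Assumption: the same username is valid on both hosts and the short
    node name resolves from the jump host.
    """
    # -J is OpenSSH's ProxyJump option (available since OpenSSH 7.3).
    return ["ssh", "-J", f"{username}@{jump_host}", f"{username}@{node}"]


# "your_username" is a placeholder; stbc-g1 is one of the nodes listed above.
cmd = build_ssh_command("your_username", "stbc-g1")
print(shlex.join(cmd))
```

The printed command can be pasted into a terminal once the placeholder has been replaced.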

Libraries (CUDA)

The GPU drivers and the following versions of the NVIDIA CUDA libraries are installed:

  • 9.2
  • 10.2
  • 11.4

The relevant version of the CUDA Deep Neural Network (cuDNN) library is also installed.
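
To see what a particular node actually provides, you can query the standard NVIDIA tools from Python. This is a minimal sketch: it assumes that `nvidia-smi` (shipped with the driver) and `nvcc` (shipped with the CUDA toolkit) are on your PATH, which this page does not guarantee, and neither will be present on the AMD node. `tool_report` is a hypothetical helper name.

```python
import shutil
import subprocess


def tool_report(tool, args):
    """Return the tool's standard output, or None if it is missing or fails."""
    path = shutil.which(tool)
    if path is None:
        return None  # tool is not on PATH on this node
    try:
        result = subprocess.run(
            [path, *args], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


# nvidia-smi reports the driver and the GPUs it can see;
# nvcc --version reports the CUDA compiler release.
driver_report = tool_report("nvidia-smi", [])
compiler_report = tool_report("nvcc", ["--version"])

print(driver_report or "nvidia-smi not available on this node")
print(compiler_report or "nvcc not found on PATH")
```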

Python + GPU

To use Python software in an environment that supports the GPUs, we recommend using conda to create a virtual environment, activate it, and install the software you need.

Conda virtual environment

Create and activate a new virtual environment using:

> conda create --prefix /data/your_project/your_username/gpu_venv python=3.9
> conda activate /data/your_project/your_username/gpu_venv

Installing Python packages inside the virtualenv

To install additional software, activate the virtual environment and then use conda, e.g.:

> conda install tensorflow=2.11.0
> conda install pytorch=1.13.1

Sometimes different builds are available, e.g. for different Python versions, for different CUDA versions, or for CPU only. List the builds with conda search, then select one by appending its exact build string to the version:

> conda search tensorflow
tensorflow                    2.11.0 cpu_py310hd1aba9c_0  conda-forge
tensorflow                    2.11.0 cpu_py38h66f0ec1_0  conda-forge
tensorflow                    2.11.0 cpu_py39h4655687_0  conda-forge
tensorflow                    2.11.0 cuda112py310he87a039_0  conda-forge
tensorflow                    2.11.0 cuda112py38hded6998_0  conda-forge
tensorflow                    2.11.0 cuda112py39h01bd6f0_0  conda-forge
> conda install tensorflow=2.11.0=cuda112py39h01bd6f0_0
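
Picking the right build string by eye is error-prone, so the selection can be sketched in Python. The helper below is hypothetical (`pick_build` is not part of conda). It assumes the build strings follow the layout seen in the listing above: a CUDA tag, then a Python tag, then a hash starting with `h`. The sample text is copied from that listing.

```python
def pick_build(search_output, package, version, cuda_tag, python_tag):
    """Return a 'name=version=build' spec for the first matching build, or None."""
    # Assumed build-string layout: <cuda_tag><python_tag>h<hash>_<number>
    wanted_prefix = f"{cuda_tag}{python_tag}h"
    for line in search_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, ver, build = fields[:3]
        if name == package and ver == version and build.startswith(wanted_prefix):
            return f"{name}={ver}={build}"
    return None


sample = """\
tensorflow                    2.11.0 cpu_py39h4655687_0  conda-forge
tensorflow                    2.11.0 cuda112py310he87a039_0  conda-forge
tensorflow                    2.11.0 cuda112py38hded6998_0  conda-forge
tensorflow                    2.11.0 cuda112py39h01bd6f0_0  conda-forge
"""

spec = pick_build(sample, "tensorflow", "2.11.0", "cuda112", "py39")
print(spec)  # tensorflow=2.11.0=cuda112py39h01bd6f0_0
```

The resulting spec is what you would pass to conda install.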

Using the software

Once things are installed, they can be used directly:

> python
Python 3.9.16 | packaged by conda-forge | (main, Feb  1 2023, 21:39:03)
[GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2023-02-23 14:47:18.909239: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
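
The same check can be put in a small script, which is handy at the start of a test job. This sketch assumes nothing beyond the `tf.config.list_physical_devices` call shown above; `visible_gpus` is a hypothetical helper that also copes with an environment where TensorFlow is not installed.

```python
import importlib.util


def visible_gpus():
    """Return the GPU device names TensorFlow sees, or None if it is not installed."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf  # imported lazily so the script runs without it

    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment")
elif not gpus:
    print("TensorFlow sees no GPU (CPU-only build or driver problem?)")
else:
    print(f"TensorFlow sees {len(gpus)} GPU(s): {', '.join(gpus)}")
```

An empty list usually means that a CPU build was installed, so it is worth checking the build string of the installed package.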

Contact