Interactive GPU nodes
Aim: Describe how to access and use the interactive GPU nodes on the Stoomboot cluster.
Target audience: Users of the Stoomboot cluster and its GPUs.
Introduction
Our data center hosts GPUs of several generations and brands. Depending on the software, performance can differ between them, and you may need to adjust or recompile your sources for each GPU type.
To make this easier, an interactive GPU node is available for each GPU type. These nodes can be used for compilation and testing as well as for computing. To scale up your computing, use the GPU nodes in the batch system instead.
Please keep GPU consumption and testing time to a minimum, and run your real jobs on the batch system.
These machines are intended for the following purposes:
- Running interactive jobs, like analysis work (making plots, etc.).
- Testing GPU jobs with a short runtime.
- Interaction with the batch system (see below for the relevant commands).
Usage
Access and Use
Four GPU nodes are available via ssh for interactive use and testing:
Node name | Node manufacturer | Node type name | GPU manufacturer | GPU type | Number of GPUs |
---|---|---|---|---|---|
stbc-g1 | Fujitsu | CELSIUS C740 | NVIDIA | GeForce GTX 1080 | 1 |
stbc-g2 | Fujitsu | CELSIUS C740 | NVIDIA | Quadro GV100 | 1 |
wn-lot-001 | Lenovo | ThinkSystem SR655 | AMD | Radeon Instinct MI50 | 2 |
wn-lot-008 | Lenovo | ThinkSystem SR655 | NVIDIA | Tesla V100 | 2 |
If you access the nodes from home via ssh, use eduVPN and/or log in through login.nikhef.nl.
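As an illustration of logging in through login.nikhef.nl, the sketch below builds an ssh command line that uses OpenSSH's `-J` (jump host) option. It is a minimal example under assumptions: the helper name `ssh_command`, the placeholder user name, and the use of the short node name as the ssh target are hypothetical and may need adapting to your account and network setup.

```python
import shlex


def ssh_command(node, user, jump_host="login.nikhef.nl"):
    """Build an ssh command that reaches `node`, optionally via `jump_host`.

    `ssh_command` is a hypothetical helper; `-J` is the standard OpenSSH
    jump-host option. Pass jump_host=None when already on eduVPN.
    """
    cmd = ["ssh"]
    if jump_host:
        cmd += ["-J", f"{user}@{jump_host}"]
    cmd.append(f"{user}@{node}")
    return cmd


if __name__ == "__main__":
    # 'your_username' is a placeholder; 'stbc-g1' is taken from the table above.
    print(shlex.join(ssh_command("stbc-g1", "your_username")))
    # prints: ssh -J your_username@login.nikhef.nl your_username@stbc-g1
```

The printed command can be pasted into a terminal. With eduVPN active, the jump host is probably not needed and the plain `ssh` form should suffice.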
Libraries (CUDA)
The drivers for the GPUs and the following versions of the NVIDIA CUDA libraries are installed:
- 9.2
- 10.2
- 11.4
The relevant version of the CUDA Deep Neural Network (cuDNN) library is also installed.
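To see which GPU and driver version a node actually exposes, one option is to query `nvidia-smi`. The following is a minimal sketch, assuming `nvidia-smi` is on the `PATH` of the NVIDIA nodes; the function names are hypothetical. On the AMD node the tool will presumably be absent, in which case the function returns `None` and the ROCm tools would be needed instead.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse 'name, driver_version, memory.total' CSV rows into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) == 3:
            gpus.append(
                {"name": fields[0], "driver": fields[1], "memory": fields[2]}
            )
    return gpus


def query_nvidia_gpus():
    """Return a list of GPU dicts, or None if nvidia-smi is missing or fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--query-gpu=name,driver_version,memory.total",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return None
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    print(query_nvidia_gpus())
```

Parsing is kept separate from the subprocess call so that it can be checked without a GPU present.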
Python + GPU
To get access to Python software in an environment that supports using the GPUs, it is recommended to use conda to create a virtual environment, activate it and install the software you need.
Conda virtual environment
Create and activate a new virtual environment using:
> conda create --prefix /data/your_project/your_username/gpu_venv python=3.9
> conda activate /data/your_project/your_username/gpu_venv
Installing Python packages inside the virtual environment
After activating the virtual environment, use conda to install additional software in it, e.g.:
> conda install tensorflow=2.11.0
> conda install pytorch=1.13.1
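Some packages are offered in both CPU-only and CUDA builds. To select a CUDA build explicitly, list the available builds and pin the build string when installing:
> conda search tensorflow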
tensorflow 2.11.0 cpu_py310hd1aba9c_0 conda-forge
tensorflow 2.11.0 cpu_py38h66f0ec1_0 conda-forge
tensorflow 2.11.0 cpu_py39h4655687_0 conda-forge
tensorflow 2.11.0 cuda112py310he87a039_0 conda-forge
tensorflow 2.11.0 cuda112py38hded6998_0 conda-forge
tensorflow 2.11.0 cuda112py39h01bd6f0_0 conda-forge
> conda install tensorflow=2.11.0=cuda112py39h01bd6f0_0
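Picking the right build string by eye is error-prone, so the selection can also be scripted. The sketch below filters `conda search` output for CUDA builds matching a given Python version. It assumes the column layout shown above (name, version, build, channel) and that CUDA builds start with `cuda`; the helper name `cuda_builds` is hypothetical.

```python
import re

# Excerpt of the `conda search tensorflow` output shown above.
SEARCH_OUTPUT = """\
tensorflow 2.11.0 cpu_py39h4655687_0 conda-forge
tensorflow 2.11.0 cuda112py310he87a039_0 conda-forge
tensorflow 2.11.0 cuda112py38hded6998_0 conda-forge
tensorflow 2.11.0 cuda112py39h01bd6f0_0 conda-forge
"""


def cuda_builds(search_output, python_tag):
    """Return (version, build) pairs for CUDA builds matching e.g. 'py39'."""
    # The negative lookahead stops 'py31' from matching 'py310'.
    tag = re.compile(re.escape(python_tag) + r"(?!\d)")
    matches = []
    for line in search_output.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        version, build = parts[1], parts[2]
        if build.startswith("cuda") and tag.search(build):
            matches.append((version, build))
    return matches


if __name__ == "__main__":
    for version, build in cuda_builds(SEARCH_OUTPUT, "py39"):
        print(f"conda install tensorflow={version}={build}")
    # prints: conda install tensorflow=2.11.0=cuda112py39h01bd6f0_0
```

The Python tag should match the interpreter version of the environment (`python=3.9` in the example above corresponds to `py39`).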
Using the software
Once installed, the packages can be used directly from Python:
> python
Python 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:39:03)
[GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2023-02-23 14:47:18.909239: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
Contact
Email pdp@nikhef.nl for questions about the GPUs.