
Conda environments

Aim: Explain how to install software without root privileges and/or how to install a custom kernel for a Jupyter Notebook.

Target audience: Users who need specific software environments.

Introduction

Conda is an open-source package and environment manager. Mamba is a (much) faster drop-in replacement for it. Conda/mamba help find and install packages without needing root privileges (e.g., with a simple mamba install...). Conda environments can be extremely useful when building software. Packages are distributed through so-called "channels".

The default channel of Conda is administered by the developers of Anaconda: Anaconda Inc. There are some licensing issues with packages distributed through the default conda channels, so these should not be used. Use the community-managed channel conda-forge instead. Mamba uses conda-forge by default.

Conda/mamba is also useful for adding customized kernels to JupyterLab sessions (the next generation of the Jupyter Notebook) and Jupyter Notebooks, which can be used from the Nikhef JupyterHub service.

Usage

Note on software toolchains

Software toolchains (of which many LHC and non-LHC experiments have their own) consist of a number of pieces of software required to build other software. A toolchain generally comprises the C/C++ libraries; a C/C++ compiler; tools to handle, load and execute binary files; and a set of kernel headers.

For example, the default toolchain available on machines running Alma9 consists of:

  • gcc 11.4.1
  • glibc 2.34
  • binutils 2.35.2
  • kernel-headers 5.14.0

For most purposes this is sufficiently recent, but sometimes a different toolchain is needed.

To keep up with the demands that the dependencies of Python packages place on the toolchain, Conda ships its own (recent) toolchain.
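
To see which toolchain a given Python interpreter was built with, the standard library can report it. The following is a minimal sketch; the function name toolchain_report is only an illustration, and the values printed depend entirely on the interpreter that runs it (host Python or conda Python).

```python
import platform
import sys


def toolchain_report():
    """Return a small dict describing how the running interpreter was built."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "python": platform.python_version(),
        # Compiler used to build this interpreter, e.g. a GCC version string
        "compiler": platform.python_compiler(),
        # C library reported for the interpreter (may be empty on some systems)
        "libc": f"{libc_name} {libc_version}".strip(),
        # Root directory of the installation that owns this interpreter
        "prefix": sys.prefix,
    }


for key, value in toolchain_report().items():
    print(f"{key:>8}: {value}")
```

Running this once with the host Python and once inside a conda environment should make the difference between the two toolchains visible.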

Using Conda

To install conda, we recommend the Miniforge installer, which sets conda-forge as the default channel. Download an installer from this page, and install it on your laptop; for example:

> wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh

> bash Miniforge3-Linux-x86_64.sh -b -p /somewhere/on/your/laptop

The final step of the conda installation will modify your shell startup scripts to automatically make the conda command available. You will have to log out and back in to see these changes.

Creating a virtual environment with Conda

Because the software environment can be regenerated, we recommend creating a virtual environment (venv) in your group directory on /data instead of /project or your home directory.

To avoid packages being downloaded into your home directory, it is recommended to add the following lines to the ~/.condarc file (replace your_project and your_username with your actual group and username):

pkgs_dirs:
  - /data/your_project/your_username/conda_pkgs
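
When the same setting has to be prepared for several accounts, or from a setup script, the snippet can be generated rather than typed. This is only a sketch: render_condarc is a hypothetical helper, and the /data root and the conda_pkgs directory name are simply the ones used in the example above.

```python
from pathlib import PurePosixPath


def render_condarc(project, username, root="/data"):
    """Return the pkgs_dirs snippet for ~/.condarc as a string."""
    pkgs_dir = PurePosixPath(root) / project / username / "conda_pkgs"
    return f"pkgs_dirs:\n  - {pkgs_dir}\n"


# Print the snippet; it is deliberately not written to ~/.condarc here.
print(render_condarc("your_project", "your_username"), end="")
```

The output can be reviewed first and then appended to ~/.condarc by hand.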

The Python version you need can also be specified when creating the venv:

> conda create --prefix /data/your_project/your_username/my_venv python=3.11

Some of the packages that you would like to install into the environment may only be available for certain Python versions, so choose the version carefully when you create the venv.

Once the virtual environment has been created, you can activate it using:

> conda activate /data/your_project/your_username/my_venv
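
After activation it can be worth confirming that the python on your PATH really belongs to the new environment and has the version you asked for. A minimal sketch, assuming the example prefix and Python 3.11 used above (environment_status is a hypothetical helper name):

```python
import os
import sys


def environment_status(expected_prefix, expected_version=(3, 11)):
    """Compare the running interpreter with the environment we meant to activate."""
    return {
        # sys.prefix is the root of the installation that owns this interpreter
        "prefix_matches": os.path.realpath(sys.prefix)
        == os.path.realpath(expected_prefix),
        # conda activate sets CONDA_PREFIX; it is None outside conda environments
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
        # Compare only major.minor, e.g. (3, 11)
        "version_matches": sys.version_info[:2] == tuple(expected_version),
    }


print(environment_status("/data/your_project/your_username/my_venv"))
```

If prefix_matches is False, the interpreter being run is not the one from the environment, which usually means the activation did not take effect in the current shell.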

Installing Python packages inside the conda environment

Once the virtual environment has been created and activated, it is ready to be used. The first step is usually to install additional software into the environment. There are two ways of doing this: with conda and with pip. The rule of thumb is: try with conda first and if a package is not available, try with pip.

The main reason for this ordering is that for some packages pip uses the C/C++/Fortran toolchain to build libraries that are then loaded into Python. However, pip doesn’t know about the (newer) toolchain installed by conda and will use the one on the (CentOS 7) host instead. For quite a few packages this toolchain is too old and the build will fail. Packages installed with conda have already been built using the conda toolchain and will therefore work regardless.
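
In an environment that mixes both tools, it helps to know which one installed a given package. Python package metadata may contain an INSTALLER record: pip writes "pip" there, and conda packages typically record "conda", although that should be treated as an assumption and not a guarantee. A minimal sketch (installer_of is a hypothetical helper name):

```python
from importlib import metadata


def installer_of(package):
    """Return the tool recorded as installer of `package`, or None if unknown."""
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        return None  # package is not installed in this environment
    recorded = dist.read_text("INSTALLER")  # None when the record is absent
    return recorded.strip() if recorded else None


for name in ("pip", "numpy"):
    print(name, "->", installer_of(name))
```

A result of None means either that the package is not installed or that no installer was recorded for it.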

Installing packages via conda

To install additional software into the virtual environment using conda itself, activate the environment and install the package:

> conda activate /data/your_project/your_username/my_venv
> conda install root

Packages available in conda can be searched using:

> conda search root

Installing packages via pip

To install additional software into the virtual environment with pip: activate it, install pip itself and then use it to install other software; e.g.:

> conda activate /data/your_project/your_username/my_venv
> conda install pip
> pip install keras
> pip install tensorflow

It's important to note that when installing software, pip will only take into account dependencies of packages that it is currently installing. Dependencies of packages that are already installed are ignored. This can very easily lead to breakage. When using pip to install additional packages, it is therefore strongly recommended to pass it all packages that you want to be simultaneously installed, i.e. including those that are already installed. That forces pip to take all required dependencies into account.
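
The recommendation above amounts to building one pip command that lists the packages already installed together with the new ones. The sketch below only constructs and prints that command; combined_pip_command is a hypothetical helper, and the package names are just the ones from the example above.

```python
import sys


def combined_pip_command(new_packages, already_installed):
    """Build a single 'pip install' command covering old and new packages."""
    seen = set()
    ordered = []
    for name in [*already_installed, *new_packages]:
        key = name.lower()  # package names are compared case-insensitively here
        if key not in seen:
            seen.add(key)
            ordered.append(name)
    # Use the interpreter of the active environment so the matching pip runs
    return [sys.executable, "-m", "pip", "install", *ordered]


print(" ".join(combined_pip_command(["tensorflow"], ["keras"])))
```

After installing, running pip check in the environment reports any installed packages whose dependencies are not satisfied.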

If there is a dependency conflict, it may be possible to resolve it by asking pip to install specific - often older - versions of a package that satisfy all requirements. Unfortunately, pip will neither attempt to do this automatically nor suggest it, so you'll have to try by hand. This can quickly lead to what is known as dependency hell.

Using the software in the new venv

Once things are installed, they can be used directly:

> python
Python 3.11.7 | packaged by conda-forge | (main, Dec 23 2023, 14:43:09) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ROOT import gROOT
>>> gROOT.GetVersion()
'6.30/04'
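
To check in one go that everything you installed can be found from the environment's Python, without actually importing heavy packages, something like the following can be used. This is a minimal sketch for top-level module names; missing_modules is a hypothetical helper, and the module names are the ones installed in the examples above.

```python
import importlib.util


def missing_modules(names):
    """Return the modules from `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means all modules were found in the active environment.
print(missing_modules(["ROOT", "keras"]))
```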

Contact

  • Email pdp@nikhef.nl for help setting up virtual or conda environments.