Conda environments
Aim: Explain how to install software without root privileges and how to set up a custom kernel for a Jupyter Notebook.
Target audience: Users who need specific software environments.
Introduction
Conda is an open-source package and environment manager. Mamba is a (much) faster drop-in replacement for it. Conda/mamba help find and install packages without needing root privileges (i.e., with a simple mamba install ...). Conda environments can be extremely useful when building software. The distribution of packages happens through so-called "channels".
The default channel of Conda is administered by Anaconda Inc., the developers of Anaconda. There are licensing issues with packages distributed through the default conda channels, so these channels should not be used. Use the community-managed channel conda-forge instead. Mamba uses conda-forge by default.
Conda/mamba is also useful for adding customized kernels to JupyterLab sessions (JupyterLab is the next generation of the Jupyter Notebook interface) and to Jupyter Notebooks, both of which can be used from the Nikhef JupyterHub service.
Usage
Note on software toolchains
Software toolchains (of which many LHC experiments and non-LHC experiments have their own) consist of a number of pieces of software required to build other software. This generally comprises the C/C++ libraries; a C/C++ compiler; tools to handle, load and execute binary files and a set of kernel headers.
For example, machines running Alma9 come with a default toolchain provided by the operating system. For most purposes this is sufficiently recent, but sometimes a different toolchain is needed. To keep up with the demands that the dependencies of Python packages place on the toolchain, Conda ships its own (recent) toolchain.
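To see which toolchain a particular machine provides, the component versions can be queried directly. A small sketch using standard commands; the list of tools is an illustrative choice, and any tool that is not installed is reported as such:

```shell
# Print the first version line of each toolchain component found on this host.
for tool in gcc g++ ld ldd; do
    if command -v "$tool" >/dev/null 2>&1; then
        "$tool" --version 2>&1 | head -n 1
    else
        echo "$tool: not found"
    fi
done

# Release of the running kernel.
uname -r
```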
Using Conda
To install conda, we recommend the Miniforge distribution, which sets conda-forge as the default channel. Download an installer from this page and install it on your laptop; for example:
> wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
> bash Miniforge3-Linux-x86_64.sh -b -p /somewhere/on/your/laptop
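When run interactively, the final step of the conda installation offers to modify your shell startup scripts so that the conda command is available automatically. In batch mode (the -b option used above) this step is skipped, and you need to run conda init yourself afterwards. You will have to log out and back in to see these changes.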
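Because the example above runs the installer in batch mode, shell initialization may have to be done by hand. A minimal sketch, assuming bash is your login shell and reusing the placeholder install path from the example above:

```shell
# Placeholder: the path that was passed to -p when running the installer.
MINIFORGE_PREFIX=/somewhere/on/your/laptop

# Add conda's setup code to the bash startup file (~/.bashrc).
"$MINIFORGE_PREFIX/bin/conda" init bash
```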
Creating a virtual environment with Conda
Because the software environment can be regenerated, we recommend creating a virtual environment (venv) in your group directory on /data instead of /project or your home directory.
To avoid packages being downloaded into your home directory, it is recommended to add the following lines to the ~/.condarc file (replace your_project and your_username with your actual group and username):
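A minimal sketch of such a configuration, assuming the goal is to relocate conda's package cache with the pkgs_dirs setting; the conda/pkgs subdirectory name is a hypothetical choice:

```yaml
# ~/.condarc
# Placeholder path: replace your_project and your_username.
# pkgs_dirs tells conda where to store downloaded and extracted packages.
pkgs_dirs:
  - /data/your_project/your_username/conda/pkgs
```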
The python version you need can also be specified when creating the venv:
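A sketch of creating the environment, assuming the placeholder path used elsewhere on this page and Python 3.11 as an example version. Writing plain python instead of python=3.11 asks conda for the most recent version it can resolve:

```shell
# Placeholder path: replace your_project and your_username.
ENV_PREFIX=/data/your_project/your_username/my_venv

# Create the environment at an explicit location with a specific Python version.
conda create --prefix "$ENV_PREFIX" python=3.11
```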
It may happen that some of the packages that you would like to install into the environment are only available for certain Python versions, so pay attention when you create the venv.
Once the virtual environment has been created, you can activate it using:
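For instance, assuming the placeholder environment path used on this page:

```shell
# Placeholder path: replace your_project and your_username.
ENV_PREFIX=/data/your_project/your_username/my_venv

conda activate "$ENV_PREFIX"

# After activation, CONDA_PREFIX holds the path of the active environment.
echo "$CONDA_PREFIX"
```

Running conda deactivate leaves the environment again.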
Installing Python packages inside the conda environment
Once the virtual environment has been created and activated, it is ready to be used. The first step is usually to install additional software into the environment. There are two ways of doing this: with conda and with pip. The rule of thumb is: try with conda first and, if a package is not available, try with pip.
The main reason for this ordering is that for some packages pip uses the C/C++/Fortran toolchain to build libraries that are then loaded into Python. However, pip does not know about the (newer) toolchain installed by conda and will use the one on the host instead. For quite a few packages the host toolchain is too old and the build will fail. Packages installed with conda have already been built using the conda toolchain and will therefore work regardless.
Installing packages via conda
To install additional software into the virtual environment using conda itself, activate the environment and install the packages:
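A minimal sketch, assuming the placeholder environment path and taking ROOT (package name root on conda-forge) as an example package:

```shell
# Placeholder path: replace your_project and your_username.
ENV_PREFIX=/data/your_project/your_username/my_venv

conda activate "$ENV_PREFIX"

# Install one or more packages from the configured channel
# (conda-forge when conda was installed through Miniforge).
conda install root
```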
Packages available in conda can be searched using conda search followed by the package name (for example, conda search root).
Installing packages via pip
To install additional software into the virtual environment with pip: activate it, install pip itself and then use it to install other software; e.g.:
> conda activate /data/your_project/your_username/my_venv
> conda install pip
> pip install keras tensorflow
It's important to note that when installing software, pip will only take into account dependencies of packages that it is currently installing. Dependencies of packages that are already installed are ignored. This can very easily lead to breakage. When using pip to install additional packages, it is therefore strongly recommended to pass it all packages that you want to have installed simultaneously, i.e. including those that are already installed. That forces pip to take all required dependencies into account.
If there is a dependency conflict, it may be possible to resolve it by asking pip to install specific, often older, versions of a package that satisfy all requirements. For packages that are already installed, pip will neither attempt this automatically nor suggest it, so you will have to try by hand. This can quickly lead to what is known as dependency hell.
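The advice above can be sketched as follows. The package names and version numbers are purely illustrative, not a recommended combination:

```shell
# Name every pip-installed package you need in one command, including those
# already installed, so that pip resolves their dependencies together.
pip install keras tensorflow scikit-learn

# If a conflict is reported, pin versions explicitly and try again
# (example version numbers only).
pip install "keras==2.15.0" "tensorflow==2.15.0" scikit-learn

# Report installed packages whose declared requirements are not satisfied.
pip check
```

Running pip check after every installation step is a cheap way to notice breakage early.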
Using the software in the new venv
Once things are installed, they can be used directly:
> python
Python 3.11.7 | packaged by conda-forge | (main, Dec 23 2023, 14:43:09) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ROOT import gROOT
>>> gROOT.GetVersion()
'6.30/04'
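The same environment can also back a custom Jupyter kernel. A sketch, assuming the JupyterHub session picks up kernels registered in your user account (this depends on how the service is configured); my_venv is a hypothetical kernel name:

```shell
# Placeholder path and hypothetical kernel name.
ENV_PREFIX=/data/your_project/your_username/my_venv
KERNEL_NAME=my_venv

conda activate "$ENV_PREFIX"

# ipykernel provides the Python kernel that Jupyter launches.
conda install ipykernel

# Register this environment's Python as a kernel in your user account.
python -m ipykernel install --user --name "$KERNEL_NAME" --display-name "Python ($KERNEL_NAME)"

# List the kernels Jupyter can find.
jupyter kernelspec list
```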
Contact
- Email pdp@nikhef.nl for help setting up virtual or conda environments.