Storage Overview

Aim: Help you choose the right storage for your data and use it correctly.

Target audience: All users of Nikhef computing infrastructure.


Quick decision guide

(Click on a storage location in the chart for further information about that storage type.)

flowchart TB
    START([Where does my data go?]):::start

    Q1{"Temporary or intermediate data?"}:::question
    Q2{"Personal or private files?"}:::question
    Q3{"Precious or unique? Needs backup?"}:::question
    Q4{"Large data files, over 1 GB?"}:::question

    TMPDIR["$TMPDIR — local to node, ephemeral, fast"]:::tmpdir
    HOME["$HOME — backed up, ~2 GB quota, private"]:::home
    CONDA(["ℹ Conda: set pkgs_dirs to /data"]):::conda
    PROJECT["/project — backed up, group quota, NFS"]:::project
    DCACHE["/dcache — no backup, write-once, petabytes"]:::dcache
    DATA["/data — no backup, POSIX, 1–100 GB"]:::data

    START --> Q1
    Q1 -->|Yes| TMPDIR
    Q1 -->|No| Q2
    Q2 -->|Yes| HOME
    HOME -.-> CONDA
    Q2 -->|No| Q3
    Q3 -->|Yes| PROJECT
    Q3 -->|No| Q4
    Q4 -->|Yes| DCACHE
    Q4 -->|No| DATA

    click TMPDIR href "https://kb.nikhef.nl/ct/Node_scratch_space.html" _blank
    click HOME href "https://kb.nikhef.nl/ct/Directory_home.html" _blank
    click CONDA href "https://kb.nikhef.nl/ct/Conda_environments.html#creating-a-virtual-environment-with-conda" _blank
    click PROJECT href "https://kb.nikhef.nl/ct/Directory_project.html" _blank
    click DCACHE href "https://kb.nikhef.nl/ct/Directory_dcache_stoomboot.html" _blank
    click DATA href "https://kb.nikhef.nl/ct/Directory_data.html" _blank

    classDef start    fill:#d3d1c7,stroke:#5f5e5a,color:#2c2c2a
    classDef question fill:#f5f5f4,stroke:#b4b2a9,color:#2c2c2a
    classDef tmpdir   fill:#faeeda,stroke:#ba7517,color:#412402
    classDef home     fill:#e1f5ee,stroke:#0f6e56,color:#04342c
    classDef project  fill:#eeedfe,stroke:#534ab7,color:#26215c
    classDef dcache   fill:#e6f1fb,stroke:#185fa5,color:#042c53
    classDef data     fill:#faece7,stroke:#993c1d,color:#4a1b0c
    classDef conda    fill:#faece7,stroke:#993c1d,color:#712b13

Conda environments and $HOME

By default, Conda installs packages and environments into $HOME, which has a small quota (~2 GB) and will fill up quickly. Create your virtual environment in /data instead.

Add the following to ~/.condarc to redirect package downloads away from $HOME (replace your_project and your_username with your actual group directory and username):

pkgs_dirs:
  - /data/your_project/your_username/conda_pkgs

Then create your environment directly in /data:

conda create --prefix /data/your_project/your_username/my_venv python=3.11
conda activate /data/your_project/your_username/my_venv

See the full Conda environments documentation for further guidance on installing packages and adding custom kernels to JupyterLab. For help setting up environments, contact stbc-admin@nikhef.nl.
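Before creating an environment, a script can check that the chosen prefix really lies outside $HOME. The following is a minimal sketch, not an official Nikhef tool: the helper name is_under_home is hypothetical, the path is the placeholder used above, and the check compares path strings only (it does not resolve symbolic links).

```shell
# Sketch: succeed (exit status 0) if a path lies inside $HOME.
# is_under_home is a hypothetical helper; it compares strings only
# and does not resolve symbolic links.
is_under_home() {
    case "$1" in
        "$HOME"|"$HOME"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Placeholder path: replace your_project and your_username as above.
env_prefix="/data/your_project/your_username/my_venv"

if is_under_home "$env_prefix"; then
    echo "warning: $env_prefix is inside \$HOME and counts against the home quota" >&2
else
    echo "ok: $env_prefix is outside \$HOME"
fi
```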

Using storage with Stoomboot jobs

Data products for Stoomboot batch jobs should be written to /dcache (not $HOME, /project, or /data).

Use HTCondor file transfer to help your jobs write data to storage by adding the following to your job description file:

should_transfer_files = YES
when_to_transfer_output = ON_EXIT

See the HTCondor documentation for details.
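For illustration, the two file transfer settings could sit in a job description file like the one written by the sketch below. Only should_transfer_files and when_to_transfer_output come from this page. The helper write_submit_file and the file names run_analysis.sh, result.root, and analysis.sub are hypothetical, and the remaining lines are standard HTCondor submit commands chosen as assumptions rather than Nikhef recommendations.

```shell
# Sketch: write a minimal HTCondor submit description file to the given path.
# write_submit_file, run_analysis.sh and result.root are hypothetical names.
write_submit_file() {
    cat > "$1" <<'EOF'
executable              = run_analysis.sh
output                  = job.$(ClusterId).$(ProcId).out
error                   = job.$(ClusterId).$(ProcId).err
log                     = job.$(ClusterId).log

# File transfer settings named on this page
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

# Assumption: the job leaves result.root in its working directory
transfer_output_files   = result.root

queue
EOF
}

# Example: write_submit_file analysis.sub && condor_submit analysis.sub
```

The quoted 'EOF' delimiter keeps the shell from expanding $(ClusterId) and $(ProcId), so HTCondor receives them unchanged.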


Quick reference

| Location | Use for | Backed up | Modifiable | Quota | Notes |
|---|---|---|---|---|---|
| $TMPDIR | Temporary / intermediate job data | No | Yes | Node-local | Auto-cleaned when job ends |
| $HOME | Personal files, config, dot files | Yes | Yes | ~2 GB | Shared between Linux and Windows |
| /project | Code, thesis, conditions, unique plots | Yes | Yes | Group quota | Expensive; keep it tidy |
| /data | Software environments, containers, log files, modifiable results | No | Yes | Several TB/user | POSIX compliant; NFS mounted |
| /dcache | Large data files > 1 GB: ntuples, ROOT files, MC samples | No | No (write-once) | Petabytes (group) | Also accessible via xrootd / WebDAV |

Warning

/data, /dcache, and $TMPDIR are not backed up. If data is lost, it cannot be recovered. Store anything precious or irreplaceable in /project or $HOME.


Storage locations

$TMPDIR — Node scratch space

Use for: Temporary and intermediate results during a running job — for example, intermediate MC output that will be merged into larger files before being written to /dcache or /data.

What goes here:

  • Intermediate files and logs produced during a job
  • MC output files before merging
  • Any data that is only needed within the lifetime of a single job

What does NOT go here:

  • Final analysis results or ntuples (put those in /dcache or /data)
  • Private data (put that in $HOME)
  • Code and scripts (put those in /project or $HOME)

Warning

$TMPDIR is automatically cleaned up when the job ends. Do not store anything here that you need after job completion.

Always use $TMPDIR (not /tmp/) to find your scratch directory, as the actual path may vary by node. For scripts that also run on your laptop, use ${TMPDIR:-/tmp} for portability.
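A portable way to apply this in a job script is sketched below. The job.XXXXXX name template is an arbitrary choice, and the cleanup trap is optional on Stoomboot, where the scratch area is cleaned automatically, but useful when the same script runs on a laptop.

```shell
# Sketch: create a private working directory under the scratch area.
# Falls back to /tmp when $TMPDIR is unset (for example on a laptop).
scratch_base="${TMPDIR:-/tmp}"
workdir="$(mktemp -d "$scratch_base/job.XXXXXX")"

# Remove the working directory when the script exits, even on error.
trap 'rm -rf "$workdir"' EXIT

echo "working in $workdir"
```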


$HOME — Home directory

Use for: Personal files, configuration settings, and data used only by yourself.

What goes here:

  • "Dot" files and shell configuration (.bashrc, .profile, etc.)
  • Personal analysis results and draft versions of documents
  • Simple personal scripts
  • Private files (emails, personal data)

What does NOT go here:

  • Ntuples or large data files (use /dcache or /data)
  • Scripts and frameworks shared with colleagues (use /project)
  • Conda environments or package caches (use /data; see "Conda environments and $HOME" above)
  • Intermediate files (use $TMPDIR)

$HOME is backed up with three replicas (disk, local backup, and external TSM). The quota is typically 2 GB for new users. Do not use $HOME for high-throughput I/O.
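With a quota this small, it helps to see which directories take up the space. The sketch below uses only standard du, sort, and tail; the helper name largest_entries is hypothetical, and the sizes it prints are disk usage in kilobytes, which may differ slightly from how the quota is accounted.

```shell
# Sketch: list the largest top-level directories under a path, biggest last.
# Sizes are in kilobytes; hidden directories such as .conda or .cache are
# included, and the final line is usually the total for the path itself.
largest_entries() {
    du -k -d 1 "${1:-$HOME}" 2>/dev/null | sort -n | tail -n "${2:-10}"
}

# Show the five largest entries in the home directory.
largest_entries "$HOME" 5
```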

Tip

Your home directory is not readable by other users. However, your public_html/ directory is publicly accessible from the web unless you add .htaccess controls.


/project — Project storage

Use for: Unique, precious data that must outlive your personal stay at Nikhef.

What goes here:

  • Unique software and analysis frameworks used by a group
  • Conditions data, calibration data, and settings files
  • Final thesis chapters and supporting materials
  • Precious plots, histograms, and tabular data that feed into publications
  • Jupyter notebooks and scripts used to produce those plots

What does NOT go here:

  • Personal private files (use $HOME)
  • Analysis results that can be reproduced (use /dcache or /data)
  • Large files that can be replicated to the Grid
  • Intermediate results (use $TMPDIR)

Warning

Storage is limited and shared across your group. If you fill the group quota, your colleagues will be affected. Keep /project tidy and remove stale files regularly.
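One way to spot candidates for cleanup is to list files that have not been modified for a long time. This is a sketch only: the helper name stale_files, the 365-day default, and the example path are assumptions, not a Nikhef policy, and the function only lists files without deleting anything.

```shell
# Sketch: list regular files not modified for more than N days (default 365).
# stale_files is a hypothetical helper; it only prints paths and never deletes.
stale_files() {
    find "$1" -type f -mtime +"${2:-365}" -print
}

# Example (hypothetical path): stale_files /project/your_group 365
```

Review the list with your group before removing anything, since an old file in /project may still be the only copy of something precious.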

Tip

Remember to also deposit notebooks, tabular data, and results in a FAIR data repository such as HEPdata, CERN Open Data, or Zenodo.


/data — Data directory

Use for: Modestly sized working data that could in principle be reproduced or re-downloaded.

What goes here:

  • Software environments (Conda environments, virtual environments)
  • Apptainer / Singularity container images (.sif files)
  • Private ntuples and results where files need to be rewritten or modified
  • Log files you need to collect and review later
  • Analysis results for ongoing work

What does NOT go here:

  • Your software and scripts (use /project or $HOME)
  • Intermediate log files produced during jobs (use $TMPDIR)

Tip

/data/tunnel is specifically designated for sharing data between the Nikhef local environment and the public Grid environment.

Warning

/data is not backed up. If the system fails catastrophically, data cannot be recovered. The NFS server has limited transaction throughput — heavy use will impact both desktop and Stoomboot users.


/dcache — dCache storage

Use for: Large-scale data files that are written once and read many times.

What goes here:

  • ROOT files and ntuples
  • MC samples and simulation output
  • Large analysis results (> 1 GB)
  • Any data that needs to be accessed from many Stoomboot nodes simultaneously

What does NOT go here:

  • Software and scripts (use /project or $HOME)
  • Files you need to actively edit (files cannot be modified once written — see below)
  • Intermediate log files (use $TMPDIR)

Files in /dcache are immutable

Once written to dCache, a file cannot be overwritten or appended to. Attempting to do so will result in Permission denied. dCache is designed for files that are written once and read many times.
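Because an existing file cannot be replaced, a job script can check the destination first and fail with a clear message instead of a late Permission denied. The sketch below assumes /dcache is mounted as a normal path; copy_once is a hypothetical helper name, not a dCache tool, and the example paths are placeholders.

```shell
# Sketch: copy a file to a write-once location, refusing to touch an
# existing destination. copy_once is a hypothetical helper name.
copy_once() {
    if [ -e "$2" ]; then
        echo "refusing to overwrite existing file: $2" >&2
        return 1
    fi
    cp "$1" "$2"
}

# Example (placeholder paths): write a new version under a new name.
#   copy_once result.root /dcache/your_group/your_output/result_v2.root
```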

Accessing dCache remotely: dCache can also be accessed from outside Stoomboot using the xrootd, WebDAV, and GridFTP protocols (certificate required). Link your certificate to your Nikhef account via sso.nikhef.nl and select "Connect your certificate". The xrootd door is at dcache.nikhef.nl.

Tip

Monitor your group's dCache usage via the dCache usage dashboard (requires Nikhef network or eduVPN).


Using storage with batch jobs

When submitting jobs to Stoomboot via HTCondor, follow these conventions:

  • Read input data from /dcache or /data — both are accessible from all Stoomboot nodes via NFS.
  • Write intermediate output to $TMPDIR — fastest option; avoids NFS load during the job.
  • Write final output to /dcache (files > 1 GB) or /data (smaller modifiable files).
  • Avoid writing directly to /project, $HOME, or /data from batch jobs: the NFS servers hosting these are shared with critical Nikhef desktop services and are not designed for high-throughput batch I/O.

Typical batch job storage pattern:

Input:        /dcache/[group]/[your_data]      ← read ntuples / MC samples
Working:      $TMPDIR                           ← intermediate files during job
Final output: /dcache/[group]/[your_output]     ← large results (> 1 GB)
              /data/[group]/[username]/         ← smaller modifiable results
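The pattern above could be wrapped in a job script roughly as follows. This is a sketch under stated assumptions: run_job and process_file are hypothetical names, process_file is a stand-in that simply passes its input through, and the real analysis step, file names, and directories will differ per experiment.

```shell
# Stand-in for the real analysis step (hypothetical): passes the input through.
process_file() {
    cat "$1"
}

# Sketch: read inputs, keep intermediate data in scratch, write the result once.
#   $1 = input directory, e.g. /dcache/[group]/[your_data]
#   $2 = output file,     e.g. /dcache/[group]/[your_output]/merged.out
run_job() {
    input_dir="$1"
    output_file="$2"

    # Work in node-local scratch; fall back to /tmp outside the cluster.
    workdir="$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX")" || return 1

    # Intermediate results are merged in scratch while the job runs.
    : > "$workdir/merged.out"
    for f in "$input_dir"/*; do
        [ -f "$f" ] || continue
        process_file "$f" >> "$workdir/merged.out"
    done

    # Write the final result in a single copy at the end of the job.
    cp "$workdir/merged.out" "$output_file"
    status=$?

    rm -rf "$workdir"
    return "$status"
}
```

Merging in scratch and copying once at the end keeps the number of writes to shared storage low, and a single final write also suits write-once /dcache.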

File recovery

If you accidentally delete or overwrite a file in $HOME or /project, recovery may be possible as these filesystems are backed up. Contact the CT helpdesk at helpdesk@nikhef.nl as soon as possible.

Recovery is not available for /data, /dcache, or $TMPDIR.

Read more about file recovery.


Getting help