How to use the cluster and its filesystems "nicely"¶

The ScienceCluster is shared between many researchers. Thus everyone should be careful to avoid usage that can negatively impact other users.

In particular, the ScienceCluster filesystem can experience signficant latency when under stressful workloads, particularly when doing input/output (i/o) on large numbers of small files. Here are some tips on how best to avoid causing issues for other ScienceCluster users:

Things to avoid:¶

Having more than ~1000 files in a directory This causes even simple operations such as ls to put a heavy strain on filesystem.
Having more than 7 levels of subdirectories. A very deep or wide directory tree leads to high filesystem loads.
Running many short jobs with heavy i/o at startup. Runtime of short jobs will be dominated by time spent on i/o rather than computing. Simultaneous heavy i/o jobs stresses filesystem. Consider combining very short jobs into longer jobs.
operations that query all files in a large directory tree (e.g. du, find). Instead, on our cluster, to get a directory size: ls -lh; and to get the number of files contained: getfattr -n ceph.dir.rfiles <PATH>.
Multiple simultaneous filesystem operations, e.g. creating virtual environment; cleaning caches; deleting large directories. With many users on the system, this can lead to high latency for everyone.
Simultaneous file transfers (e.g. rsync/scp) tasks. File transfers are a significant load on the filesystem and the network. If you need to transfer a very large amount of data, Globus is the recommended tool.
Compute heavy tasks on a login node. Always run workloads within a slurm job, either with sbatch (more info) or by starting an interactive slurm job.
Out of control automation. If you automate parts of a workflow, check and test very carefully before putting it into production. You can make use of scrontab for recurring slurm jobs.
- Using watch or other method to frequently monitor or scan a directory.
- Using watch or other method to frequently run squeue. Running squeue every few seconds or even every few minutes can cause a heavy load on the slurm controller.
If using containers, do not unpack/sandbox .sif image files. You can now run Apptainer directly from the .sif image. See our Apptainer Tutorial.
Parallel i/o tasks inside of parallel jobs. Our cluster filesystem is not intended for extremely high performance file reading or writing using multiple parallel threads. Single threaded i/o is preferred.
VS Code ssh to the login nodes. VS Code is only supported through ScienceApps, either the Code Server app, or the within the Remote Desktop Environments (MATE and Xfce).
Using git with many files, especially virtual environments or data folders, and even if ignored with .gitignore. Consider setting git config feature.manyFiles true or git config core.untrackedCache true.

Things to do:¶

Be aware of how your application/code works, especially for i/o and/or with respect to parallelism. A working understanding of what you are running helps avoid potential issues and helps for requesting slurm resources efficiently.
Be mindful of your filesystem usage and quota, both total data size and number of files. If you go over quota, your filesystem writes will be deactivated until you reduce storage.
Clean up caches for applications that use heavy caching, such as Conda or Apptainer. This can improve performance of those applications, especially while setting up new environments.
Clean up unneeded files or data regularly. Be careful not to delete files that you need because the cluster filesystems are not backed up. Deleted files are not recoverable.
If running job arrays, limit the number of simultaneous running jobs. This helps with fairly sharing the queue, and helps avoid issues with simultaneous i/o. Syntax in a slurm script is e.g., #SBATCH --array=1-16%4 for an array of 16 jobs with a limit of 4 simultaneous running.
Monitor your jobs for potential problems.

Things to consider:¶

Consider combining directories with large numbers of small files into a single file.
- Use the command tar to combine a directory tree of many small files into a single file, when files do not need to be access frequently (note that extracting the tarfile is a heavy load)
- (Advanced!) Consider "squashing" directories with many small files using squashfs (may first require installation: e.g. conda-forge). This allows a directory and subdirectories to live on the filesystem as a single image file without needing to unpack the files.
Consider uv instead of conda/mamba for python virtual environments. uv tends to be faster and lighter.
Consider using containers instead of Python or R virtual environments. This will take more time to set up, but it can result in better performance at runtime due to everything being stored in a single image file.
To speed up rsync transfers, you can disable checksums (although this carries some risk that of a corrupted transfer going unnoticed).