1 Introduction
Welcome to the user manual of the High-Performance-Cluster (HPC) of the GIScience group (University of Jena).
It is tailored towards R processing but can also be used with any other programming language.
1.1 A short introduction to HPCs
The big advantage of an HPC is that users submit their jobs to ONE machine, which then distributes the work across multiple machines in the background. Incoming processing requests (jobs) are handled by the scheduler (SLURM), which takes care of queuing them and avoids clashes with jobs from other users.
Administration is simplified by provisioning all computing nodes with the same virtual image. This way, maintenance tasks are reduced and differences between the machines are avoided. Library management across nodes is done via the Spack package manager, as this application allows for version-agnostic environment module installations.
The $HOME directory of every user is shared across all nodes, avoiding the need to keep data and scripts in sync across multiple machines.
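To make this concrete, a job is usually described in a small batch script and handed to SLURM with sbatch (details follow in the scheduler chapter). The sketch below is only illustrative; the script name and resource values are placeholders:

#!/bin/bash
#SBATCH --job-name=example       # name shown in the queue
#SBATCH --ntasks=1               # a single task
#SBATCH --cpus-per-task=4        # request 4 CPU cores
#SBATCH --mem=8G                 # request 8 GB of RAM

# run an R script that lives in the shared $HOME directory
Rscript ~/my_project/my_script.R

Submitting it with sbatch example.slurm queues the job; SLURM then starts it on a free node.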
Before you start:
Working on a Linux server naturally requires a certain amount of familiarity with UNIX command-line shells and text editors. There are dozens of Linux online tutorials which should help to get you started. Of course, there are also great books on how to use Linux such as Shotts (2012), Sobell (2010) and Ward (2015), all of which are freely available. If you still get stuck, Google will certainly help you.
Please add an SSH key pair to your account to be able to log in to the server without having to type your password every time. This is especially useful since your password will consist of many letters and numbers (> 10 characters), which you do not want to memorize. See this guide if you have never worked with SSH keys before. If you already use an SSH key pair on your machine, you can use
ssh-copy-id <username>@10.232.16.28
to copy your key to the server. Afterwards you should be able to log in via
ssh <username>@10.232.16.28
without being prompted for your password.
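If you connect often, it may also be convenient to add a host entry to the ~/.ssh/config file on your local machine; the alias name hpc below is just an example:

Host hpc
    HostName 10.232.16.28
    User <username>
    IdentityFile ~/.ssh/id_rsa

With this entry in place, ssh hpc is equivalent to ssh <username>@10.232.16.28.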
1.2 Hardware
The cluster consists of the following machines:
Group “threadripper”:
- CPU: AMD Threadripper 2950X, 16-core, Hyperthreading support, 3.5 GHz - 4.4 GHz
- RAM: 126 GB DDR4
- Number of nodes: 4 (c[0-2] + frontend)
- The “frontend” operates on only 12 cores and 100 GB RAM
Group “opteron”:
- CPU: AMD Opteron 6172, 48 cores, no Hyperthreading, 2.1 GHz
- RAM: 252 GB DDR3 (c5 only comes with 130 GB RAM)
- Number of nodes: 3 (c[3-4] and c5)
The groups are reflected in the scheduler via the “partition” setting.
Group “threadripper” is about 3.5x faster than group “opteron”.
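Assuming the partitions carry the same names as the groups (check with sinfo), a job can be pinned to one of them via the --partition option; the job script name below is a placeholder:

# list the available partitions and their nodes
sinfo

# submit to the faster "threadripper" partition
sbatch --partition=threadripper my_job.slurm

# use the "opteron" partition for less time-critical work
sbatch --partition=opteron my_job.slurm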
1.3 Software
The HPC was built following the installation guide provided by the Open HPC community (using the “Warewulf + Slurm” edition) and operates on a CentOS 7 base. The scheduler that is used for queuing processing job requests is SLURM. Load monitoring is performed via Ganglia. A live view is accessible here. Spack is used as the package manager. More detailed instructions on the scheduler and the package manager can be found in their respective chapters.
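Since Spack provides the installed libraries as environment modules, a typical session starts by listing and loading what you need; the module name below is only an example and will differ on the cluster:

# show all modules available on the cluster
module avail

# load a module into your environment (example name)
module load r

# show the modules currently loaded
module list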
1.4 Data storage
The mars data server is mounted at /home and stores all the data.
Currently we have a capacity of 20 TB for all users combined.
Data can be stored directly in your home directory under /home.
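Since the 20 TB are shared between all users, it is worth checking your own footprint from time to time with standard tools:

# free and used space on the shared /home file system
df -h /home

# total size of your own home directory
du -sh ~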
1.5 Accessing files from your local computer
It is recommended to mount the server on your local machine via sshfs.
Transfer speed ranges between 50 and 100 Mbit/s when you are in the office, so you should be able to access files without noticeable delay.
Accessing files from outside will be slower.
If you really run into trouble with transfer speed, you could connect directly to the mars server.
Otherwise, the route is as follows: <local> (sshfs) -> edi (nfs) -> mars
1.5.1 Unix
For Unix systems, the following command can be used:
sudo sshfs -o reconnect,idmap=user,transform_symlinks,identityFile=~/.ssh/id_rsa,allow_other,cache=yes,kernel_cache,compression=no,default_permissions,uid=1000,gid=100,umask=0 <username>@10.232.16.28:/ <local-directory>
The mount process is passwordless if you do it via SSH (i.e. via your ~/.ssh/id_rsa key).
Note that the mount is actually performed by the root user, so you need to copy your SSH key to the root user: cp ~/.ssh/id_rsa /root/.ssh/id_rsa.
For convenience you can create an executable script that performs this action every time you need it.
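Such a script could look like the sketch below; the local mount point is an assumption and needs to be adapted to your setup:

#!/bin/bash
# mount_hpc.sh - mount the HPC home directory locally via sshfs
# adjust <username> and MOUNTPOINT before use

MOUNTPOINT="$HOME/hpc"   # local directory used as mount point (example)
mkdir -p "$MOUNTPOINT"

sudo sshfs -o reconnect,idmap=user,transform_symlinks,identityFile=~/.ssh/id_rsa,allow_other,cache=yes,kernel_cache,compression=no,default_permissions,uid=1000,gid=100,umask=0 <username>@10.232.16.28:/ "$MOUNTPOINT"

# unmount later with: sudo fusermount -u "$MOUNTPOINT" (Linux) or sudo umount "$MOUNTPOINT" (macOS)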
Auto-mount during boot via fstab is not recommended since sometimes the network is not yet up when the mount is executed. This applies especially if you are not in the office but accessing the server from outside.
1.5.2 Windows
Please install sshfs-win and follow the instructions.
1.6 Accessing the HPC remotely (VPN)
A VPN connection is required to access the HPC remotely. FSU uses the Cisco AnyConnect protocol for VPN purposes and recommends using the “Cisco AnyConnect Client”.
This is a GUI which requires manual connect/disconnect actions. In addition, the Cisco AnyConnect client is known to be slow and laggy (sorry, no references here, just personal experience). The open-source alternative openconnect does a much better job and comes with a powerful CLI implementation. If a GUI is preferred, have a look here (on Windows there seems to be no way around the GUI).
Installation
- macOS:
brew install openconnect
- Windows:
choco install openconnect-gui
or download the GUI directly
Usage
On a UNIX-based system:
Start: echo <PASSWORD> | sudo openconnect --user=<USER>@uni-jena.de --passwd-on-stdin --background vpn.uni-jena.de
Stop: sudo killall -SIGINT openconnect
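These two calls can also be wrapped into a small helper script; the sketch below prompts for the password interactively instead of writing it on the command line:

#!/bin/bash
# vpn.sh - connect to / disconnect from the FSU VPN via openconnect
# usage: ./vpn.sh start|stop

case "$1" in
  start)
    read -r -s -p "VPN password: " PASSWORD; echo
    echo "$PASSWORD" | sudo openconnect --user=<USER>@uni-jena.de --passwd-on-stdin --background vpn.uni-jena.de
    ;;
  stop)
    sudo killall -SIGINT openconnect
    ;;
  *)
    echo "usage: $0 start|stop"
    ;;
esac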
On Windows: Sorry, not using Windows - feel free to add this info here.