8 Admin Tasks
In the following, it is assumed that $CHROOT resolves to /opt/ohpc/admin/images/<version>.
8.1 Warewulf
Cluster management.
8.1.2 renv cache
The renv cache is mapped centrally to /opt/R/renv in RSW.
To share the renv caches of RSW and edi with the nodes, an NFS share has been added.
See the previous section for more details.
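On a node, one way to confirm that /opt/R/renv really comes from the NFS share (and is not an empty local directory) is to look up its filesystem type in /proc/mounts. This is a minimal sketch: the function name `is_nfs_mount` is hypothetical, and it assumes the share is mounted exactly at /opt/R/renv with type nfs or nfs4.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a path is an NFS mount point on the current host.
# is_nfs_mount is a hypothetical helper. It reads a table in /proc/mounts
# format (device mountpoint fstype options ...); the optional second
# argument only exists so the function can be tried on a sample file.
is_nfs_mount() {
  local path=$1 mounts=${2:-/proc/mounts}
  # Field 2 is the mount point, field 3 the filesystem type.
  awk -v p="$path" '$2 == p && ($3 == "nfs" || $3 == "nfs4") { found = 1 }
                    END { exit !found }' "$mounts"
}
```

Usage on a node: `is_nfs_mount /opt/R/renv && echo "renv cache is on NFS"`.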
8.1.4 Enable systemd service in image
# Image root, e.g. /opt/ohpc/admin/images/<version>
export CHROOT=<some path>
chroot "$CHROOT" systemctl enable <service>
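Because `chroot` happily runs against a wrong or empty path, a small wrapper that checks the image root first can avoid enabling a unit on the wrong system. This is a sketch under assumptions: the helper name `enable_in_image` and the `DRY_RUN` switch are hypothetical, and "looks like an image root" is approximated by the presence of `etc/systemd`.

```shell
#!/usr/bin/env bash
# Sketch: enable a systemd unit inside a Warewulf image chroot.
# enable_in_image is a hypothetical helper; DRY_RUN=1 prints the command
# instead of running it.
enable_in_image() {
  local chroot_dir=$1 service=$2
  # Refuse to run if the directory does not look like a root filesystem.
  if [ ! -d "$chroot_dir/etc/systemd" ]; then
    echo "error: $chroot_dir does not look like an image root" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "chroot $chroot_dir systemctl enable $service"
  else
    chroot "$chroot_dir" systemctl enable "$service"
  fi
}
```

Usage: `enable_in_image "$CHROOT" munge`. The change presumably only reaches the nodes once the image has been updated as described in 8.1.5.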
8.1.5 Updating image nodes
/root/update-nodes.sh
Sometimes munge does not start after updating the nodes. Because Slurm uses munge for authentication, these nodes can then no longer communicate with the controller. Check systemctl status munge on the nodes and, if necessary, restart munge on all nodes:
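# Recreate the munge log directory if it is missing and hand it to the munge user
pdsh -w 'c[0-5]' mkdir -p /var/log/munge
pdsh -w 'c[0-5]' chown -R munge:munge /var/log/munge
# Restart munge, then return the nodes to service in Slurm
pdsh -w 'c[0-5]' systemctl restart munge
scontrol update nodename='c[0-5]' state=resume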
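To see which nodes are affected without reading the full status output of each one, `systemctl is-active munge` can be run through pdsh, which prefixes every output line with the host name. A minimal sketch for filtering that output; the function name `nodes_without_munge` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list nodes whose munge service is not active.
# nodes_without_munge is a hypothetical helper. It expects pdsh-style
# "<host>: <state>" lines on stdin, e.g. from
#   pdsh -w 'c[0-5]' systemctl is-active munge 2>/dev/null
nodes_without_munge() {
  # Split each line on ": " and print the host for any state other than "active".
  awk -F': ' '$2 != "active" { print $1 }' | sort
}
```

Unreachable nodes produce no stdout line at all, so they will not show up in this list. The result can be joined with `paste -sd, -` and passed to `pdsh -w` to restart munge only where needed.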
In addition, /opt/R/renv must be readable and writable by all users. This is sometimes not the case either, which causes problems with renv. Reset the permissions with:
pdsh -w 'c[0-5]' chmod -R 777 /opt/R/renv
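Before resetting everything, it may help to check whether any entry is actually missing the required permissions. This sketch lists entries that are not world-readable and world-writable, and directories that additionally lack world execute (needed to enter them). The helper name `not_world_rw` is hypothetical; an empty result suggests the chmod is not needed.

```shell
#!/usr/bin/env bash
# Sketch: list entries below a directory lacking world read/write
# (directories must also be world-executable to be traversable).
# not_world_rw is a hypothetical helper; the path defaults to /opt/R/renv.
not_world_rw() {
  local dir=${1:-/opt/R/renv}
  # -perm -MODE matches entries with *all* of MODE's bits set; "!" inverts.
  find "$dir" \( \( -type d ! -perm -0007 \) -o \( ! -type d ! -perm -0006 \) \) -print
}
```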
8.2 SLURM
Some notes:
- /etc/slurm/slurm.conf must always be identical everywhere (RSW, edi, nodes).
- /etc/slurm/slurm.conf needs two SlurmctldHost entries (one for edi, one for RSW in the container).
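Both notes can be checked mechanically. This is a sketch with hypothetical helper names: `count_ctld_hosts` counts uncommented SlurmctldHost lines (Slurm parameter names are case-insensitive, hence `-i`), and `confs_identical` succeeds only if all checksums fed to it agree. It assumes pdsh-style `<host>: <checksum>  <file>` input.

```shell
#!/usr/bin/env bash
# Sketch: sanity checks for slurm.conf (helper names are hypothetical).

# Count active SlurmctldHost lines; the notes above expect 2 (edi and RSW).
count_ctld_hosts() {
  # Leading whitespace is allowed; commented-out lines do not match.
  grep -ciE '^[[:space:]]*SlurmctldHost[[:space:]]*=' "${1:-/etc/slurm/slurm.conf}"
}

# Succeed only if every "<host>: <checksum>  <file>" line on stdin carries
# the same checksum, e.g. input from
#   pdsh -w 'c[0-5]' md5sum /etc/slurm/slurm.conf
confs_identical() {
  # Field 2 is the checksum; exactly one distinct value means all copies match.
  [ "$(awk '{ print $2 }' | sort -u | wc -l)" -eq 1 ]
}
```

The nodes are covered by pdsh; checksums from edi and the RSW container would have to be appended to the input in the same `<host>: ` format.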
8.3 Docker
8.3.1 Pulling a new image
As user admingeogr, which has AWS pull credentials configured:
cd /home/admingeogr/rsw
# log into AWS ECR repo
aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin 222488041355.dkr.ecr.eu-central-1.amazonaws.com
docker-compose pull
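The registry host in the login command is not arbitrary: ECR registries follow the pattern `<account-id>.dkr.ecr.<region>.amazonaws.com`. The steps above can be wrapped into a function along those lines. This is a sketch only: `ecr_registry`, `pull_rsw_images` and the `DRY_RUN` switch are hypothetical names, and the compose directory defaults to the one used above.

```shell
#!/usr/bin/env bash
# Sketch: the ECR login and pull steps as reusable functions (hypothetical names).

# Build the ECR registry host from account ID and region.
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

pull_rsw_images() {
  local account=$1 region=$2 compose_dir=${3:-/home/admingeogr/rsw}
  local registry
  registry=$(ecr_registry "$account" "$region")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "login $registry; pull in $compose_dir"
    return 0
  fi
  # Run in a subshell so the cd does not leak into the caller's shell.
  (
    cd "$compose_dir" || exit 1
    aws ecr get-login-password --region "$region" \
      | docker login --username AWS --password-stdin "$registry" || exit 1
    docker-compose pull
  )
}
```

Usage as admingeogr: `pull_rsw_images 222488041355 eu-central-1`.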
8.3.3 Clean up old images
# Remove all images not used by any container (-a) without a confirmation prompt (-f)
docker image prune -af