8 Admin Tasks
In the following, it is assumed that $CHROOT resolves to /opt/ohpc/admin/images/<version>.
8.1 Warewulf
Cluster management.
8.1.2 renv cache
The renv cache is mapped centrally to /opt/R/renv in RSW.
To share the RSW cache and edi cache with the nodes, an NFS share has been added.
See the previous section for more details.
8.1.4 Enable systemd service in image
export CHROOT=<some path>
chroot $CHROOT systemctl enable <service>
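The two commands above can be wrapped in a small guard that stops with a clear message when CHROOT is unset or does not point at a directory. This is a minimal sketch, not part of Warewulf or OpenHPC: the function name enable_in_image is hypothetical, and it assumes that systemctl is-enabled works inside the image chroot in the same way systemctl enable does.

```shell
# Hypothetical helper: enable a systemd unit inside the node image.
# Assumes CHROOT points at the image root, as stated at the top of this chapter.
enable_in_image() {
    local service="$1"
    if [ -z "${service}" ]; then
        echo "usage: enable_in_image <service>" >&2
        return 2
    fi
    if [ -z "${CHROOT:-}" ] || [ ! -d "${CHROOT}" ]; then
        echo "CHROOT is unset or not a directory: '${CHROOT:-}'" >&2
        return 1
    fi
    # Enable the unit, then print its resulting state from inside the image.
    chroot "${CHROOT}" systemctl enable "${service}" &&
        chroot "${CHROOT}" systemctl is-enabled "${service}"
}
```

After exporting CHROOT, calling enable_in_image <service> should print the unit state reported by systemctl once the unit is enabled.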
8.1.5 Updating image nodes
/root/update-nodes.sh
Sometimes munge does not start after the nodes have been updated, which leaves the nodes out of sync with the controller. Check systemctl status munge and, if necessary, restart munge on all nodes:
pdsh -w c[0-5] mkdir -p /var/log/munge
pdsh -w c[0-5] chown -R munge:munge /var/log/munge
pdsh -w c[0-5] systemctl restart munge
scontrol update nodename=c[0-5] state=resume
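The recovery steps above could be collected into one function with a dry-run switch, so the commands can be reviewed before they touch the nodes. This is only a sketch: NODES, DRY_RUN, run, and fix_munge are hypothetical names introduced here, and the node range c[0-5] is taken from the commands above.

```shell
# Hypothetical variables: NODES is the pdsh host range, DRY_RUN=1 only prints commands.
NODES="${NODES:-c[0-5]}"

run() {
    # Print the command instead of executing it when DRY_RUN=1.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

fix_munge() {
    # -p keeps mkdir from failing on nodes where the directory already exists.
    run pdsh -w "${NODES}" mkdir -p /var/log/munge
    run pdsh -w "${NODES}" chown -R munge:munge /var/log/munge
    run pdsh -w "${NODES}" systemctl restart munge
    # Return the nodes to service in SLURM after munge has been restarted.
    run scontrol update "nodename=${NODES}" state=resume
}
```

Running DRY_RUN=1 fix_munge prints the four commands without executing them. Quoting the node range also keeps the shell from treating c[0-5] as a glob pattern if a matching file exists in the working directory.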
In addition, /opt/R/renv must be readable and writable by all users. This is sometimes not the case either, which causes problems with renv. To reset the permissions:
pdsh -w c[0-5] chmod -R 777 /opt/R/renv
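Before resetting permissions recursively, it can help to see which entries are actually affected. The sketch below lists everything under a directory that lacks read or write permission for "other"; the function name list_not_world_rw is hypothetical.

```shell
# Hypothetical helper: list entries under a directory that are missing
# the read or write bit for "other" (o+rw).
list_not_world_rw() {
    # -perm -006 matches entries with both o+r and o+w set; "!" inverts the match.
    find "$1" ! -perm -006 -print
}
```

For example, list_not_world_rw /opt/R/renv run on a node prints nothing when the whole cache is already world-readable and world-writable.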
8.2 SLURM
Some notes:
- /etc/slurm/slurm.conf must always be identical everywhere (RSW, edi, nodes).
- In /etc/slurm/slurm.conf, two SlurmctldHost entries are needed (one for edi, one for RSW in the container).
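Both notes can be checked mechanically once copies of slurm.conf from the different hosts are available locally. The following is a minimal sketch under that assumption; same_checksum and count_ctld_hosts are hypothetical helper names, and sha256sum from GNU coreutils is assumed to be installed.

```shell
# Hypothetical helper: succeed only if all given files have the same SHA-256 checksum.
same_checksum() {
    local sums
    # sha256sum fails if any file is unreadable; treat that as a mismatch.
    sums="$(sha256sum "$@")" || return 1
    # Keep the hash column and require exactly one distinct value.
    [ "$(printf '%s\n' "${sums}" | awk '{print $1}' | sort -u | grep -c .)" -eq 1 ]
}

# Hypothetical helper: count the SlurmctldHost entries in a slurm.conf file.
count_ctld_hosts() {
    grep -c -i '^[[:space:]]*SlurmctldHost=' "$1"
}
```

same_checksum returns a non-zero status as soon as one copy differs, and count_ctld_hosts /etc/slurm/slurm.conf should print 2 for the setup described above. For the compute nodes, running sha256sum /etc/slurm/slurm.conf through pdsh and piping the output to dshbak -c is another way to group nodes by identical output.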
8.3 Docker
8.3.1 Pulling a new image
Run the following as user admingeogr, which has AWS pull credentials configured:
cd /home/admingeogr/rsw
# log into AWS ECR repo
aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin 222488041355.dkr.ecr.eu-central-1.amazonaws.com
docker-compose pull
8.3.3 Clean up old images
docker image prune -af