SLURM
Yoo et al. (2003) from Lawrence Livermore introduce SLURM, the Simple Linux Utility for Resource Management [1]. Slurm is a cluster manager built in response to shortcomings of existing resource managers: lack of scalability in some (e.g. the Portable Batch System, Beowulf Distributed Process Space), lack of portability in others (e.g. Livermore's Distributed Production Control System, Condor), and proprietary licensing in the rest (e.g. Quadrics, IBM LoadLeveler, Platform's Load Sharing Facility). Since its release roughly fifteen years ago, Slurm seems to have set a firm foundation and appears to be roughly interchangeable with Maui→Moab/Torque for systems at Maryland [3], Case Western [4], and Lawrence Livermore itself [5].
Before diving into the guts of Slurm, a note to myself: I don't understand exactly how Slurm relates to the cluster-management technologies that have been released since this paper, specifically Hadoop's YARN, Apache Mesos, and Google's Kubernetes. For now, my assumption is that YARN manages the software clients trying to access the cluster [6], that Mesos is tailored for Spark and works quite differently from Slurm [7], and that Kubernetes orchestrates dynamic Docker containers, which as far as I can tell are several steps removed from the kind of fixed, non-cloud cluster that Slurm is intended for.
In addition to scalability, portability, and open-sourceability, the authors designed Slurm to be simple, easy to administer, robust, secure, and interconnect-independent. Simple means the software is straightforward and approachable to community developers; easy to administer means it is straightforward and approachable to cluster administrators. I haven't looked through the source code, but the project has 145 contributors, which suggests it has stayed accessible, though I imagine it's difficult to maintain simplicity as the open-source community gets more and more interested in a project. We are trying Slurm out in our assignment, and the centerpiece of its easy-to-administer character is the simple CLI (Fig 1), the handful of configuration files, and the minimization of distributed state. I'm not sure, but I think part of what reduces distributed-state confusion is that a single controller daemon (slurmctld) keeps track of everything that is going on. Slurm is robust for several of the same reasons the later Hadoop and Spark projects are robust: an optionally replicated controller that holds all the metadata on compute nodes and job status, plus compute-node daemons (slurmd) that constantly send status updates back to the controller (Fig 1).
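To make that "handful of configuration files" concrete, here is a minimal sketch of what a slurm.conf can look like. The hostnames and node counts are invented, and I'm using the classic parameter names (ControlMachine/BackupController) from around the paper's era; newer Slurm releases have since renamed some of these.

```
# Where slurmctld runs, plus the optionally replicated backup controller
ControlMachine=master1
BackupController=master2

# Authentication plugin (munge is the usual choice, also out of LLNL)
AuthType=auth/munge

# Compute nodes and a default partition spanning them
NodeName=node[01-16] CPUs=8 State=UNKNOWN
PartitionName=debug Nodes=node[01-16] Default=YES State=UP
```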
Regarding security, I wonder how much HPC clusters actually worry about it. At the most basic level, users only have permission to delete their own jobs, a feature that undoubtedly prevents all kinds of workplace drama, and only root or the configured SlurmUser can administer the cluster. Beyond these simple intranet concerns, I don't quite understand why Slurm should need its own authentication or encryption protocols, likely due to my inexperience with cybersecurity. I would think the institution's network would do the heavy lifting in securing access to the cluster, so that remote access could be accomplished only over SSH. I look forward to learning more about security in the coming semesters. Finally, the interconnect independence is super convenient, because Slurm supports certain proprietary interconnects but happily falls back to UDP/IP.
I can't say I totally understand all of the slurmd subsystems, but I decided to dig into the idea that the daemon is multithreaded. While looking into related work a few paragraphs ago, I took a brief look at a paper by Acun et al. (2016) that developed a competing cluster manager, the Power-Aware Resource Manager (PARM), written in the C++-based parallel framework Charm++, which outperformed Slurm thanks to its ability to dynamically reallocate nodes [2]. That dynamic design reminds me of cloud cluster systems like those operated by Kubernetes. Anyway, the reason I bring it up is that Charm++ apparently gives C++ a programming model particularly suited to parallel computation, and I've been struggling for a few weeks to really understand what multithreading is. I briefly scanned the slurmd.c source code and found that threads were automagically created and torn down with the pthreads library. Who but our very own LLNL would have a fantastic tutorial on threading [8] that includes a comparison to forking, an understanding of which has been on my wishlist, as well as an assertion that there is no magic involved, womp womp. So, pthread_mutex is used a lot in slurmd.c; it is a general mechanism for protecting data shared across threads with a lock. This makes me wonder what happens when a thread dies while holding a lock, and makes me think of the fractal that is cluster node management > multithreaded node-process management. And on that note, time to move on…
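As a toy illustration for myself (this is not slurmd's actual code), here is the basic pthread mutex pattern I kept seeing in slurmd.c: a lock guards a shared counter so concurrent threads can't corrupt it.

```c
#include <pthread.h>
#include <stdio.h>

/* Shared state guarded by a mutex, mirroring the pattern in slurmd.c. */
static pthread_mutex_t jobs_lock = PTHREAD_MUTEX_INITIALIZER;
static int active_jobs = 0;

/* Each worker thread increments the shared counter under the lock. */
static void *worker(void *arg)
{
    (void)arg;                         /* unused */
    pthread_mutex_lock(&jobs_lock);    /* block until we own the lock */
    active_jobs++;                     /* safe: no other thread is here */
    pthread_mutex_unlock(&jobs_lock);  /* release so others can proceed */
    return NULL;
}

int main(void)
{
    pthread_t threads[8];

    for (int i = 0; i < 8; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < 8; i++)
        pthread_join(threads[i], NULL);

    /* Without the mutex, two threads could read the same stale value
     * of active_jobs and one increment would be lost. */
    printf("active_jobs = %d\n", active_jobs);
    return 0;
}
```

Compiling with `gcc -pthread` and running it always prints 8 here; dropping the lock makes lost increments possible. It also partially answers my musing above: if a thread dies while holding the lock, every other thread blocks forever, which I gather is part of why this stuff is hard.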
I think I'll get practice with the command-line utilities, so it's helpful to have this reference. I don't imagine I will think too much about sockets or STORM with my small VM cluster, and I definitely won't be worried about security. It's helpful to see the sequence diagrams (e.g. Fig 2), because they give me a clear picture of how the user interacts with the controller via the command line and how the controller interacts with the compute nodes, presumably over UDP/IP. What I don't see are the status updates that slurmd should be sending to slurmctld before and during the job to let the controller know it is available and working, but I suppose it isn't reasonable to overcrowd these diagrams. On the other hand, I believe the messages I'm looking for from slurmd to slurmctld come into play much more during batch and allocated runs than during this interactive run, so my understanding of when these three types of runs are used is probably quite shallow. The authors do define the three modes (interactive, batch, and allocate), but I have a feeling I'd better understand when each is useful after a bit more experience running various jobs on a cluster.
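For my own later reference, a sketch of what the three modes look like from the command line. If I remember the paper right, the original srun exposed all of these as flags, while modern Slurm splits them into separate commands; the script name below is invented.

```
# Interactive: launch tasks and stay attached to their stdout/stderr
srun -N 2 hostname

# Batch: hand the controller a script to queue and run when nodes free up
sbatch -N 2 myjob.sh

# Allocate: grab nodes, get a shell, then launch job steps inside it
salloc -N 2
srun hostname      # runs inside the allocation
```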
References
[1] A. B. Yoo, M. A. Jette, and M. Grondona, "SLURM: Simple Linux Utility for Resource Management," Job Scheduling Strategies for Parallel Processing (JSSPP), LNCS vol. 2862, pp. 44–60, 2003.
[2] B. Acun et al., "Power, Reliability, and Performance: One System to Rule Them All," IEEE Computer, vol. 49, no. 10, pp. 30–37, 2016.
[3] https://www.glue.umd.edu/hpcc/help/slurm-vs-moab.html
[4] https://sites.google.com/a/case.edu/hpc-upgraded-cluster/slurm-cluster-commands
[5] https://computing.llnl.gov/tutorials/moab/
[6] https://waltermblair.github.io/blog/2017/07/12/Spark
[7] http://blog.typeobject.com/a-quick-comparison-of-mesos-and-yarn
[8] https://computing.llnl.gov/tutorials/pthreads/