Compute Server Etiquette
The Compute Servers are provided as a service to all Thayer and Computer Science users for their high-performance computing and class needs. Currently, there is no scheduling system for these compute servers; all scheduling is ad hoc. This means that you can simply log into a machine and run a job. However, if you have a large-scale computation to run, please keep the potential needs of others in mind. To that end, here are a few rules to follow when using our Compute Servers:
Try to run jobs on an under-utilized machine
Use the Cluster Usage Checking Tool (VPN required) to see which node(s) are under-utilized, and run your jobs on those nodes.
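If you are already logged into a node, you can also spot-check how busy it is from the shell. The sketch below assumes bash or another POSIX shell on Linux (it reads /proc/loadavg) and GNU coreutils `nproc`. The function name `load_pct` is hypothetical, and the load average is only a rough proxy for CPU usage.

```bash
# load_pct: the 1-minute load average as a percentage of this node's logical
# CPUs (roughly, 100 means every CPU is busy). Linux-only: reads /proc/loadavg.
load_pct() {
    local load1 _rest cpus
    # the first field of /proc/loadavg is the 1-minute load average, e.g. "12.34"
    read -r load1 _rest < /proc/loadavg
    cpus=$(nproc --all)
    # awk does the floating-point division; %d truncates to a whole percent
    awk -v l="$load1" -v c="$cpus" 'BEGIN { printf "%d\n", 100 * l / c }'
}
```

A value near 100 suggests the node is already fully loaded. The Cluster Usage Checking Tool remains the easier way to compare all nodes at once.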
Don't use all CPUs on a node
Each babylon node has 64, 40, or 24 logical CPUs. If you need to run several instances of a serial (one-CPU) job, spread them across nodes. For example, to run eight instances, run four on each of two nodes or two on each of four nodes. For multi-threaded jobs, limit the number of threads so that your job does not use every CPU on a node.
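The arithmetic of spreading instances and capping threads can be sketched in a few lines of shell. This sketch assumes bash or another POSIX shell with GNU coreutils `nproc`. The function names `spread_jobs` and `thread_cap` are hypothetical, and the 50% default thread cap is an arbitrary illustrative choice, not a site rule.

```bash
# spread_jobs TOTAL NODES: print one line per node giving how many serial
# (one-CPU) instances to start there, splitting TOTAL as evenly as possible.
spread_jobs() {
    local total="$1" nodes="$2" base extra i
    base=$((total / nodes))
    extra=$((total % nodes))
    i=1
    while [ "$i" -le "$nodes" ]; do
        # the first $extra nodes each take one of the leftover instances
        if [ "$i" -le "$extra" ]; then
            echo $((base + 1))
        else
            echo "$base"
        fi
        i=$((i + 1))
    done
}

# thread_cap [PERCENT]: a thread count that uses only PERCENT of the CPUs this
# process may run on (default 50, a hypothetical choice), but never below 1.
thread_cap() {
    local pct="${1:-50}" n
    n=$(( $(nproc) * pct / 100 ))
    [ "$n" -ge 1 ] || n=1
    echo "$n"
}
```

For example, `spread_jobs 8 2` prints 4 and 4, and `spread_jobs 8 4` prints 2 four times, matching the split described above. If your program happens to use OpenMP, `OMP_NUM_THREADS=$(thread_cap) ./myprogram` would apply the cap; other threading libraries have their own settings.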
Set "nice" on long-running jobs or jobs requiring all CPUs
If you have a job that will run for several days, or one that will need to use all CPUs on one or more nodes, set its "nice" value higher than zero. A higher nice value gives your job a lower scheduling priority, so other users' jobs can still get CPU time when they need it, and your job runs whenever higher-priority jobs are not using the CPUs. To run a program with a higher nice value (e.g. 10), use the nice command:
$ nice -n10 ./myprogram
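If a long job is already running without nice, its nice value can usually be raised in place with renice instead of restarting the job. A minimal sketch, assuming util-linux `renice` and procps `ps` as found on typical Linux systems; the function name `renice_job` is hypothetical. Note that an ordinary user can raise a process's nice value but cannot lower it again.

```bash
# renice_job PID [LEVEL]: raise the nice value of an already-running process
# and print the nice value it now has.
renice_job() {
    local pid="$1" level="${2:-10}"
    # -n 10 takes a process at nice 0 to nice 10 (whether renice treats -n as
    # absolute or relative); non-root users cannot undo this increase
    renice -n "$level" -p "$pid" > /dev/null || return 1
    # ni= selects the NI column and suppresses the header line
    ps -o ni= -p "$pid" | tr -d ' '
}
```

For example, `renice_job 12345` would lower the priority of process 12345 (a placeholder PID; find yours with `top` or `ps`).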
If a machine already has "niced" processes, please run only short jobs
You can see the nice level of running processes in the NI column of the 'top' command's output. If your own job is one you would run with a nice value, please choose another node.
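Because 'top' is interactive, a quick non-interactive check can be handy before starting a job. The sketch below reads the same NI information from ps. It assumes procps `ps` (as on typical Linux systems), and the function names `list_niced` and `has_niced` are hypothetical.

```bash
# list_niced: show processes on this node whose nice value is above zero,
# highest first; NR == 1 keeps the ps header line.
list_niced() {
    ps -eo ni,pid,user,etime,comm --sort=-ni | awk 'NR == 1 || $1 > 0'
}

# has_niced: succeed if at least one process here runs with nice > 0
# (more than one output line means something besides the header was printed)
has_niced() {
    [ "$(list_niced | wc -l)" -gt 1 ]
}
```

Used as `has_niced && echo "niced jobs present: please run only short jobs here"`, it mirrors the rule above.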
Don't use all RAM on a node
Nodes babylon1 through babylon4 have 512GiB of RAM, babylon5 through babylon8 have 256GiB of RAM, and babylon9 through babylon12 have 128GiB of RAM. Know how much RAM your process will use, and make sure it won't use all available RAM. You can check the amount of free RAM with the Cluster Usage Checking Tool (VPN required) or with the 'free' command.
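The "available" column printed by 'free' comes from the kernel's MemAvailable estimate in /proc/meminfo, which is the most useful figure when deciding whether a job will fit. The sketch below assumes Linux. The names `mem_available_gib`, `fits_in_ram` and `HEADROOM_GIB`, and the 16GiB default headroom, are hypothetical choices for illustration.

```bash
# mem_available_gib: RAM (GiB, rounded down) the kernel estimates is available
# for new work; MemAvailable in /proc/meminfo is given in KiB.
mem_available_gib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1048576 }' /proc/meminfo
}

# fits_in_ram NEEDED_GIB: succeed only if NEEDED_GIB still leaves at least
# HEADROOM_GIB (hypothetical margin, default 16) available for others.
fits_in_ram() {
    local need="$1" headroom="${HEADROOM_GIB:-16}"
    [ $((need + headroom)) -le "$(mem_available_gib)" ]
}
```

To learn how much RAM a test run actually needs, GNU time (`/usr/bin/time -v mycommand`, not the shell's built-in `time`) reports the "Maximum resident set size" in kilobytes.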
If you are not sure how much RAM your process will consume, or it could consume more memory than you expect (e.g. during debugging), limit your process with our memlimit tool. This tool sets a hard limit on the amount of memory that can be allocated by a program or a particular session. To run a program with a 1GiB limit, use a command like:
$ memlimit -m 1 mycommand
The total amount of RAM that can be consumed by "mycommand" (or any of its child processes) is then 1GiB, and the process will be killed if it tries to allocate more than this.
To open a new shell with a limit, simply leave off the command:
$ memlimit -m 1
Once you enter this, no process in this shell can allocate more than 1GiB of memory. The only caveat is that once you set a limit, it can't be changed in the same shell: you'll have to close this shell and open a new one to set a different limit.
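For comparison only, bash's built-in `ulimit -v` offers a similar cap, and it shows why a limit cannot be raised again in the same shell. This is not how memlimit works internally (its mechanism isn't described here). Unlike memlimit, `ulimit -v` limits each process's virtual memory separately, not the combined total of a command and its children, so treat it as a rough stand-in. The function name `with_vmem_limit` is hypothetical.

```bash
# with_vmem_limit GIB COMMAND [ARGS...]: run COMMAND in a subshell in which
# each process may map at most GIB GiB of virtual memory.
# NOTE: a rough stand-in only; unlike memlimit as described above, ulimit -v
# limits every process separately, not COMMAND and its children combined.
with_vmem_limit() {
    local gib="$1"
    shift
    (
        # ulimit -v takes KiB. Without -H or -S, both the soft and hard limits
        # are set, so an ordinary user cannot raise it again in this subshell.
        ulimit -v $((gib * 1024 * 1024)) || exit 1
        exec "$@"
    )
}
```

For example, `with_vmem_limit 4 ./myprogram` runs ./myprogram with a cap of roughly 4GiB per process, while the calling shell keeps its original limits because only the subshell is restricted. Programs that reserve large address ranges up front may fail to start under `ulimit -v` even when their actual RAM use is small.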
Thank you for your cooperation. If everyone is observant and mindful of each other's usage, everyone can get the maximum value out of this resource.
If you have any questions about these rules, or about your or someone else's usage of our Compute Servers, please contact us at computing@thayer.dartmouth.edu.