Logging In

After obtaining an account, you can use a Secure Shell (SSH) client or the NoMachine nxclient remote desktop application to access the SIRAF interactive login (or "head") node:

siraf-login.bsd.uchicago.edu

For logins originating from within the University of Chicago's .uchicago.edu domain (IP addresses starting with 128.135. or 10.135.), configure your SSH client to connect to either the standard SSH port 22 or port 22210. For off-campus logins, please connect to port 22210. Linux and most versions of Unix (including MacOS 10) ship with the OpenSSH client. SSH clients are also available for Windows systems, such as PuTTY and the copSSH package, as well as the commercial SSH.com client if your site has a license. You can also use Secure Shell to transfer files to your account using scp or sftp.
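
As a rough illustration of the port rule above, the following sketch picks a port from the client's IP address and prints the matching commands instead of running them. The function name siraf_port, the account name, the example address, and the file name results.tar.gz are placeholders, not part of the SIRAF setup. Note that ssh takes the port with -p while scp and sftp (in current OpenSSH releases) take it with -P.

```shell
# Choose the SSH port from the client's IP address, following the rule above:
# on-campus addresses (128.135.* or 10.135.*) may use port 22,
# while port 22210 is accepted from anywhere.
siraf_port() {
    case "$1" in
        128.135.*|10.135.*) echo 22 ;;
        *)                  echo 22210 ;;
    esac
}

SIRAF_HOST=siraf-login.bsd.uchicago.edu
SIRAF_USER=your-username      # placeholder account name
CLIENT_IP=203.0.113.7         # example off-campus address (documentation range)

port=$(siraf_port "$CLIENT_IP")

# Print the commands so they can be checked before connecting.
echo "ssh  -p $port $SIRAF_USER@$SIRAF_HOST"
echo "scp  -P $port results.tar.gz $SIRAF_USER@$SIRAF_HOST:"
echo "sftp -P $port $SIRAF_USER@$SIRAF_HOST"
```

Since port 22210 is accepted both on and off campus, simply always using it is the easiest choice.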

Secure Shell is configured to allow X Window applications to run directly from the login node, but users who wish to run graphical applications should connect using the NoMachine NX protocol. NX provides a graphical desktop which can be suspended and then resumed from a different location, allowing users to log off at work and then log in again at home over broadband, Wi-Fi, or dial-up, without having to restart the desktop environment and associated applications. The login node has an unlimited user license for NoMachine NX Server. The nxclient client software can be downloaded from the NoMachine web site and is available free of charge for Linux, Windows, MacOS, and Solaris.

If you logged in via Secure Shell, you are encouraged to avoid using your password in the future and rely instead on public key authentication, which is both more convenient and secure. If you logged in via nxclient, the NX protocol already uses public key authentication between the server and client, and requires the user password to complete the authentication process.
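
One possible way to set up public key authentication with the OpenSSH tools is sketched below. The helper name siraf_setup_key, the key file name, and the choice of an RSA key are assumptions; adjust them to your own client. The function is only defined here, so nothing contacts the login node until you call it.

```shell
# Define a helper that creates a key pair (if needed) and installs the
# public key on the login node. All names below are illustrative.
siraf_setup_key() {
    user=$1
    key="$HOME/.ssh/id_rsa_siraf"     # hypothetical key file name

    # Create the key pair once; ssh-keygen prompts for a passphrase.
    [ -f "$key" ] || ssh-keygen -t rsa -b 4096 -f "$key" || return 1

    # Append the public key to ~/.ssh/authorized_keys on the login node.
    # This step still asks for your account password one last time.
    ssh-copy-id -i "$key.pub" -p 22210 "$user@siraf-login.bsd.uchicago.edu"
}
```

After running siraf_setup_key your-username, a login such as ssh -i ~/.ssh/id_rsa_siraf -p 22210 your-username@siraf-login.bsd.uchicago.edu should ask for the key passphrase rather than the account password.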

If you logged in via nxclient, start a terminal emulator window. Note the command line prompt does not show "siraf-login" but "master". The hostname command will show "master.siraf.uchicago.edu". The login node's external network connection to the Internet is named siraf-login.bsd.uchicago.edu, but its internal cluster network interface is named "master.siraf.uchicago.edu". The siraf.uchicago.edu domain is internal to the cluster and is not an official network sub-domain of the University.

Here is a diagram of the cluster network configuration:

http://siraf-login.bsd.uchicago.edu/graphics/cluster-schematic-rev2.jpg

Exploring the Login Node

Now type "df" to get a list of the filesystems on the login node. For now we are interested in the filesystem mounted under the /Projects directory. Enter ls /Projects at the command line. You will see directories corresponding to each of the project groups. Now enter ls /Projects/your-project. You will see three directories: users, data, and programs. The users directory contains your home directory along with those belonging to the other members of your project. The data directory contains shared research data and the programs directory contains applications specific to your project. Enter pwd. This should output either "/home/your-username" or "/Projects/your-project/users/your-username" depending on your shell configuration. Type "ls -l /home/your-username". There is a symbolic link in /home which points to the real location of your home directory. You can refer to files in your home directory using the /home/your-username path rather than the /Projects/your-project/users/your-username path, which can save a bit of typing.
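
To see how the /home symbolic link relates to the real home directory without touching the cluster, the layout can be mimicked in a throwaway directory. This is only a mock-up: the directory names copy the placeholders used above, and readlink is assumed to be available (it is on Linux and MacOS).

```shell
# Build a miniature copy of the layout described above in a temporary directory.
demo=$(mktemp -d)
mkdir -p "$demo/Projects/your-project/users/your-username" "$demo/home"

# /home/your-username is a symbolic link to the real home directory.
ln -s "$demo/Projects/your-project/users/your-username" "$demo/home/your-username"

# ls -l shows the link and the path it points to.
ls -l "$demo/home/your-username"

# readlink prints only the link target.
target=$(readlink "$demo/home/your-username")
echo "real location: $target"
```

On the login node itself, ls -l /home/your-username and readlink /home/your-username show the same relationship for your actual account.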

The login node has a reasonably complete version of Scientific Linux 5 installed, including compilers and development libraries. Major computational packages such as Matlab, IDL, Mathematica, R, etc. can be run only on the computation nodes. The login node should be used to submit jobs to the cluster, to monitor their progress, and to compile your programs. (However, with a large compilation you are better off using a parallel make program such as qmake.) No long-running, resource-intensive jobs should be run on the login node, since they would interfere with other user logins. Finally, don't use the login node for casual Web browsing, e-mail, document preparation, or other non-research functions.

Accessing the Cluster

Now it's time to access the rest of the cluster. Ideally a computer cluster would appear as a single computer. You would run a program and the cluster operating system would dispatch it to the cluster node with the appropriate resources. Such a cluster is called a Single System Image (SSI) cluster. The OpenMOSIX, OpenSSI, and Kerrighed projects are attempts to extend Linux to appear as an SSI OS across a cluster, but none of them are ready for production use. Instead, most computing clusters rely on a Distributed Resource Management System (DRMS).

A DRMS runs as an application on top of the operating system on each cluster node, monitoring resources such as CPU load, memory and disk use, software licenses and so on. It frees the user from having to make detailed decisions about which cluster node to run an application on. The user only has to give the DRMS a set of requirements (e.g. at least 8 free CPUs, 8GB of available memory, and a Matlab license) and the application to run, and the DRMS will find the cluster node which satisfies the requirements and then run the application on that node. If there are no nodes available with adequate resources the DRMS will hold the job in a queue until such resources become available. If an application has been compiled with checkpointing libraries, the DRMS can migrate the application between nodes if, for example, a node with faster CPUs becomes available, or resume a job from the last saved state in the event of node failure. From a funding perspective, the DRMS provides detailed accounting records of resource use by user and project for chargeback fees or to justify facility maintenance and expansion.

The DRMS used on SIRAF is Sun Grid Engine (SGE). SGE is available under an open source license and is free for use with paid support available from Sun Microsystems.

Getting started as an SGE user is relatively straightforward. There are four basic commands: qsub and qrsh for starting jobs, and qstat and qmon for monitoring jobs. qsub submits SGE batch execution scripts, which are shell scripts with embedded directives that allow you to fine-tune the execution and scheduling of your jobs on the cluster. qrsh is essentially a cluster-aware version of rsh/ssh; instead of specifying the hostname, you let SGE determine the host best suited to handle your login or program execution request.

qsub is the most powerful of the commands and will end up being the command you use the most.

Enter the qlogin command. You will see some output before you get a new command prompt:

[user@master]~% qlogin

Your job 108 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 108 has been successfully scheduled.
Establishing /usr/share/sge/6.1/bin/lx26-amd64/qlogin_wrapper session to host node5.siraf.uchicago.edu ...
Last login: Sun Jan 20 18:56:06 2008 from 10.0.0.2

[user@node5]~%

SGE assigns a job ID number to every interactive or batch job, then schedules the job for execution. If the cluster has open execution slots and the specified resources, the job is executed immediately; otherwise it waits in a queue until a slot and the resources are available. Currently there is a 1:1 correspondence between the number of slots and the number of CPU cores. Here the job has been assigned job ID #108, and was scheduled for execution immediately. SGE then calls the qlogin_wrapper program, which performs various administrative and accounting functions and then calls a modified version of SSH to connect to the computation node selected by SGE. In this case it is node5 in the cluster. The end result is a login prompt which functions exactly as any other SSH login.

Use the df command to list the filesystems available on the computation node. You will see the /Projects filesystem; the computation nodes have access to the same user home and data directories as the login node. The /usr/local/packages and /usr/local/bin filesystems are remotely accessed Network Filesystem shares which contain commercial and third-party software applications, including IDL, Matlab, and Mathematica. There is also a filesystem named /scratch which is local to each computation node. You can use it rather than the much smaller /tmp or /var/tmp filesystems to store temporary files for your calculations. Now try running a few benchmarks with your favorite applications, and run qstat -f on the login node to monitor the cluster status. Exit the interactive job using the logout or exit commands.
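
A job can keep its temporary files under /scratch along the following lines. This is a sketch rather than a SIRAF convention: the directory prefix myjob is a placeholder, and the fallback to $TMPDIR or /tmp is only there so the snippet also runs on machines without a /scratch filesystem.

```shell
# Prefer the node-local /scratch filesystem; fall back to the usual
# temporary directory when /scratch is missing or not writable.
scratch_root=/scratch
if [ ! -d "$scratch_root" ] || [ ! -w "$scratch_root" ]; then
    scratch_root=${TMPDIR:-/tmp}
fi

# Create a private working directory and remove it when the shell exits.
workdir=$(mktemp -d "$scratch_root/myjob.XXXXXX")
trap 'rm -rf "$workdir"' EXIT

echo "temporary files go in $workdir"
# ... run your calculation here, writing temporary files to "$workdir" ...
```

Because /scratch is local to each computation node, copy any results you want to keep back to your home or project data directory before the job ends.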

In SGE, jobs are submitted to queues, which are associated with physical nodes in the cluster and with properties such as scheduling priority, user and group access restrictions, support for parallel execution environments and so on. The default job queue on SIRAF is named all.q; it contains all the computation nodes in the cluster as members and has no special restrictions or other properties. There are two other queues named small.q and big.q. They are identical to the all.q queue, except that small.q contains only nodes with 2 CPUs and big.q only nodes with 8 CPUs. (Since you can use SGE resource specifications to choose nodes with the desired properties, these queues exist as a convenience rather than a necessity.) Repeat the previous directions for using qlogin, except this time add the -q flag with a queue name: qlogin -q big.q. Your interactive job will be started on one of the nodes with 32 CPU cores; note that the shell prompt and the output of the hostname command contain "bignodeN", where N is some small integer. Enter the htop command (a version of the process monitor top with more features) and you will see a long list of load activity bars for the 32 CPU cores. Repeat the preceding steps using the small.q queue, and htop will show only 8 CPU cores chugging away. Try running a few application benchmarks on a node in the big.q queue and then in the small.q queue.
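
If you script your interactive logins, the queue name can be chosen from the number of cores you expect to use. The helper below is purely illustrative: pick_queue is not an SGE command, and the thresholds of 8 and 32 cores are taken from the htop observations described above.

```shell
# Map a desired core count to the queue whose nodes are just large enough.
# Thresholds assume 8 cores on small.q nodes and 32 cores on big.q nodes.
pick_queue() {
    cores=$1
    if [ "$cores" -le 8 ]; then
        echo small.q
    elif [ "$cores" -le 32 ]; then
        echo big.q
    else
        echo "no single node has $cores cores" >&2
        return 1
    fi
}

# Show which queue would be requested for a 16-core interactive session.
echo "qlogin -q $(pick_queue 16)"
```

On the login node you would then run the printed command, or call qlogin -q "$(pick_queue 16)" directly.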

The qrsh command is similar to the qlogin command, but instead of starting an interactive login shell, it takes as an argument the command you wish to submit to the cluster. If there is no command argument qrsh behaves identically to the qlogin command and starts a remote login shell. Proper use of qrsh will allow you to better distribute your interactive jobs across the cluster, e.g. you can start an editor session on one node with qrsh "emacs", and a compile on another with qrsh "( cd $PWD; make all )". (Because of the way qrsh interacts with ssh, you may need to use the Ctl-Z shell job control key sequence to suspend a job, then use the shell bg background command to resume the job in the background.)
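
The "( cd $PWD; make all )" pattern above can be wrapped in a small helper so that any command runs from your current directory on whichever node SGE selects. The function name qrsh_command is an invention for this sketch, and it assumes the directory path contains no spaces or shell metacharacters.

```shell
# Build the command string for qrsh: change to the current directory on the
# remote node, then run the given command there.
qrsh_command() {
    printf '( cd %s; %s )\n' "$PWD" "$*"
}

# Print the string that would be handed to qrsh for a build.
qrsh_command make all
```

On the login node, qrsh "$(qrsh_command make all)" then has the same effect as typing the full quoted string by hand.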

Submitting SGE Batch Jobs

qsub has a straightforward command-line syntax: qsub batchfile-name, with options similar to qrsh and qlogin. The batch file can be an executable binary, but usually it is a text file written in sh, tcsh, perl, python or whatever your favorite scripting language may be. With a text script file you can embed job control directives to SGE, allowing for fine control over conditional execution, input/output to files, and so on. Let's go through a simple but useful example for your first batch job submission, which can be used as a template later.

For this example we'll use Matlab. Using a text editor or the Matlab workspace, create a file named test.m containing the following Matlab code:
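
A minimal test.m matching the description below could be created from the shell with a here-document, as in this sketch. The variable name A and the exact form of the save call are assumptions; any Matlab code that builds a 5x5 random matrix and saves it under the name data_array would serve.

```shell
# Write a two-line Matlab script into test.m in the current directory.
# The quoted 'EOF' keeps the shell from altering the Matlab code.
cat > test.m <<'EOF'
A = rand(5, 5);
save('data_array', 'A');
EOF

# Show what was written.
cat test.m
```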

The commands will generate a 5x5 matrix of random numbers and save it in a file called data_array. Now create a second file named sge-test.sh containing:
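
A sketch of sge-test.sh consistent with the description that follows can be written with another here-document. The two #$ lines come from that description; the Matlab invocation on the last line (the -nodisplay and -nosplash flags, with test.m read from standard input and output captured in sge-test.out) is an assumption about how the interpreter is started.

```shell
# Write the batch script into sge-test.sh in the current directory.
# The quoted 'EOF' keeps the shell from expanding anything inside.
cat > sge-test.sh <<'EOF'
#!/bin/sh
#$ -j y
#$ -cwd
# Assumed Matlab invocation: run test.m without a display and capture
# the interpreter output in sge-test.out.
matlab -nodisplay -nosplash < test.m > sge-test.out
EOF

# Check the script for shell syntax errors without executing it.
sh -n sge-test.sh && echo "sge-test.sh: syntax OK"
```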

The first line starting with #! is the standard Unix way of telling the operating system that a text file is a script which will be interpreted by the /bin/sh command. The next two lines start with #$ and are SGE directives. In this case they are command-line flags to qsub. Omitting the two lines and entering qsub -j y -cwd sge-test.sh would have the same effect, but saving the command-line arguments in a script can save a lot of typing. The -j flag with the "y" option tells SGE to merge the error output with the standard output, and the -cwd flag tells SGE to execute the job using the current working directory. Submit the script file with qsub sge-test.sh. Use the qstat command to monitor its progress. When the script finishes you will see some new files in your current directory: the data_array.mat data file generated by the test.m Matlab command file, the sge-test.out file which contains output from the Matlab application interpreter, and sge-test.sh.ojob-ID which contains any messages from SGE while running the job. (If the job completed successfully, sge-test.sh.ojob-ID should be an empty file.) Now try submitting some of your own programs as batch jobs.
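
Once qstat no longer lists the job, a quick check for the three expected files might look like the sketch below. The helper name check_job_outputs is made up for this example; pass it the job ID that qsub reported.

```shell
# Report which of the expected output files exist in the current directory.
# Returns 0 when all three are present and 1 otherwise.
check_job_outputs() {
    job_id=$1
    status=0
    for f in data_array.mat sge-test.out "sge-test.sh.o$job_id"; do
        if [ -e "$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            status=1
        fi
    done
    return $status
}
```

For instance, after qsub sge-test.sh reports job 108 and the job has left the qstat listing, run check_job_outputs 108 in the directory you submitted from.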

Further Readings

To learn more about how to use Sun Grid Engine, please download and read the Sun Grid Engine User Guide. The User Guide seems longer than it is; some sections will not be of interest until an advanced stage of usage, and some are simply not of interest to the user. In particular, feel free to skip the Accounting and Reporting section and the Database Schema section, but do read Sections 1-4.

For more on how SGE is configured on SIRAF, read the SIRAF SGE Notes.

For SGE integration with parallel programming, please read the SIRAF Parallel Programming Notes.

SIRAFGettingStarted (last edited 2009-06-22 20:25:20 by cchan)