SP2 Usage Notes

The SP2 system consists of a control workstation and four IBM RS/6000-590 nodes (numbered 1, 3, 5, and 7). Each of these is an independent computer running its own copy of the AIX 4.2.1 operating system. The nodes are connected via a private Ethernet and a High Performance Switch with a bandwidth of 35 megabytes per second. Parallel programming on the SP2 is done using message-passing algorithms, e.g., with the MPL subroutine library.

The main SP2 node is called “sp2.dal.ca”. This is the name to which you should log in via telnet (except before the first time you use the SP2; see “Password changes” below). Each node has an “internal” name of the form

sp2-eN.ucis.dal.ca

where “N” is the SP2 node number, 1, 3, 5, or 7. Logging in to a particular node is possible, but there is usually no need to do so.

The “vi”, “jove”, and “pico” text editors are available.

The command to run the Fortran 90 compiler is “xlf”. Fortran options can be looked up with the command “man xlf”. It is recommended that you use at least “-O” optimization because it can substantially reduce execution time.
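As an illustration of the recommendation above, the following sketch compiles a source file with “-O”. The file names “program.f” and “program” are hypothetical, and the check for “xlf” is there only so that the sketch does nothing harmful on a machine without the compiler.

```shell
# Hypothetical file names; replace with your own source and executable.
SRC=program.f
EXE=program

# Compile with at least -O optimization; -o names the executable.
COMPILE_CMD="xlf -O -o $EXE $SRC"

if type xlf >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $COMPILE_CMD
else
    # Compiler or source file not available here: just show the command.
    echo "would run: $COMPILE_CMD"
fi
```

The resulting executable would then be run as “./program”, preferably through LoadLeveler.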

The command to run the High Performance Fortran compiler, with which it is possible to automatically parallelize a program, is “xlhpf”. Command options can be looked up with “man xlhpf”.

Password changes

All the SP2 nodes share the same password file. The master copy of the password file is stored on the control workstation and is periodically copied to the nodes. Consequently, to change your password, and especially before you use the SP2 for the first time, you must log in to the SP2 control workstation “sp2c.ucis.dal.ca” and use the “passwd” command there.

Note: If you change your password on “sp2.dal.ca” or one of the other nodes, the change WILL NOT LAST because it will be overwritten by the password from the control workstation.

After you have changed your password on the control workstation, the change will not reach “sp2.dal.ca” and the other nodes until about 10 minutes past the hour.

Mail Forwarding

The first time you log in, you should probably set up e-mail forwarding to the place where you regularly read e-mail. To do so, create a file “.forward” in your home directory containing the address to which e-mail sent to you on the SP2 should be redirected. Test this by sending mail to “username@sp2.dal.ca”, where “username” is your SP2 username.
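A minimal sketch of creating the forwarding file from the shell follows. The address “username@example.com” is a placeholder, not a real destination; substitute the address where you normally read e-mail.

```shell
# Placeholder address (an assumption for illustration); use your own.
FORWARD_TO="username@example.com"

# Write the address as the only line of .forward in the home directory.
printf '%s\n' "$FORWARD_TO" > "$HOME/.forward"

# Show the result so it can be checked by eye.
cat "$HOME/.forward"
```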

Disk space

User permanent disk space is allocated in the /home file system. At this time there are no disk space quotas, so please be conservative in your disk space usage. /home is locally mounted on SP2 node 1 and NFS-mounted on the other nodes so that files in /home can be accessed from any node. Please do not use the /tmp file system for temporary files: it is not very large and in the future may be even smaller. Each node has its own two-gigabyte /usertmp file system for temporary files.

Temporary files that need to be shared across nodes can be put in the two-gigabyte /globaltmp file system, which is locally mounted on node 1 and NFS-mounted on the other nodes.

There is currently no regular removal of files from /tmp or /usertmp (files stored in these file systems are kept across system reboots), so please clean up files you no longer need.
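Because nothing is removed automatically, a job script can create its own scratch directory and delete it on exit. The sketch below is one way to do that. The directory naming scheme and the file “scratch.dat” are assumptions for illustration, and the fallback to the current directory exists only so the sketch can be tried on a machine that has no /usertmp.

```shell
# Prefer the node-local /usertmp; fall back to the current directory
# only when /usertmp is unavailable (e.g. when trying this off the SP2).
if [ -d /usertmp ] && [ -w /usertmp ]; then
    SCRATCH_BASE=/usertmp
else
    SCRATCH_BASE=$(pwd)
fi

# Hypothetical naming scheme: user name plus process ID keeps jobs apart.
WORKDIR="$SCRATCH_BASE/$(id -un).$$"
mkdir -p "$WORKDIR"

# Remove the scratch directory when the script exits.
trap 'rm -rf "$WORKDIR"' EXIT

# Temporary files go under $WORKDIR instead of /tmp.
echo "intermediate data" > "$WORKDIR/scratch.dat"
ls -l "$WORKDIR"
```

For files that must be visible from every node, the same pattern could be applied with /globaltmp as the base directory.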

Shell

The standard system shell is the Korn Shell, ksh. Please don’t change your login shell, as this may cause difficulties with system login procedures. If you prefer to use a different shell, “exec” it in your .profile file. Note: The current directory is not in the PATH environment variable. Thus, to execute a program from your current directory, you must type “./program_file”, not “program_file”.
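One possible way to “exec” another shell from .profile is sketched below. The path “/usr/bin/tcsh” is a hypothetical example and may not match where your preferred shell is installed. Restricting the switch to interactive sessions is a cautious design choice of this sketch, not a stated site requirement.

```shell
# Lines for $HOME/.profile. The shell path is a hypothetical example.
PREFERRED_SHELL=/usr/bin/tcsh

# Switch shells only in interactive sessions, and only if the shell
# exists and is executable; otherwise the login stays in ksh.
case $- in
*i*)
    if [ -x "$PREFERRED_SHELL" ]; then
        exec "$PREFERRED_SHELL"
    fi
    ;;
esac
```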

LoadLeveler

Please run anything which needs a non-trivial amount of CPU time via LoadLeveler. This is the only way in which machine resources can be used efficiently and fairly. If you run a program interactively, e.g.,

$ ./program

or

$ ./program &

it runs on the node to which you have logged in (i.e., node 1). If several people do this, their programs all run on the same node and compete with each other for CPU time.

LoadLeveler automatically runs your program on the least loaded SP2 node. To use LoadLeveler, create a LoadLeveler command file (e.g., “job.cmd”) that looks like

`# @ class = short
# @ error = name_of_file_to_hold_error_output
# @ output = name_of_file_to_hold_regular_output
# @ queue
./program`

and then use the “llsubmit” command:

$ llsubmit job.cmd

The “llstatus” and “llq” commands can be used to track the progress of your job. A mail message is sent to you when a job completes (this can be controlled with the “notification” and “notify_user” LoadLeveler statements).
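The steps above can be combined into a short script, sketched here. The output file names, the executable “./program”, and the “notify_user” address are placeholders. The “notification = complete” setting is an assumption about how you might want mail handled; check the LoadLeveler documentation for the values your installation accepts.

```shell
# Write a command file for the "short" class.
# File names and the mail address below are placeholders.
cat > job.cmd <<'EOF'
# @ class = short
# @ error = program.err
# @ output = program.out
# @ notification = complete
# @ notify_user = username@sp2.dal.ca
# @ queue
./program
EOF

# Submit the job and list the queue if LoadLeveler is installed here.
if type llsubmit >/dev/null 2>&1; then
    llsubmit job.cmd
    llq
else
    echo "llsubmit not found; job.cmd written but not submitted"
fi
```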

Three LoadLeveler job classes have been defined:

  • short - Maximum 15 minutes CPU time.
  • medium - Maximum one hour CPU time.
  • long - No CPU time limit.
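As a small aid for choosing among these classes, the following sketch maps an estimated CPU time in whole minutes to the smallest class that fits. The function name “pick_class” is hypothetical and is not part of LoadLeveler.

```shell
# Print the smallest job class whose CPU limit covers the estimate.
# Argument: estimated CPU time in whole minutes.
pick_class() {
    minutes=$1
    if [ "$minutes" -le 15 ]; then
        echo short
    elif [ "$minutes" -le 60 ]; then
        echo medium
    else
        echo long
    fi
}

pick_class 10     # fits within 15 minutes
pick_class 45     # fits within one hour
pick_class 300    # needs the class with no limit
```

The printed name can be placed after “# @ class =” in the command file.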

All the job classes have the same base “queue priority”, so jobs are scheduled first come, first served. You may submit any number of jobs, but only two at a time may be running or under consideration to be run. This prevents a user from submitting numerous jobs to “reserve” blocks of time by getting in ahead of everyone else.

Each SP2 node is allowed to run just one LoadLeveler job at a time, so the job has exclusive use of that node.

A tutorial on LoadLeveler usage, with minimal site-specific content, is available from the Maui High Performance Computing Center.

Reporting problems

Send mail to cfo-systems@dal.ca.