Frequently Asked Questions
How can I log in to, or send a command to, a specific node?
Use the hostname resource specification, e.g.
qrsh -l hostname=node7
The "-l" option is the resource specification flag; see the qsub/qrsh/qlogin man pages or the Sun Grid Engine User Guide.
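qrsh also accepts a command after its options, which covers the "send a command" case. The following is a minimal sketch, not a site-provided tool: run_on_node and the DRY_RUN variable are hypothetical names introduced here for illustration, and uptime stands in for whatever command you want to run.

```shell
#!/bin/sh
# run_on_node is a hypothetical helper, not part of SGE.
# Usage: run_on_node NODE COMMAND [ARGS...]
run_on_node() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_on_node NODE COMMAND [ARGS...]" >&2
        return 2
    fi
    node=$1
    shift
    # With DRY_RUN set (an assumption of this sketch), print the qrsh
    # command line instead of executing it.
    ${DRY_RUN:+echo} qrsh -l "hostname=$node" "$@"
}

# Preview the command that would run uptime on node7.
DRY_RUN=1 run_on_node node7 uptime
```

Without DRY_RUN set, the same call would hand the command to qrsh, so the run is still recorded by SGE accounting.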
Why can't I use ssh/slogin/scp to log in from the SIRAF master node to a computation node?
The SIRAF cluster's funding agencies require us to keep strict accounting of resources to demonstrate that the facility is being used appropriately. Unrestricted logins would allow users to bypass the Sun Grid Engine accounting mechanisms. Since using the SGE qrsh/qlogin commands with the hostname resource accomplishes the same purpose as ssh/slogin, there is no loss in functionality.
How can I submit MPI or PVM parallel jobs if direct rsh/ssh connections aren't allowed?
Sun Grid Engine provides parallel environments (PEs) for submitting jobs that require internode communication. PEs currently exist for MPICH, MPICH2, LAM, OpenMPI, PVM, and single-node shared-memory/multithreaded applications. They are named mpich, mpich2, lam, openmpi, pvm, and shm, respectively. The user specifies the number of CPUs/slots, and the parallel environment selects the nodes to which the job is dispatched based on available CPUs, memory, software licenses, and other resources. The user does not need to specify a hard-coded list of nodes. In addition, MPI/PVM jobs must be submitted to the distmem.q queue. For more information, see the SGE User Guide and the sge_pe man page.
Here is a simple OpenMPI job script to print out the hostnames of nodes selected by the openmpi parallel environment:
openmpi_job.sh sample script
#!/bin/sh
# Make sure OpenMPI is the first MPI implementation in your PATH
export PATH=/usr/lib64/openmpi/current/bin:$PATH
# This should output the number of slots requested via qsub -pe
echo "Got $NSLOTS slots."
mpiexec -n "$NSLOTS" /bin/hostname
exit 0
Submit the above job script, requesting 8 slots, with:
qsub -q distmem.q -pe openmpi 8 openmpi_job.sh
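To see which nodes the parallel environment actually granted, a job script can read the file named by $PE_HOSTFILE, which SGE sets for parallel jobs (see the sge_pe man page). The sketch below assumes the usual layout of that file: one line per host, with the hostname in the first column and the slot count in the second. summarize_hostfile is a hypothetical helper name, not an SGE command.

```shell
#!/bin/sh
# summarize_hostfile is a hypothetical helper, not part of SGE.
# Assumed input layout: column 1 is the hostname, column 2 is the number
# of slots granted on that host; any further columns are ignored.
summarize_hostfile() {
    awk '{ printf "%s: %d slot(s)\n", $1, $2; total += $2 }
         END { printf "total: %d slot(s)\n", total }' "$1"
}

# Inside a parallel job SGE sets PE_HOSTFILE; outside one, do nothing.
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
    summarize_hostfile "$PE_HOSTFILE"
fi
```

For a job submitted with the qsub line above, the reported total should match $NSLOTS.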
Why can't I run Matlab/IDL/Mathematica on the login node?
This is a feature, not a bug. Running a computationally intensive job on the login node will cause it to become unresponsive and deny service to other users. Please run your computational jobs on one of the compute nodes, e.g.
qrsh matlab
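If you are unsure whether your current shell is on the login node or inside a job, a small check like the following may help. It is a sketch under one assumption: that SGE exports JOB_ID inside the jobs it starts and that JOB_ID is unset in an ordinary login shell. Depending on how qlogin is configured at a site, that variable may not be passed through, so treat the result as a hint. in_sge_job is a hypothetical helper name.

```shell
#!/bin/sh
# in_sge_job is a hypothetical helper, not part of SGE.
# Assumption: JOB_ID is set inside SGE jobs and unset in a login shell.
in_sge_job() {
    [ -n "${JOB_ID:-}" ]
}

if in_sge_job; then
    echo "Running under SGE job $JOB_ID on $(uname -n)."
else
    echo "Not inside an SGE job; start one with qrsh or qsub first."
fi
```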